Google Sheets binning and group by (custom time interval)

I want to count events that occurred within a custom time interval: it could be within 24 hours, within a week, or within a 2-month span.
I am using Google Sheets. I can create a pivot table and group by month, but I'd like to explore insights using custom intervals (I'm looking for patterns in epilepsy).
As a final result, I want a table that reports, for each day, the number of events within that interval.
In particular, I want to focus on the 24-hour interval to count the number of epileptic events (known as cluster seizures).
Then I want to use custom day intervals, such as every 48 hours or every 15 or 30 days, to explore periodicity or trends.
See a mockup of the Google Sheet here:
https://docs.google.com/spreadsheets/d/1tCxYV5mUcq6vKm8-fL-0HUAOjcB9fipLCqPD2Znv-X0/edit#gid=1372548551
Here is what I tried.
First, to find out how many events occurred in the 30 days prior to the reported date:
= IFERROR(
QUERY(
A:E,
"SELECT COUNT(A)
WHERE
A IS NOT NULL AND
E = FALSE AND
A >= date '" &
TEXT(
A2-30,
"yyyy-MM-dd"
) &"' AND
A <= date '" &
TEXT(
A2,
"yyyy-MM-dd"
) &"'
LABEL COUNT(A) '' "), "N/A")
Then, by dragging the formula down, I get the column "# events in the prior 30 days".
It works, but it seems a bit messy, especially when updating the intervals.
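For reference, the logic of this first formula can be sketched in Python outside of Sheets. This is only an illustration, assuming each event is a (date, flag) pair where the hypothetical `flag` mirrors column E (rows with E = TRUE are not counted); the function name `prior_window_counts` is made up for this sketch.

```python
from datetime import date, datetime, timedelta

def prior_window_counts(events, days=30):
    """For each event, count the unflagged events dated within
    [event_date - days, event_date], inclusive on both ends.

    Mirrors the QUERY filter E = FALSE AND A >= A2-30 AND A <= A2,
    evaluated once per row (the "drag the formula down" step).
    Works with date or datetime values; with datetimes, days=1 is a
    true 24-hour window rather than a calendar day."""
    counted = [when for when, flag in events if not flag]
    window = timedelta(days=days)
    return [
        sum(1 for other in counted if when - window <= other <= when)
        for when, _flag in events
    ]

events = [
    (date(2022, 1, 1), False),
    (date(2022, 1, 20), False),
    (date(2022, 2, 5), False),
    (date(2022, 2, 6), True),   # flagged: listed, but never counted
]
print(prior_window_counts(events))  # [1, 2, 2, 2]

# 24-hour windows: 22:00 on Jan 1 and 08:00 on Jan 2 are different
# calendar days but fall inside the same 24 hours.
seizures = [(datetime(2022, 1, 1, 22), False),
            (datetime(2022, 1, 2, 8), False),
            (datetime(2022, 1, 3, 9), False)]
print(prior_window_counts(seizures, days=1))  # [1, 2, 1]
```

Changing the interval then only means changing the `days` argument, which is the part that is awkward to edit in the dragged formula.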
I tried this other approach:
=query(B:E, "select B, count(E), -1+count(E) where E = FALSE group by B label B 'Date with Clusters', count(E) 'Cluster seizures '")
That produces the last table.
I like this approach better, but here I am just grouping by the same date, without possibility to have a custom interval.
For example, two events are counted together when they fall on the same calendar day, not when they fall within the same 24-hour interval.
Could you suggest a better approach to handling datetime differences, so that I can bin and group by custom intervals?
Below is an example:
the left table is the input data, the middle column is the result of the first approach, and the right table is the result of the second approach.

Given the table:
In order to group with QUERY, we need to "fix" column A so that each date maps to a custom period. Let's say we need to group events every 3 weeks (21 days). We take the lowest and highest dates and create a sequence of all the dates in between.
=INDEX(ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A))))
Then we use a running total on it to get every date that is 21 days apart from the previous/next one. We could use a simple SEQUENCE to create this array going forward (min to max), but with SEQUENCE we can't go "back in time" (max to min), so we use MMULT with a negative number.
Therefore, to start from the first date and create 3-week group-by windows going forward (min to max), we use:
=ARRAYFORMULA({MIN(A2:A); MIN(A2:A)+MMULT(TRANSPOSE((
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*21); SIGN(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))})
To reverse it, starting from the end date and creating 3-week windows going backwards (max to min), we use:
=ARRAYFORMULA({MAX(A2:A); MAX(A2:A)+MMULT(TRANSPOSE((
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*-21); SIGN(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))})
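The MMULT part can be hard to read. As an illustration only (Python, not Sheets), here is a sketch assuming dates are plain day numbers, like Sheets date serials: a lower-triangular matrix whose nonzero entries equal the step, multiplied by a column of ones, gives the running total step, 2*step, 3*step, and so on. The name `window_starts` is hypothetical.

```python
def window_starts(start, n, step):
    """Return [start, start + step, start + 2*step, ..., start + n*step].

    tri[i][j] = step when j <= i (a lower-triangular matrix); multiplying
    it by a column of ones sums each row to step * (i + 1), the same
    running total the MMULT(TRANSPOSE(...)*21, SIGN(...)) construction
    produces. A negative step walks backwards in time."""
    tri = [[step if j <= i else 0 for j in range(n)] for i in range(n)]
    ones = [1] * n
    running = [sum(a * b for a, b in zip(row, ones)) for row in tri]
    return [start] + [start + total for total in running]

print(window_starts(100, 3, 21))   # [100, 121, 142, 163]
print(window_starts(163, 3, -21))  # [163, 142, 121, 100]
```

In plain Python, `itertools.accumulate([step] * n)` yields the same running total without building the matrix; the matrix form is shown only because it maps directly onto MMULT.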
At this stage, we can start fixing column A via VLOOKUP with the 4th argument set to 1 (approximate match) instead of 0 (exact match). Forward in time, this is:
=ARRAYFORMULA(IFNA(VLOOKUP(A2:A; SORT({MIN(A2:A); MIN(A2:A)+MMULT(TRANSPOSE((
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*21); SIGN(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1)))
And backward in time:
=ARRAYFORMULA(IFNA(VLOOKUP(A2:A; SORT({MAX(A2:A); MAX(A2:A)+MMULT(TRANSPOSE((
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*-21); SIGN(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1)))
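VLOOKUP with the 4th argument set to 1 returns the largest boundary that is less than or equal to the lookup value, which on a sorted list amounts to a binary search. A rough Python equivalent using the standard `bisect` module (the function name `assign_window` is made up), with None standing in for the blank that IFNA produces:

```python
from bisect import bisect_right

def assign_window(values, boundaries):
    """Map each value to the largest boundary <= value (approximate match).

    Values before the first boundary get None, like IFNA(VLOOKUP(...))
    leaving the cell blank. Boundaries are sorted first, as SORT(...) does."""
    bounds = sorted(boundaries)
    result = []
    for value in values:
        i = bisect_right(bounds, value)  # number of boundaries <= value
        result.append(bounds[i - 1] if i else None)
    return result

print(assign_window([100, 120, 121, 150, 99], [163, 142, 121, 100]))
# [100, 100, 121, 142, None]
```

For the forward (min to max) case with fixed-width windows, the same label can also be computed directly as `start + ((value - start) // step) * step`; the lookup-table form is shown because it is closer to what the formula does.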
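Now we just create a virtual array {}, pair the fixed column A with column C, and pass it as the range to QUERY.
Side note:
to put columns next to each other in English-locale spreadsheets we use ,
to put columns next to each other in non-English-locale spreadsheets (those using a comma as the decimal separator) we use \
(the same locale difference is why these formulas use ; rather than , between function arguments)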
=ARRAYFORMULA(QUERY({IFNA(VLOOKUP(A2:A; SORT({MIN(A2:A); MIN(A2:A)+MMULT(TRANSPOSE((
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*21); SIGN(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1))\ C2:C};
"select Col1,count(Col1)
where Col2 = FALSE
group by Col1
order by count(Col1) desc
label count(Col1)''"))
and backwards in time:
=ARRAYFORMULA(QUERY({IFNA(VLOOKUP(A2:A; SORT({MAX(A2:A); MAX(A2:A)+MMULT(TRANSPOSE((
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*-21); SIGN(
ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1))\ C2:C};
"select Col1,count(Col1)
where Col2 = FALSE
group by Col1
order by count(Col1) desc
label count(Col1)''"))
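Put together, the final QUERY groups the fixed dates and counts them, skipping rows where column C is TRUE, then orders by count descending. A Python sketch of that pipeline under the same assumptions (day-number dates, a boolean per row standing in for column C, hypothetical function name):

```python
from bisect import bisect_right
from collections import Counter

def count_per_window(days, flags, boundaries):
    """Count events per window, skipping flagged rows (where Col2 = FALSE),
    and return (window_start, count) pairs ordered by count descending."""
    bounds = sorted(boundaries)
    counts = Counter()
    for day, flag in zip(days, flags):
        if flag:
            continue
        i = bisect_right(bounds, day)
        if i:  # ignore values that fall before the first boundary
            counts[bounds[i - 1]] += 1
    # most_common() sorts by count descending; ties keep first-seen order
    return counts.most_common()

print(count_per_window([100, 105, 121, 130, 150],
                       [False, False, False, True, False],
                       [100, 121, 142, 163]))
# [(100, 2), (121, 1), (142, 1)]
```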
demo spreadsheet

Related

Comparing Dates in Google Sheets with different format

I tried everything and can't make this work.
File 1 has a date and transactions.
File 2 has dates and other data exported from a piece of software, so column A holds a date that is not formatted as a date.
Basically, I want to get the number of transactions per day in File 1, counting only the File 2 rows where column B is "google / cpc" and column C contains "search".
The problem is that I can't get the dates in File 1 and File 2 to match so that the transactions are returned. The comparison never matches.
File 1 https://docs.google.com/spreadsheets/d/1Xvoo2Rob3kI4duPpmCTfhMLPlvdzIJY9ZQBV7CHccoc/edit?usp=sharing
File 2 https://docs.google.com/spreadsheets/d/10Enq805we6_XcTkytwfj6ON1ZnITnAGUwLVoaGzXeco/edit?usp=sharing
I tried to make the dates from File 2 look like the dates in File 1 using CONCATENATE with LEFT and RIGHT. They look the same to the eye, but Google Sheets can't match them.
I also tried changing the format to date and experimenting with it, but I still can't get the dates to match.
You can try this:
=MAKEARRAY(31,1,LAMBDA(r,c,INDEX(LAMBDA(z,SUM(IFNA(FILTER(INDEX(z,,6),INDEX(DATE(LEFT(INDEX(z,,1),4),MID(INDEX(z,,1),5,2),RIGHT(INDEX(z,,1),2)))=INDEX(A3:A33,r),INDEX(z,,2)="google / cpc",REGEXMATCH(INDEX(z,,3),"(?i)search")))))(importrange("10Enq805we6_XcTkytwfj6ON1ZnITnAGUwLVoaGzXeco","Sheet2!A:F")))))
they look similar to the eye, but google sheets can't compare
The values like 20230102 in spreadsheet 2 column A look like dates but are actually numbers in the neighborhood of 20 million, such as 20,230,102.
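To see why these never match a real date: 20230102 is an integer, and the year, month, and day have to be split out arithmetically (or by text slicing, as the DATE(LEFT(...), MID(...), RIGHT(...)) formula above does). A small Python illustration, with a made-up function name:

```python
from datetime import date

def yyyymmdd_to_date(n):
    """Convert an integer such as 20230102 into date(2023, 1, 2).

    20230102 // 10000       -> 2023 (year)
    (20230102 // 100) % 100 -> 1    (month)
    20230102 % 100          -> 2    (day)"""
    n = int(n)
    return date(n // 10000, (n // 100) % 100, n % 100)

print(20230102 == date(2023, 1, 2))                    # False: number vs date
print(yyyymmdd_to_date(20230102) == date(2023, 1, 2))  # True
```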
It is unclear whether your intention is to use that data in just this one report or several such reports. If the latter, you may want to Insert > Sheet in spreadsheet 1 and put this formula in cell A1 of that new 'Import' sheet to import and convert the data:
=arrayformula(
lambda(
ssId, datelikeRangeA1, criteriaRangeA1, transactionsRangeA1,
lambda(
dates, criteria, transactions,
query(
{ dates, criteria, transactions },
"select Col1, sum(Col3) where Col2 = 'google / cpc' group by Col1
label Col1 'Date', sum(Col3) 'Total transactions' ",
0
)
)(
to_date( value( regexreplace(
to_text( importrange(ssId, datelikeRangeA1) ),
"(\d{4})(\d{2})(\d{2})", "$1-$2-$3"
) ) ),
importrange(ssId, criteriaRangeA1),
importrange(ssId, transactionsRangeA1)
)
)(
"10Enq805we6_XcTkytwfj6ON1ZnITnAGUwLVoaGzXeco",
"Sheet2!A2:A",
"Sheet2!B2:B",
"Sheet2!F2:F"
)
)
The formula may look a bit complex, but it is easy to adjust to other similar imports you may need by modifying the parameters at the end. You can then more easily refer to the data in your various reports. To match the dates in Sheet1!A3:A33, put this formula in cell Sheet1!B3:
=arrayformula(
ifna(
vlookup(
A3:A33,
Import!A2:B,
columns(Import!A2:B),
false
)
)
)
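The Import-sheet approach boils down to: convert the date-like numbers, sum transactions per date for the 'google / cpc' rows, then look each report date up with an exact match, leaving a blank when there is no data. A Python sketch of those steps, assuming rows shaped like (column A value, column B value, column F value); the row shape and function names are assumptions for illustration:

```python
from collections import defaultdict
from datetime import date

def totals_by_day(rows, source="google / cpc"):
    """Sum transactions per converted date, like
    select Col1, sum(Col3) where Col2 = 'google / cpc' group by Col1."""
    totals = defaultdict(int)
    for raw_date, src, transactions in rows:
        if src == source:
            n = int(raw_date)
            day = date(n // 10000, (n // 100) % 100, n % 100)
            totals[day] += transactions
    return dict(totals)

def lookup_exact(days, totals):
    """Like IFNA(VLOOKUP(day, Import!A2:B, 2, FALSE)): None when missing."""
    return [totals.get(day) for day in days]

rows = [
    (20230102, "google / cpc", 3),
    (20230102, "google / cpc", 2),
    (20230102, "bing / cpc", 9),
    (20230103, "google / cpc", 1),
]
totals = totals_by_day(rows)
print(lookup_exact([date(2023, 1, 1), date(2023, 1, 2), date(2023, 1, 3)], totals))
# [None, 5, 1]
```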
Using your current formula, try converting the dates to text and then back to numbers with VALUE:
=INDEX(IFNA(VLOOKUP(VALUE(TEXT(A3:A33,"yyyymmdd")), QUERY(IMPORTRANGE("10Enq805we6_XcTkytwfj6ON1ZnITnAGUwLVoaGzXeco", "Sheet2!A:N"),
"select Col1,sum(Col6) where Col2 = 'google / cpc' and Col3 = 'search' group by Col1"), 2, 0)))

Arrayformula + Query in Google Sheets

I have one workbook that tracks bidding dates and amounts, and win dates and amounts, daily. In another workbook, I access that information to see the totals per day, then use that to look at weekly, monthly, quarterly, and yearly performance. The problem is that I cannot get my QUERY formulas to work as arrays, so I have to drag the date-driven formulas down 364 times for each column of information, and all that has done is create a slow database...
=QUERY(importrange('source addresses'!$B$2,"bid tracker!$a$3:$x"),
"SELECT SUM (Col9)
WHERE Col8 = date '"&text(B5,"yyyy-mm-dd")&"'
label sum(Col9) '' ",0)
Col9 = amount of contract
Col8 = date of win
This formula produces a single value: the sum of that day's wins. I need it to return all 365 days as an array, to eliminate the other 364 copies of this formula that I am using due to my inexperience.
I have tried ARRAYFORMULA both before and after QUERY, and none of the examples I have seen seem to apply or work for me...
You need a group by clause, such as group by Col8. Try:
=QUERY(importrange('source addresses'!$B$2,"bid tracker!$a$3:$x"),
"SELECT SUM (Col9)
WHERE Col8 <= date '"&text(B5,"yyyy-mm-dd")&"' AND Col8 >= date '"&text(B5-369,"yyyy-mm-dd")&
"' group by Col8 label sum(Col9) '' ",0)

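What the group by clause changes is that QUERY returns one sum per distinct win date instead of a single total. A Python sketch of the same filtering and grouping, assuming (win_date, amount) pairs in place of Col8 and Col9 (hypothetical names). It keeps each date next to its sum; in the QUERY, that would mean also selecting Col8 (select Col8, sum(Col9)), which the formula above does not do:

```python
from collections import defaultdict
from datetime import date, timedelta

def daily_win_totals(rows, end, days=369):
    """Sum amounts per win date for dates in [end - days, end],
    returned in ascending date order."""
    start = end - timedelta(days=days)
    totals = defaultdict(int)
    for win_date, amount in rows:
        if win_date is not None and start <= win_date <= end:
            totals[win_date] += amount
    return sorted(totals.items())

rows = [
    (date(2022, 1, 1), 100),
    (date(2022, 1, 1), 50),
    (date(2022, 1, 3), 20),
    (date(2020, 1, 1), 999),  # outside the window, ignored
]
print(daily_win_totals(rows, end=date(2022, 1, 31)))
# [(datetime.date(2022, 1, 1), 150), (datetime.date(2022, 1, 3), 20)]
```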
Query/Pivot Function: Header Title Descending Order & Month Format mmmm yyyy

[Goal]
I want to create a table with the Query Function where it counts the number of 'Drivers' for each month in a dynamic manner. Meaning that when the data (example sheet is called 'Data') is updated, it'll be updated automatically as well.
[What I was able to do so far]
I was able to create a table with the Query function; however, it only displays one month's column when I want to show up to 4 months. I also want to show the most recent months on the left and the older months on the right.
[Formula that I have so far]
=QUERY(Data!$A:$B,"
SELECT B,
Count(B)
Where B != '' AND MONTH(A)=MONTH(DATE'"&TEXT(A2,"YYYY-MM-DD")&"')
Group By B
Pivot A
Order By B asc
Label B 'Drivers', Count(B) '"&TEXT(A2,"MMMM YYYY")&"'",1)
[Issue that I'm facing]
I've tried specifying the date range like this:
Where B != '' AND MONTH(A)<=MONTH(DATE'"&TEXT(A2,"YYYY-MM-DD")&"')
AND MONTH(A)>=MONTH(DATE'"&TEXT(EDATE(A2,-3),"YYYY-MM-DD")&"')
However, there are 2 problems:
The date format is not mmmm yyyy (Example: May 2022); instead the header shows as: 2022-2-1 May 2022
The months are ordered in an ascending manner (Example: 2022-2-1, 2022-3-1, 2022-4-1) instead of descending (Example: 2022-4-1, 2022-3-1, 2022-2-1)
I'm not sure how to fix this. Any help would be appreciated.
[Sample Sheet]
https://docs.google.com/spreadsheets/d/1AJYTRga9-dXbj64nl4RfpDKs5JYSFwP2SiR7v7gAhMI/edit#gid=1297239620
=ARRAYFORMULA(REGEXREPLACE(""&TRANSPOSE(QUERY(TRANSPOSE(
QUERY({Data!A:B, TEXT(Data!A:A, "yyyymmdd×MMMM yyyy")},
"select Col2,count(Col2)
where Col2 != ''
AND MONTH(Col1)<=MONTH(DATE'"&TEXT(A2,"YYYY-MM-DD")&"')
AND MONTH(Col1)>=MONTH(DATE'"&TEXT(EDATE(A2,-3),"YYYY-MM-DD")&"')
group by Col2
pivot Col3", 1)),"order by Col1 desc")),"^(.*×)", ))
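The trick in this answer is the "yyyymmdd×MMMM yyyy" text: the numeric prefix sorts chronologically, whereas month names alone would sort alphabetically (April, February, March), and the REGEXREPLACE then strips everything up to the × marker. The same idea in a short Python sketch (function name hypothetical; `%B` gives English month names under the default C locale):

```python
import re
from datetime import date

def month_headers_desc(dates):
    """Label each date as 'yyyymmdd×Month yyyy', sort labels descending,
    then strip the sortable prefix, mirroring REGEXREPLACE(..., "^(.*×)", )."""
    labels = {f"{d:%Y%m%d}×{d:%B %Y}" for d in dates}  # set = distinct headers
    return [re.sub(r"^(.*×)", "", label) for label in sorted(labels, reverse=True)]

print(month_headers_desc([date(2022, 2, 1), date(2022, 4, 1),
                          date(2022, 3, 1), date(2022, 4, 1)]))
# ['April 2022', 'March 2022', 'February 2022']
```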
Reference:
How do I change the date format in a Google Sheets query pivot table with date filters?
Sort Query Pivot Table - Google Sheets

Google sheets Query function with Arrayformula

For each email ID, I want to get the latest 10 records by timestamp. How do I get the results with ARRAYFORMULA? Using QUERY is not essential, as long as I can still achieve this with ARRAYFORMULA. Here is the sample data:
https://docs.google.com/spreadsheets/d/1YAHA02VM-5MXzVKhkxu_eODPKObpoz441mGX8lOFu5M/edit?usp=sharing
Try this on another sheet, row 1:
=arrayformula(query({query({Sheet1!$A:$C},"order by Col1 desc,Col2",1),{"Dupe position";countifs(query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0),row(Sheet1!$A2:$C),"<="&row(Sheet1!$A2:$C))}},"select Col1,Col2,Col3 where Col1 is not null and Col4 <= 10 order by Col1",1))
You can adjust the number of records found by adjusting Col4 <= 10, and also the final sort by altering order by Col1 at the end of the formula.
Explanation
This gets the data from Sheet1, sorts it by date desc then email asc:
query({Sheet1!$A:$C},"order by Col1 desc,Col2",1)
Then, next to this data, COUNTIFS() is used to number each occurrence of an email in the list above (since the list is sorted descending, 1 represents the most recent occurrence).
countifs(<EmailColumnData>,<EmailColumnData>,row(<EmailColumn>),"<="&row(<EmailColumn>))
In place of <EmailColumnData> in the COUNTIFS() is:
query({Sheet1!$A2:$C},"select Col2 order by Col1 desc,Col2",0)
In place of <EmailColumn> above, we only want the row number so we don't need the actual data. We can use:
Sheet1!$A2:$C
The various {} act as array literals that bring the data together.
Eg., {a,b,c;d,e,f} would result in three columns, with a, b, c in row 1 and d, e, f in row 2. , is a new column, ; is a return for a new row.
A final query around everything gets the 3 columns we need, where the count number in col 4 is <=10, then sorts the output by Col1 (date asc).
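The core idea here (sort newest first, number each email's occurrences 1, 2, 3, and keep those at or below 10) can be sketched in Python as follows. This assumes rows are (timestamp, email, score) tuples; the function name is hypothetical:

```python
from collections import defaultdict

def latest_n_per_email(rows, n=10):
    """rows: (timestamp, email, score) tuples.

    1. Sort by timestamp descending, then email ascending.
    2. Number each email's occurrences 1, 2, 3, ... in that order
       (the running COUNTIFS), so 1 is the most recent.
    3. Keep occurrences <= n and return them sorted by timestamp ascending."""
    ordered = sorted(rows, key=lambda r: r[1])       # secondary key first...
    ordered.sort(key=lambda r: r[0], reverse=True)   # ...then stable primary sort
    seen = defaultdict(int)
    kept = []
    for row in ordered:
        seen[row[1]] += 1
        if seen[row[1]] <= n:
            kept.append(row)
    return sorted(kept, key=lambda r: r[0])

rows = [(1, "a", 10), (2, "a", 20), (3, "a", 30), (4, "b", 40), (5, "b", 50)]
print(latest_n_per_email(rows, n=2))
# [(2, 'a', 20), (3, 'a', 30), (4, 'b', 40), (5, 'b', 50)]
```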
On second thought, maybe this is a bit cheeky, but this might do it (taken from the conditional rank idea):
=ArrayFormula(filter(A2:C,countifs(A2:A,">="&A2:A,B2:B,B2:B)<=10,A2:A<>""))
EDIT
The above assumes (because the data is timestamped) that duplicates shouldn't occur. If they do and the data is pre-sorted, you can use the row number as a proxy for the timestamp, as suggested by @Aresvik.
Alternatively, you could count separately
(a) only rows with a later timestamp
plus
(b) rows with the same time stamp but with earlier (or identical) row number
=ArrayFormula(filter(A2:C,countifs(A2:A,">"&A2:A,B2:B,B2:B)+countifs(A2:A,"="&A2:A,B2:B,B2:B,row(A2:A),"<="&row(A2:A))<=10,A2:A<>""))
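To see how the two COUNTIFS terms give each row a unique position even when timestamps tie, here is a direct Python transcription (quadratic, like COUNTIFS over the whole column; function names are made up):

```python
def rank_within_email(timestamps, emails):
    """Rank of row i = (# rows, same email, later timestamp)
    + (# rows, same email, same timestamp, row index <= i).
    Mirrors countifs(A,">"&A,B,B) + countifs(A,"="&A,B,B,row(A),"<="&row(A))."""
    pairs = list(zip(timestamps, emails))
    ranks = []
    for i, (t, e) in enumerate(pairs):
        later = sum(1 for t2, e2 in pairs if e2 == e and t2 > t)
        ties_up_to_here = sum(1 for j, (t2, e2) in enumerate(pairs)
                              if e2 == e and t2 == t and j <= i)
        ranks.append(later + ties_up_to_here)
    return ranks

def latest_n(rows, n=10):
    """Keep rows whose rank within their email is <= n (the FILTER condition)."""
    ranks = rank_within_email([r[0] for r in rows], [r[1] for r in rows])
    return [row for row, rank in zip(rows, ranks) if rank <= n]

rows = [(5, "a", "x"), (5, "a", "y"), (3, "a", "z"), (7, "b", "w")]
print(rank_within_email([r[0] for r in rows], [r[1] for r in rows]))  # [1, 2, 3, 1]
print(latest_n(rows, n=2))  # [(5, 'a', 'x'), (5, 'a', 'y'), (7, 'b', 'w')]
```

By comparison, the single countifs(A2:A,">="&A2:A,B2:B,B2:B) would give both rows stamped 5 the value 2, so with a limit of 1 neither of them would be kept.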
I have added a new sheet ("Erik Help") with the following formula in A1:
=ArrayFormula({"Submitted Time","Email","Score";SORT(SPLIT(FLATTEN(QUERY(SORT(TRANSPOSE(SPLIT(TRANSPOSE(QUERY(IF(Sheet1!B2:B=TRANSPOSE(UNIQUE(FILTER(Sheet1!B2:B,Sheet1!B2:B<>""))),Sheet1!A2:A&"|"&Sheet1!B2:B&"|"&Sheet1!C2:C,),,COUNTA(Sheet1!A2:A)))," ",0,1)),SEQUENCE(MAX(COUNTIF(Sheet1!B2:B,Sheet1!B2:B))),0),"LIMIT 10")),"|",1,0),1,0)})
The number of records is set after LIMIT.
The order is set by the final two numbers: 1,0 (meaning "sort by column 1 in reverse order," which, as currently set, is sorting in reverse order by date/time).

How can I adjust this query to add conditionals from other columns? Formula / sample sheet included

https://docs.google.com/spreadsheets/d/1TjkR3TEg_eSei-25zUm8yRimftQ6ocRKQNEfrN-9Ogc/edit?usp=sharing
^ Sample sheet with my current formula, sample data, and description of the problem/current situation.
The current formula calculates the average of the last 10 appearances (going from the bottom of the sheet upwards) of columns D or E when "New York" (cell K1) is in columns B or C.
If New York appears in column B then it uses the value in column D, and if New York appears in column C it uses the value in column E.
The improvement I want to make is that it only uses the values (within those last 10 appearances of "New York" / cell K1) based on conditionals of columns G/F. In this case, let's say >10 as the conditional.
When "New York" is in columns B/C, for the last 10 appearances, it should bring the value in D into the equation if the value in F is >10 (and New York is in column B), and it should bring E into the equation if the value in G >10 (and New York is in column C).
Any ideas?
range construct:
={A:A, B:B, D:D, G:G;
A:A, C:C, E:E, F:F}
or shorter:
={A:B, D:D, G:G;
A:A, C:C, E:F}
use:
=AVERAGE(QUERY(SORT({A:B, D:D, G:G; A:A, C:C, E:F}, 1, ),
"select Col3
where Col4 > 10
and Col2 = '"&K1&"'
limit 10"))
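The range construct stacks the two column sets on top of each other so that a single QUERY can filter them. A Python sketch of the same steps, assuming each row is a tuple of columns A to G and following this answer's pairing (B with D and G; C with E and F); names are hypothetical:

```python
from statistics import mean

def stack(rows):
    """Build {A:B, D:D, G:G; A:A, C:C, E:F} as (date, city, value, test) tuples."""
    top = [(a, b, d, g) for a, b, c, d, e, f, g in rows]
    bottom = [(a, c, e, f) for a, b, c, d, e, f, g in rows]
    return top + bottom

def avg_latest_matching(rows, city, threshold=10, n=10):
    """SORT by date descending, then: select Col3 where Col4 > threshold
    and Col2 = city limit n, then AVERAGE (None if nothing matches)."""
    ordered = sorted(stack(rows), key=lambda r: r[0], reverse=True)
    picked = [r[2] for r in ordered if r[3] > threshold and r[1] == city][:n]
    return mean(picked) if picked else None

rows = [
    # A, B,          C,          D,  E,  F,  G
    (1, "New York", "Boston",    5,  7, 20, 15),
    (2, "Chicago",  "New York",  9, 11, 12,  3),
    (3, "New York", "Miami",     4,  6,  1,  2),
]
print(avg_latest_matching(rows, "New York"))  # mean of 11 and 5, i.e. 8
```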
I haven't calculated the average, just so you can see the records the query is pulling and confirm them. But I think my formula works.
=query(
query(
{query(A1:G,"select A,B,D,G where B='"&K1&"' ",0);
query(A1:G,"select A,C,E,F where C='"&K1&"' ",0)},
"select * order by Col1 desc limit 10",0),
"select * where Col4 > 10",0)
To get the average, change the last line of the formula to:
"select avg(Col3) where Col4 > 10",0)
Note: my understanding is that you want to filter the ten latest records with New York, and then filter those ten records to just those which have a value > 10 in the right column. This is different from the ten latest records that are New York AND have a value > 10 in the right column. But either solution can be provided.
I've stacked two queries together to make the correct columns align vertically. So the first inner query gets columns A, B, D, and G, checking for New York (i.e. equal to K1) in B. Then the second query stacks columns A, C, E, and F underneath, checking for New York in C.
An outer query then sorts them in descending order by the date column, Col1 (column A). By setting a limit of ten, we get the ten latest records.
A final query is used to select the records with Col4>10. By changing this query to just return the avg(Col3), you should have your desired result.
It should be easy to modify this formula to get what you need.
Note also I believe that you missed a couple of records to be blue - G21 and F28? And E21 should be green also?
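The distinction drawn in the note above (ten latest New York records, then filter, versus ten latest records that already pass the filter) can give different averages. A small Python sketch over already-stacked (date, city, value, test) tuples, with hypothetical function names, shows both orders:

```python
def limit_then_filter(stacked, city, threshold=10, n=10):
    """This answer: the n latest rows for the city, then keep test > threshold."""
    latest = [r for r in sorted(stacked, key=lambda r: r[0], reverse=True)
              if r[1] == city][:n]
    values = [r[2] for r in latest if r[3] > threshold]
    return sum(values) / len(values) if values else None

def filter_then_limit(stacked, city, threshold=10, n=10):
    """Alternative: the n latest rows that already satisfy both conditions."""
    matching = [r for r in sorted(stacked, key=lambda r: r[0], reverse=True)
                if r[1] == city and r[3] > threshold][:n]
    values = [r[2] for r in matching]
    return sum(values) / len(values) if values else None

stacked = [(1, "NY", 5, 15), (2, "NY", 11, 12), (3, "NY", 4, 2), (4, "NY", 8, 30)]
print(limit_then_filter(stacked, "NY", n=2))  # 8.0: dates 4 and 3, only 4 passes
print(filter_then_limit(stacked, "NY", n=2))  # 9.5: dates 4 and 2
```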
Update
When using the final version of the formula, to extract the Average, you can add the LABEL parameter to the QUERY statement to rename, or remove, the header label for that average. So in my example, the SELECT statement would become:
"select avg(Col3) where Col4 > 10 label avg(Col3) '' ",0)
or
"select avg(Col3) where Col4 > 10 label avg(Col3) 'New Label Name Here' ",0)
Update #2
I have provided a sample sheet, which has the enhancements you requested. The formula that calculates the result, the average, is in J3. The formula looks to a variable cell, I3, for the city name. I3 uses data validation, from a list in K2:L, to present the drop down list of city names to pick from.
The selection criteria are located in J6 and J7. If you had standard values you wanted to pick from here, maybe between 10 and 20, they could also be presented with a drop down list. But otherwise, just type in the desired limit values.
As an enhancement, I used conditional formatting to color the active cells in the data. Note that all matching rows will get colored, not just the latest ten. But the formula calculating the average should only be using the ten latest, THEN applying the criteria, before calculating the average. Test this carefully to be sure it is doing what you expect.
Note that the correct placement of the single and double quotes is very important when referencing criteria cells with the SELECT ... WHERE ... statements. Comparison to text values requires single quotes, whereas comparison to numeric values excludes the single quotes.
Valid QUERY Select statements for a numeric comparison:
"select * where A >= " & $B$5 & " limit 5 "
Valid QUERY Select statements for a text/string comparison:
"select * where A >= 'New York' limit 5 "
"select * where A >= '" & $B$5 & "' limit 5 "
<<== Do not have any spaces between the single and double quotes!
Invalid QUERY Select statements for a text/string comparison:
"select * where A >= "New York" limit 5 "
<<== Invalid: the inner double quotes end the formula string early. Use single quotes around text values.
"select * where A >= ' " & $B$5 & " ' limit 5 "
<<== Do not have any spaces between the single and double quotes! This is syntactically valid, but it matches " New York ", not "New York"!
