Get multi column range but only where specific column is not repeated - google-sheets

So, I have a sheet named "Calendar" and another sheet called "Stats".
Here's a sample of the "Calendar" sheet:
F          | G   | H        | I           | J                 | K
-----------|-----|----------|-------------|-------------------|----
2023-01-27 | Fri | 11:30 PM | Family      | Family Activity 1 | YYY
2023-01-27 | Fri | 11:45 PM | Family      | Family Activity 1 | YYY
2023-01-28 | Sat | 12:00 AM | Family      | Family Activity 1 | YYY
2023-01-28 | Sat | 12:15 AM | Family      | Family Activity 1 | XY
2023-01-28 | Sat | 12:30 AM | Fun         | Fun Activity 1    | ABC
2023-01-28 | Sat | 12:45 AM | Fun         | Fun Activity 1    | ABC
2023-01-28 | Sat | 1:00 AM  | Obligations | Obligations 1     | AAA
2023-01-28 | Sat | 1:15 AM  | Fun         | Fun Activity 2    | ZZZ
2023-01-28 | Sat | 1:30 AM  | Fun         | Fun Activity 2    | ZZZ
2023-01-28 | Sat | 1:45 AM  | Family      | Family Activity 2 | MMM
2023-01-28 | Sat | 2:00 AM  | Family      | Family Activity 2 | MMM
Now, on the "Stats" sheet there's a date in cell B16. For this example, it's 2023-01-28.
What I want is to get columns H, I, J, and K from "Calendar" where F equals the date specified in cell B16 of the "Stats" sheet.
The tricky part, where I'm having issues, is to show only the rows that differ from the previous row, i.e. where I, J, and K aren't all the same as in the previous row, like this:
H        | I           | J                 | K
---------|-------------|-------------------|----
12:00 AM | Family      | Family Activity 1 | YYY
12:15 AM | Family      | Family Activity 1 | XY
12:30 AM | Fun         | Fun Activity 1    | ABC
1:00 AM  | Obligations | Obligations 1     | AAA
1:15 AM  | Fun         | Fun Activity 2    | ZZZ
1:45 AM  | Family      | Family Activity 2 | MMM
I'm not sure if this is comprehensible; if it isn't, please let me know so I can clarify.
What I got so far is the following formula:
=QUERY(A:K,"select H,I,J,K where F = date '2023-01-28'")
This only works if I execute it in the "Calendar" sheet, and the date doesn't depend on cell B16 of the "Stats" sheet. However, ideally I'd like to place the formula into the "Stats" sheet.

you can try:
=FILTER(Calendar!H2:K,Calendar!F2:F=B16,{"";LAMBDA(z,MAKEARRAY(COUNTA(z),1,LAMBDA(r,c,IF(INDEX(z,r)=INDEX(z,r-1),1,))))(INDEX(Calendar!F3:F&Calendar!I3:I&Calendar!J3:J&Calendar!K3:K))}<>1)

If you can, you may add an auxiliary column in your raw data sheet. I'll say it's L, with this formula in L2:
=MAP(F2:F,I2:I,J2:J,K2:K,LAMBDA(fx,ix,jx,kx,IF(OR(fx<>OFFSET(fx,-1,0),ix<>OFFSET(ix,-1,0),jx<>OFFSET(jx,-1,0),kx<>OFFSET(kx,-1,0)),1,0)))
It checks whether F, I, J, and K all equal the values in the previous row, and returns 0 if they do or 1 if any of them differs. Then you can do a QUERY like this:
=QUERY(A:L,"select H,I,J,K WHERE L = 1 AND F = date '"&TEXT(B16,"YYYY-MM-DD")&"'")
If you can't add the column, you can combine everything into one formula:
=QUERY({Calendar!F:K,{"";MAP(Calendar!F2:F,Calendar!I2:I,Calendar!J2:J,Calendar!K2:K,LAMBDA(fx,ix,jx,kx,IF(OR(fx<>OFFSET(fx,-1,0),ix<>OFFSET(ix,-1,0),jx<>OFFSET(jx,-1,0),kx<>OFFSET(kx,-1,0)),1,0)))}},"select Col3,Col4,Col5,Col6 WHERE Col7 = 1 AND Col1 = date '"&TEXT(Stats!B16,"YYYY-MM-DD")&"'")
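To make the previous-row comparison easier to follow, here is a minimal sketch of the same logic outside Sheets, in Python. The function name `first_of_each_run` and the tuple layout are assumptions made for illustration; the data is the sample from the question. It flags a row when F, I, J, or K differs from the row above, then keeps only flagged rows for the requested date.

```python
from datetime import date

# Sample rows from the "Calendar" sheet as (F, H, I, J, K); column G is omitted.
rows = [
    (date(2023, 1, 27), "11:30 PM", "Family", "Family Activity 1", "YYY"),
    (date(2023, 1, 27), "11:45 PM", "Family", "Family Activity 1", "YYY"),
    (date(2023, 1, 28), "12:00 AM", "Family", "Family Activity 1", "YYY"),
    (date(2023, 1, 28), "12:15 AM", "Family", "Family Activity 1", "XY"),
    (date(2023, 1, 28), "12:30 AM", "Fun", "Fun Activity 1", "ABC"),
    (date(2023, 1, 28), "12:45 AM", "Fun", "Fun Activity 1", "ABC"),
    (date(2023, 1, 28), "1:00 AM", "Obligations", "Obligations 1", "AAA"),
    (date(2023, 1, 28), "1:15 AM", "Fun", "Fun Activity 2", "ZZZ"),
    (date(2023, 1, 28), "1:30 AM", "Fun", "Fun Activity 2", "ZZZ"),
    (date(2023, 1, 28), "1:45 AM", "Family", "Family Activity 2", "MMM"),
    (date(2023, 1, 28), "2:00 AM", "Family", "Family Activity 2", "MMM"),
]


def first_of_each_run(rows, target):
    """Return (H, I, J, K) for rows dated `target` whose (F, I, J, K) differ from the row above."""
    kept = []
    previous_key = None
    for f, h, i, j, k in rows:
        key = (f, i, j, k)  # H (the time) is deliberately not part of the comparison
        if key != previous_key and f == target:
            kept.append((h, i, j, k))
        previous_key = key
    return kept


result = first_of_each_run(rows, date(2023, 1, 28))
for row in result:
    print(" | ".join(row))
```

Because F is part of the comparison key, the first row of a new date is always kept, even if I, J, and K repeat the last row of the previous date.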

date is just a number. try:
=QUERY(Calendar!A:K, "select H,I,J,K where F = "&B16*1, )

Related

How to deal with month grouping and sum of hours of these months in Google Sheets?

I'm having trouble filtering a column by month/year and counting the unique values. I started trying with ARRAYFORMULA, then with QUERY, but without success.
A          | B          | C        | D        | E       | F           | G
Date       | Start Time | End Time | Duration | Month   | Worked Days | Total Duration
-----------|------------|----------|----------|---------|-------------|---------------
01/06/2022 | 05:06      | 08:56    | 3h50min  | 06/2022 | 9 days      | 31h47min
02/06/2022 | 05:08      | 08:43    | 3h35min  | 07/2022 | 5 days      | 24h36min
02/06/2022 | 15:25      | 16:57    | 1h32min  |         |             |
03/06/2022 | 05:13      | 08:24    | 3h11min  |         |             |
04/06/2022 | 05:11      | 09:24    | 4h13min  |         |             |
06/06/2022 | 13:05      | 14:36    | 1h31min  |         |             |
07/06/2022 | 05:20      | 08:27    | 3h07min  |         |             |
08/06/2022 | 05:08      | 08:52    | 3h44min  |         |             |
09/06/2022 | 05:09      | 09:17    | 4h08min  |         |             |
10/06/2022 | 05:11      | 08:07    | 2h56min  |         |             |
01/07/2022 | 05:10      | 09:43    | 4h33min  |         |             |
02/07/2022 | 05:23      | 07:43    | 2h20min  |         |             |
04/07/2022 | 05:08      | 07:41    | 2h33min  |         |             |
04/07/2022 | 20:57      | 21:59    | 1h02min  |         |             |
05/07/2022 | 05:13      | 09:54    | 4h41min  |         |             |
06/07/2022 | 05:10      | 09:38    | 4h28min  |         |             |
06/07/2022 | 15:11      | 18:05    | 2h54min  |         |             |
06/07/2022 | 20:00      | 22:05    | 2h05min  |         |             |
Columns A to D are what I have. Columns E to G are what I expect.
One of the problems is that the same day sometimes appears more than once.
try:
=ARRAYFORMULA(QUERY({TEXT(A3:A; "mm/e")\
IF(COUNTIFS(A3:A; A3:A; ROW(A3:A); "<="&ROW(A3:A))=1; 1; 0)\ C3:C-B3:B};
"select Col1,sum(Col2),sum(Col3) where Col3>0
group by Col1 label sum(Col2)'',sum(Col3)''
format sum(Col3)'[h]\hmm\min'"))
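As a cross-check of what the formula is meant to compute, here is a minimal Python sketch that groups the question's rows by month, counts distinct days, and sums end time minus start time. The function name `summarise` is an assumption for illustration, and dates are assumed to be day-first (dd/mm/yyyy), as the sample suggests.

```python
from collections import defaultdict
from datetime import datetime

# (Date, Start Time, End Time) from columns A to C of the question.
rows = [
    ("01/06/2022", "05:06", "08:56"), ("02/06/2022", "05:08", "08:43"),
    ("02/06/2022", "15:25", "16:57"), ("03/06/2022", "05:13", "08:24"),
    ("04/06/2022", "05:11", "09:24"), ("06/06/2022", "13:05", "14:36"),
    ("07/06/2022", "05:20", "08:27"), ("08/06/2022", "05:08", "08:52"),
    ("09/06/2022", "05:09", "09:17"), ("10/06/2022", "05:11", "08:07"),
    ("01/07/2022", "05:10", "09:43"), ("02/07/2022", "05:23", "07:43"),
    ("04/07/2022", "05:08", "07:41"), ("04/07/2022", "20:57", "21:59"),
    ("05/07/2022", "05:13", "09:54"), ("06/07/2022", "05:10", "09:38"),
    ("06/07/2022", "15:11", "18:05"), ("06/07/2022", "20:00", "22:05"),
]


def summarise(rows):
    """Map 'mm/yyyy' to (number of distinct days, total duration as 'XhYYmin')."""
    days = defaultdict(set)     # month -> distinct dates worked
    minutes = defaultdict(int)  # month -> total minutes worked
    for day, start, end in rows:
        d = datetime.strptime(day, "%d/%m/%Y")
        t0 = datetime.strptime(start, "%H:%M")
        t1 = datetime.strptime(end, "%H:%M")
        month = d.strftime("%m/%Y")
        days[month].add(d.date())  # a set ignores repeated days
        minutes[month] += int((t1 - t0).total_seconds()) // 60
    return {
        m: (len(days[m]), f"{minutes[m] // 60}h{minutes[m] % 60:02d}min")
        for m in days
    }


summary = summarise(rows)
for month, (worked_days, total) in summary.items():
    print(month, f"{worked_days} days", total)
```

Collecting dates in a set is what handles the repeated days: two shifts on the same date add to the total duration but count as one worked day.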

How to average the last day's data in InfluxDB

I want to calculate the average of unit_price from entries of the last known day from my influxdb database.
Below are the last two days of data I have (14 entries per day).
I have five different days of data in total.
> select * from "variable" order by time desc limit 28
name: variable
time area_code area_name unit_price
---- --------- --------- ----------
2021-05-11T23:00:00Z P Northern_Scotland 18.4695
2021-05-11T23:00:00Z N Southern_Scotland 17.598
2021-05-11T23:00:00Z M Yorkshire 16.968
2021-05-11T23:00:00Z L South_Western_England 18.6795
2021-05-11T23:00:00Z K Southern_Wales 18.081
2021-05-11T23:00:00Z J South_Eastern_England 18.501
2021-05-11T23:00:00Z H Southern_England 17.5875
2021-05-11T23:00:00Z G North_Western_England 17.4615
2021-05-11T23:00:00Z F North_Eastern_England 17.262
2021-05-11T23:00:00Z E West_Midlands 17.6085
2021-05-11T23:00:00Z D Merseyside_and_Northern_Wales 19.4355
2021-05-11T23:00:00Z C London 17.4405
2021-05-11T23:00:00Z B East_Midlands 17.3565
2021-05-11T23:00:00Z A Eastern_England 17.871
2020-11-01T00:00:00Z P Northern_Scotland 17.073
2020-11-01T00:00:00Z N Southern_Scotland 16.2225
2020-11-01T00:00:00Z M Yorkshire 15.6135
2020-11-01T00:00:00Z L South_Western_England 17.094
2020-11-01T00:00:00Z K Southern_Wales 16.527
2020-11-01T00:00:00Z J South_Eastern_England 16.8945
2020-11-01T00:00:00Z H Southern_England 16.128
2020-11-01T00:00:00Z G North_Western_England 16.0125
2020-11-01T00:00:00Z F North_Eastern_England 15.7395
2020-11-01T00:00:00Z E West_Midlands 16.086
2020-11-01T00:00:00Z D Merseyside_and_Northern_Wales 17.8605
2020-11-01T00:00:00Z C London 15.897
2020-11-01T00:00:00Z B East_Midlands 15.8445
2020-11-01T00:00:00Z A Eastern_England 16.2855
As you can see here, limit 14 average shows the same result as not using limit at all.
So this mean command is averaging 'all' the data, not any limited data.
> select mean(unit_price) from "variable" order by time desc limit 14
name: variable
time mean
---- ----
1970-01-01T00:00:00Z 16.2924375
> select mean(unit_price) from "variable"
name: variable
time mean
---- ----
1970-01-01T00:00:00Z 16.2924375
>
I have tried nested selects, but can’t seem to find how to get an average of the final 14 entries (or from the final date with data)
Any help would be very much appreciated.
I think I may have solved it after playing some more with nested queries.
> select mean(unit_price) from "variable" group by time(1d) fill(none)
name: variable
time mean
---- ----
2019-04-12T00:00:00Z 15.572249999999997
2020-01-15T00:00:00Z 15.340499999999997
2020-11-01T00:00:00Z 16.377
2021-05-11T00:00:00Z 17.880000000000003
> select last("mean") from (select mean(unit_price) from "variable" group by time(1d) fill(none))
name: variable
time last
---- ----
2021-05-11T00:00:00Z 17.880000000000003
>
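The nested query above amounts to "bucket by day, average each bucket, take the latest bucket". A minimal Python sketch of that idea follows; it is not InfluxDB client code, and the function name `mean_of_last_day` is an assumption for illustration. The 2021-05-11 prices are the ones shown in the question, and only three of the 2020-11-01 points are included for brevity.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# (time, unit_price) points taken from the question's output.
points = [
    ("2020-11-01T00:00:00Z", 17.073), ("2020-11-01T00:00:00Z", 16.2225),
    ("2020-11-01T00:00:00Z", 15.6135),
    ("2021-05-11T23:00:00Z", 18.4695), ("2021-05-11T23:00:00Z", 17.598),
    ("2021-05-11T23:00:00Z", 16.968), ("2021-05-11T23:00:00Z", 18.6795),
    ("2021-05-11T23:00:00Z", 18.081), ("2021-05-11T23:00:00Z", 18.501),
    ("2021-05-11T23:00:00Z", 17.5875), ("2021-05-11T23:00:00Z", 17.4615),
    ("2021-05-11T23:00:00Z", 17.262), ("2021-05-11T23:00:00Z", 17.6085),
    ("2021-05-11T23:00:00Z", 19.4355), ("2021-05-11T23:00:00Z", 17.4405),
    ("2021-05-11T23:00:00Z", 17.3565), ("2021-05-11T23:00:00Z", 17.871),
]


def mean_of_last_day(points):
    """Group prices by calendar day (UTC) and return (last day, mean price on that day)."""
    by_day = defaultdict(list)
    for timestamp, price in points:
        day = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").date()
        by_day[day].append(price)
    last_day = max(by_day)  # days with no data never appear, like fill(none)
    return last_day, mean(by_day[last_day])


last_day, last_mean = mean_of_last_day(points)
print(last_day, round(last_mean, 4))
```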

Speed up filter and sum results based on multiple criteria?

I am filtering then summing transaction data based on a date range and if a column contains one of multiple possible values.
example data
A | B | C | D
-----------|-----|---------------------------------------------------|-------
11/12/2017 | POS | 6443 09DEC17 C , ALDI 84 773 , OFFERTON GB | -3.87
18/12/2017 | POS | 6443 16DEC17 C , CO-OP GROUP 108144, STOCKPORT GB | -6.24
02/01/2018 | POS | 6443 01JAN18 , AXA INSURANCE , 0330 024 1229 GB | -220.10
I currently have the following formula, which works but is quite slow.
=sum(
  iferror(
    filter(
      Transactions!$D:$D,
      Transactions!$A:$A>=date(A2,B2,1),
      Transactions!$A:$A<=date(A2,B2,31),
      regexmatch(Transactions!$C:$C, "ALDI|LIDL|CO-OP GROUP 108144|SPAR|SAINSBURYS S|SAINSBURY'S S|TESCO STORES|MORRISON|MARKS AND SPENCER , HAZEL GROVE|HAZELDINES|ASDA")
    )
    ,0
  )
) * -1
The formula is on a separate sheet that is just a simple view of the results, broken down by month for each year:
| A | B | C
--|------|----|----------
1 | 2017 | 12 | <formula> # December 2017
2 | 2017 | 11 | <formula> # November 2017
3 | 2017 | 10 | <formula> # October 2017
Is there a way to achieve this that would be more performant?
I tried using ArrayFormula and SUMIF, which works for the string criteria, but when I add more criteria with SUMIFS for the date, it stops working.
I couldn't figure out a way to utilize INDEX and/or MATCH
=query(filter( {Transactions!$A:$A,
Transactions!$D:$D},
regexmatch(Transactions!$C:$C, "ALDI|LIDL|CO-OP GROUP 108144|SPAR|SAINSBURYS S|SAINSBURY'S S|TESCO STORES|MORRISON|MARKS AND SPENCER , HAZEL GROVE|HAZELDINES|ASDA")
), "select year(Col1), month(Col1)+1, -1*sum(Col2) group by year(Col1), month(Col1)+1", 0)
The result is a table like this:
year() sum(month()1()) sum
2017 11 3.87
2017 12 6.24
Add labels if needed. Sample query text with labels:
"select year(Col1), month(Col1)+1, -1*sum(Col2) group by year(Col1), month(Col1)+1 label year(Col1) 'Year', month(Col1)+1 'Month'"
The result:
Year Month sum
2017 11 3.87
2017 12 6.24
Explanations
The single-formula report reduces the number of FILTER calls, so it should run faster.
It uses QUERY syntax; more info here.
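The point of the single-formula approach is that the data is scanned once and grouped, instead of being filtered again for every month. A minimal Python sketch of that single pass is below. The function name `monthly_totals` is an assumption for illustration, the regex is a shortened version of the one in the question, and dates are assumed to be day-first (dd/mm/yyyy), so both matching sample rows fall in December 2017.

```python
import re
from collections import defaultdict
from datetime import datetime
from decimal import Decimal

# Sample rows from the question: (date, type, description, amount).
transactions = [
    ("11/12/2017", "POS", "6443 09DEC17 C , ALDI 84 773 , OFFERTON GB", "-3.87"),
    ("18/12/2017", "POS", "6443 16DEC17 C , CO-OP GROUP 108144, STOCKPORT GB", "-6.24"),
    ("02/01/2018", "POS", "6443 01JAN18 , AXA INSURANCE , 0330 024 1229 GB", "-220.10"),
]

# Shortened version of the question's pattern, for illustration only.
SHOPS = re.compile(r"ALDI|LIDL|CO-OP GROUP 108144|SPAR|TESCO STORES|ASDA")


def monthly_totals(transactions):
    """Scan the rows once and return {(year, month): total spent} for matching shops."""
    totals = defaultdict(Decimal)  # Decimal() starts at 0 and avoids float rounding
    for day, _kind, description, amount in transactions:
        if SHOPS.search(description):
            d = datetime.strptime(day, "%d/%m/%Y")
            totals[(d.year, d.month)] -= Decimal(amount)  # flip sign: spending is negative
    return dict(totals)


totals = monthly_totals(transactions)
for (year, month), total in sorted(totals.items()):
    print(year, month, total)
```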

Ignoring "noise" in ANTLR4

I'd like to build a natural language date parser in ANTLR4 and got stuck on ignoring "noise" input. The simplified grammar below parses any string that contains valid dates in the format DATE MONTH:
dates
: simple_date dates
| EOF
;
simple_date
: DATE MONTH
;
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ; // etc.
Text such as "1 January 22 February" will be accepted. I wanted the grammar to accept other text as well, so I added ANY : . -> skip; at the end:
dates
: simple_date dates
| EOF
;
simple_date
: DATE MONTH
;
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ; // etc.
ANY : . -> skip;
This doesn't quite do what I want, however. While a string such as "On 1 January and 22 February" is accepted and the simple_date rule is matched twice, the string "On 1XX January" will also match the rule.
Question: How do I build a grammar where rules are matched only with the exact token sequence while ignoring all other input, including tokens in an order not defined in any of the rules? Consider the following cases:
"From 1 January to 2 February" -> simple_date matches "1 January" and "2 February"
"From 1XX January to 2 February" -> simple_date matches "2 February", rest is ignored
"From January to February" -> no match, everything ignored
Do not drop the extra "noise" in the lexer, as your ANY rule does. The lexer does not know the context in which the current token appears, and what you want is to drop noise tokens only when they are not part of a DATE MONTH sequence. Move the noise handling into a parser rule that matches the noise.
It is also advisable to drop whitespace in the lexer, but in that case your ANY rule should exclude the characters matched by the WS rule. Also note that your DATE rule will capture any noise token of the form [0-9][0-9]?, so the noise rule has to accept DATE as well.
dates
: (noise* (simple_date) noise*)+
;
simple_date
: DATE MONTH
;
noise: (DATE|ANY);
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ;
ANY : ~(' '|'\t' | '\f')+ ;
WS : [ \t\f]+ -> skip;
Accepts:
1 January and 22 February noise 33
1 January and 22 February 3
Rejects:
1xx January
This wasn't fully tested. Note also that your MONTH lexer rule captures a standalone month literal (e.g. January), which should count as noise but is not handled in my grammar, e.g.
22 February January
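To pin down the behaviour the question asks for, independent of ANTLR, here is a minimal Python sketch that treats whitespace-separated words as tokens and reports only adjacent DATE MONTH pairs, ignoring everything else. It is not a translation of the grammar above; the function name `simple_dates` and the three-month set are assumptions for illustration.

```python
import re

MONTHS = {"January", "February", "March"}  # etc.
DATE = re.compile(r"[0-9][0-9]?")


def simple_dates(text):
    """Return (date, month) pairs where a DATE token is immediately followed by a MONTH token."""
    tokens = text.split()  # whitespace is dropped, as the WS rule does
    matches = []
    i = 0
    while i < len(tokens) - 1:
        # fullmatch means "1XX" is one noise token, not DATE followed by junk
        if DATE.fullmatch(tokens[i]) and tokens[i + 1] in MONTHS:
            matches.append((tokens[i], tokens[i + 1]))
            i += 2
        else:
            i += 1  # noise, a lone date, or a lone month: skip it
    return matches


for text in (
    "From 1 January to 2 February",
    "From 1XX January to 2 February",
    "From January to February",
):
    print(repr(text), "->", simple_dates(text))
```

Unlike the grammar above, this sketch also skips a standalone month such as the trailing January in "22 February January", because anything that is not part of a pair is treated as noise.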

Is col X functionally dependent of col y?

I am trying to understand database normalisation. I saw this example of a table that is in second normal form (2NF) but not in third normal form (3NF):
Tournament           | Year | Winner         | Winner_Date_of_Birth
---------------------|------|----------------|---------------------
Indiana Invitational | 1998 | Al Fredrickson | 21 July 1975
Cleveland Open       | 1999 | Bob Albertson  | 28 September 1968
Des Moines Masters   | 1999 | Al Fredrickson | 21 July 1975
Indiana Invitational | 1999 | Chip Masterson | 14 March 1977
Here the primary key is (Tournament, Year). Since no non-key attribute is functionally dependent on a proper subset of the primary key, the table is in 2NF.
However, according to Wikipedia, it is not in 3NF because
(Tournament, Year) -> Winner and
Winner -> Winner_Date_of_Birth
So there is a transitive functional dependency on the key. I understand this part, but what I would like to know is this: since for our key
(Tournament, Year) there can only be one unique Winner_Date_of_Birth, is it right to say that (Tournament, Year) -> Winner_Date_of_Birth without using the transitive property above?
Yes, transitive means that you can derive A -> C from A -> B and B -> C.
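As a small illustration, the dependencies can be checked against the sample table in Python. The helper `holds` is an assumption made for this sketch; it only tests whether the given rows violate a dependency, which is weaker than proving the dependency holds for every possible table.

```python
# The sample table from the question, one dict per row.
rows = [
    {"Tournament": "Indiana Invitational", "Year": 1998,
     "Winner": "Al Fredrickson", "Winner_Date_of_Birth": "21 July 1975"},
    {"Tournament": "Cleveland Open", "Year": 1999,
     "Winner": "Bob Albertson", "Winner_Date_of_Birth": "28 September 1968"},
    {"Tournament": "Des Moines Masters", "Year": 1999,
     "Winner": "Al Fredrickson", "Winner_Date_of_Birth": "21 July 1975"},
    {"Tournament": "Indiana Invitational", "Year": 1999,
     "Winner": "Chip Masterson", "Winner_Date_of_Birth": "14 March 1977"},
]


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, value) != value:
            return False
    return True


key = ["Tournament", "Year"]
print(holds(rows, key, ["Winner"]))                         # A -> B
print(holds(rows, ["Winner"], ["Winner_Date_of_Birth"]))    # B -> C
print(holds(rows, key, ["Winner_Date_of_Birth"]))           # A -> C, the derived dependency
print(holds(rows, ["Year"], ["Winner"]))                    # not a dependency: 1999 has three winners
```

The third check passes directly on the data, which matches the answer: A -> C is a genuine functional dependency, and transitivity is simply one way to derive it.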

Resources