I am looking for a solution in either Google Sheets or Apps Script to check for overlapping dates for the same account. There will be multiple accounts and the dates won't be in any particular order. Here is an example below. I am trying to produce the right-hand column "Check" with some formula or automation. Any suggestions would be greatly appreciated.
| Start Date | End Date | Account No. | Check |
|---|---|---|---|
| 2023-01-01 | 2023-01-02 | 123 | ERROR |
| 2023-01-02 | 2023-01-05 | 123 | ERROR |
| 2023-02-25 | 2023-02-27 | 456 | OK |
| 2023-01-11 | 2023-01-12 | 456 | OK |
| 2023-01-01 | 2023-01-15 | 789 | ERROR |
| 2023-01-04 | 2023-01-07 | 789 | ERROR |
| 2023-01-01 | 2023-01-10 | 012 | OK |
| 2023-01-15 | 2023-01-20 | 012 | OK |
I also found some similar past questions, but they don't have the "for the same account" component and/or require some sort of chronological order, which my sheet will not have.
How to calculate the overlap between some Google Sheet time frames?
How to check if any of the time ranges overlap with each other in Google Sheets
Another approach (to be entered in D2):
=arrayformula(lambda(last_row,
lambda(acc_no,start_date,end_date,
if(isnumber(match(acc_no,unique(query(query(split(flatten(acc_no&"|"&split(map(start_date,end_date,lambda(start_date,end_date,join("|",sequence(1,end_date-(start_date-1),start_date)))),"|")),"|"),"select Col1,count(Col2) where Col2 is not null group by Col1,Col2",0),"select Col1 where Col2>1",1)),0)),"ERROR","OK"))(
C2:index(C2:C,last_row),A2:index(A2:A,last_row),B2:index(B2:B,last_row)))(
counta(A2:A)))
Briefly, we are creating a sequence of dateserial numbers between the start & end dates for each row, doing some string manipulation to turn it into a table of account number against each date, then QUERYing it to get each account number which has dateserials with count>1 (i.e. overlaps), using UNIQUE to get the distinct list of those account numbers, then finally matching this list against the original list of account numbers to give the ERROR/OK output.
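The same idea can be sketched outside Sheets. The following is a minimal Python illustration, not a translation of the formula: the function name `check_overlaps` and the tuple layout are assumptions made for the example. It expands every row into one (account, day) pair per day, counts the pairs, and flags every account that has any day counted more than once.

```python
from collections import Counter
from datetime import date, timedelta


def check_overlaps(rows):
    """rows: list of (start, end, account). Returns 'ERROR' or 'OK' per row."""
    day_counts = Counter()
    for start, end, account in rows:
        # One entry per day in the inclusive range start..end.
        for offset in range((end - start).days + 1):
            day_counts[(account, start + timedelta(days=offset))] += 1
    # Accounts with at least one day that is covered more than once.
    flagged = {account for (account, _day), n in day_counts.items() if n > 1}
    return ["ERROR" if account in flagged else "OK" for _s, _e, account in rows]


# Data taken from the question's example table.
rows = [
    (date(2023, 1, 1), date(2023, 1, 2), "123"),
    (date(2023, 1, 2), date(2023, 1, 5), "123"),
    (date(2023, 2, 25), date(2023, 2, 27), "456"),
    (date(2023, 1, 11), date(2023, 1, 12), "456"),
    (date(2023, 1, 1), date(2023, 1, 15), "789"),
    (date(2023, 1, 4), date(2023, 1, 7), "789"),
    (date(2023, 1, 1), date(2023, 1, 10), "012"),
    (date(2023, 1, 15), date(2023, 1, 20), "012"),
]
result = check_overlaps(rows)
```

Note that, like the formula, this marks every row of an account that has any overlap, including rows of that account that do not themselves overlap anything.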
(1) Here is one way, considering each case which could result in an overlap separately:
=ArrayFormula(if(A2:A="",,
if((countifs(A2:A,"<="&A2:A,B2:B,">="&A2:A,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
+countifs(A2:A,"<="&B2:B,B2:B,">="&B2:B,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
+countifs(A2:A,">="&A2:A,B2:B,"<="&B2:B,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
)>0,"ERROR","OK")
)
)
(2) Here is the method using the Overlap formula
min(end1,end2)-max(start1,start2)+1
which results in
=ArrayFormula(if(byrow(A2:index(C:C,counta(A:A)),lambda(r,sum(text(if(index(r,2)<B2:B,index(r,2),B2:B)-if(index(r,1)>A2:A,index(r,1),A2:A)+1,"0;\0;\0")*(C2:C=index(r,3))*(row(A2:A)<>row(r)))))>0,"ERROR","OK"))
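As a rough illustration of method (2), here is a small Python sketch of the overlap formula applied pairwise within each account. The names `overlap_days` and `check_rows` are made up for this example, and negative results are clamped to zero, which is what the "0;\0;\0" number format appears to do in the sheet formula.

```python
from datetime import date


def overlap_days(start1, end1, start2, end2):
    """min(end1, end2) - max(start1, start2) + 1, clamped at zero."""
    return max(0, (min(end1, end2) - max(start1, start2)).days + 1)


def check_rows(rows):
    """rows: list of (start, end, account). Flags rows overlapping another row of the same account."""
    out = []
    for i, (s1, e1, acc1) in enumerate(rows):
        total = sum(
            overlap_days(s1, e1, s2, e2)
            for j, (s2, e2, acc2) in enumerate(rows)
            if j != i and acc1 == acc2
        )
        out.append("ERROR" if total > 0 else "OK")
    return out


rows = [
    (date(2023, 1, 1), date(2023, 1, 2), "123"),
    (date(2023, 1, 2), date(2023, 1, 5), "123"),
    (date(2023, 1, 1), date(2023, 1, 10), "012"),
    (date(2023, 1, 15), date(2023, 1, 20), "012"),
]
flags = check_rows(rows)
```

Unlike the per-account flagging of the MATCH-based approach, this compares each row only with the other rows it could overlap, so it is quadratic in the number of rows.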
(3) Most efficient is to use the original method of comparing previous and next dates, but then you need to sort and sort back like this:
=lambda(data,sort(map(sequence(rows(data)),lambda(c,if(if(c=1,0,(index(data,c-1,2)>=index(data,c,1))*(index(data,c-1,3)=index(data,c,3)))+if(c=rows(data),0,(index(data,c+1,1)<=index(data,c,2))*(index(data,c+1,3)=index(data,c,3)))>0,"ERROR","OK"))),index(data,0,4),1))(SORT(filter({A2:C,row(A2:A)},A2:A<>""),3,1,1,1))
HOWEVER, this only checks for local overlaps, not global ones. You can see what I mean if you change the dataset slightly:
Clearly the first and third pair of dates have an overlap but G4 contains "OK". This is because each pair of dates is only checked against the adjacent pairs of dates. This also applies to the original reference cited by OP - here's an example where it would give a similar result:
The formula posted by @The God of Biscuits gives the correct (global) result :-)
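To make the local-versus-global distinction concrete, here is a small Python sketch. The intervals are hypothetical day numbers for a single account (they are not the data from the screenshots), and both function names are invented for the example.

```python
def adjacent_check(intervals):
    """Flag an interval only if it overlaps its neighbour after sorting by start."""
    order = sorted(range(len(intervals)), key=lambda i: intervals[i])
    flags = [False] * len(intervals)
    for pos in range(1, len(order)):
        prev, cur = order[pos - 1], order[pos]
        # Previous end reaches the current start: the two neighbours overlap.
        if intervals[prev][1] >= intervals[cur][0]:
            flags[prev] = flags[cur] = True
    return flags


def global_check(intervals):
    """Flag an interval if it overlaps any other interval at all."""
    n = len(intervals)
    return [
        any(
            i != j
            and intervals[i][0] <= intervals[j][1]
            and intervals[j][0] <= intervals[i][1]
            for j in range(n)
        )
        for i in range(n)
    ]


# Hypothetical (start, end) pairs: the long first interval covers the third one.
intervals = [(1, 10), (2, 3), (5, 6)]
local_flags = adjacent_check(intervals)
global_flags = global_check(intervals)
```

Here the third interval sits next to (2, 3) after sorting, which it does not overlap, so the adjacent-only check reports it as fine even though (1, 10) covers it.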
I have a binary classification problem and need to prepare the data for model training. There are two classes, duplicate and non-duplicate. Assume two records of the data look like this:
| Id | Name | Phone | Email | City |
|---|---|---|---|---|
| A1 | Mick | 12345 | m@m.com | London |
| A2 | Mick | 12345 | null | London |
It seems that these two records are duplicates. I need to turn them into one record and assign each feature a binary value: 1 if the two values match, otherwise 0, as follows:
| Id1 | Id2 | Name | Phone | Email | City | Label |
|---|---|---|---|---|---|---|
| A1 | A2 | 1 | 1 | ? | 1 | 1 |
As the first table shows, we have a missing value for the email in the second row. I know I cannot compare a known value with a missing one. The question is, what is the best practice in this case?
Note: The number of missing values is high in my dataset, and I cannot drop them.
I tried to put 0, but I know it introduces bias in the dataset.
You can drop the records with the null values. To do this, use pandas dropna().
Since Tableau does not have a function for p-values (correct me if I'm wrong here), I created a spreadsheet with all possible sample sizes under two different alphas/significance levels, and I need to connect the appropriate p-value to a calculated field from the main database source (an aggregate count of people). I assumed I could easily match numbers with a condition to bring back the p-value in a calculated field, yet I'm hitting a brick wall. The biggest issue seems to be that the field I want to join the p-value reference table to is an aggregated integer. Also, I do not have any extensions, and my end result needs to be an integer, not a graph.
Any secret tricks here?
Seems I cannot blend the reference table in nor join it to an aggregate?
Thanks!
I found a workaround: calculating the critical value for a two-tailed t-test in Tableau. However, I didn't figure out how to join based on an aggregated calculated field. Workaround: I used a conditional statement, copying and pasting about 100 critical values based on degrees of freedom (sample size - 2) into a calculated field. To save time, use Excel to fill the conditions down to 120. Worked like a charm!
Here is the conditional logic for alpha = .2 (80%) in two tailed t-test (replace the ## line with about 117 rows):
IF [degrees of freedom] = 1 THEN 3.08
ELSEIF [degrees of freedom] = 2 THEN 1.89
ELSEIF [degrees of freedom] = 3 THEN 1.64
##ELSEIF [...calculate down to 120] = ... then ...
ELSEIF [degrees of freedom] >= 121 THEN 1.28
END
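Instead of filling the rows down in Excel, the same text can be generated with a short script. This is only a sketch: the helper name `tableau_case` and its parameters are hypothetical, and the table below holds just the three critical values quoted above, so a real run would need the full table up to 120 degrees of freedom plus the fallback line.

```python
def tableau_case(critical_values, fallback=None, field="[degrees of freedom]"):
    """Build the IF/ELSEIF text for a Tableau calculated field.

    critical_values: dict mapping degrees of freedom to a critical value.
    fallback: optional (threshold, value) pair emitted as a final '>=' branch.
    """
    lines = []
    for i, (df, value) in enumerate(sorted(critical_values.items())):
        keyword = "IF" if i == 0 else "ELSEIF"
        lines.append(f"{keyword} {field} = {df} THEN {value}")
    if fallback is not None:
        threshold, value = fallback
        lines.append(f"ELSEIF {field} >= {threshold} THEN {value}")
    lines.append("END")
    return "\n".join(lines)


# Only the three values quoted in the post; extend this dict to cover 1..120.
values = {1: 3.08, 2: 1.89, 3: 1.64}
calc = tableau_case(values)
print(calc)
```

The printed text can then be pasted into the calculated field editor.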
Could not find a suitable solution, hence this post.
Have 2 sheets - Attendance & Payroll where attendance is filled in a pivoted manner (see sample).
For a given date range, I want to count the number of "Absent" days for the staff. The non-array formula (in the Payroll column "Absent") below does that. Note: column A with staff ids is a dynamic list even though it is fixed in the sample.
How this formula works:
1) match the payroll-staffid to the attendance column-header-staffid using MATCH
2) date range given in cells payroll B1,B2
3) Settings!$B$13 contains the columnar range as per (2)
4) OFFSET (3) by MATCH to get the staff attendance
5) COUNTIF the number of "Absent" entries in staff attendance range - CORRECT
ArrayFormula does NOT work when the payroll-staffid "A5" is changed to "A5:A15"
Note: there is no guarantee that the payroll-staffids and the attendance-header-staffids are in the same order; that's why each staffid is MATCHed and OFFSET.
=COUNTIF(OFFSET(INDIRECT(Settings!$B$13),0,MATCH(A5,Attendance!$B$1:$1,FALSE)),"Absent")
Sample sheet here.
=ArrayFormula(VLOOKUP(A5:A15, TRANSPOSE({INDIRECT(AttHeader,FALSE);MMULT(TRANSPOSE(SIGN(ROW(INDIRECT(AttUnitMatrix)))),IF(INDIRECT(AttData,FALSE)="Absent",1,0))}),2,FALSE))
See linked sample sheet in OP.
For defined names; see the Settings sheet. All ranges are computed separately to reduce the size of the formula.
1) Start operating in "block mode", ignoring order of staff-ids. "AttData" is the string representation of the data block and mapped to 1 if "Absent" else 0.
IF(INDIRECT(AttData,FALSE)="Absent",1,0)
2) This matrix is multiplied by a unit row matrix from range string "AttUnitMatrix"
TRANSPOSE(SIGN(ROW(INDIRECT(AttUnitMatrix))))
3) MMULT returns a row of "Absent" counts
4) { } is used to prepend the staff-ids to the "Absent" counts for a 2 row matrix.
{INDIRECT(AttHeader,FALSE);MMULT(...)}
5) TRANSPOSE result to be accessed by VLOOKUP (2 column matrix)
6) VLOOKUP takes care of out of order staff-ids by matching the key-staff-ids to the generated row matrix of (staff-id / absent-count) pairs.
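For readers who find the matrix steps hard to follow in formula form, here is a rough Python sketch of the same sequence. The function name `absent_counts` and the sample ids and attendance values are invented for illustration; they are not taken from the linked sheet.

```python
def absent_counts(header, data):
    """header: staff ids, one per column; data: attendance rows, one column per staff id."""
    # Step 1: map the data block to 1 for "Absent", else 0.
    indicator = [[1 if cell == "Absent" else 0 for cell in row] for row in data]
    # Step 2: a unit row vector with one entry per data row.
    unit_row = [1] * len(indicator)
    # Step 3: unit row times indicator matrix gives one "Absent" count per column.
    counts = [
        sum(unit_row[r] * indicator[r][c] for r in range(len(indicator)))
        for c in range(len(header))
    ]
    # Steps 4-5: pair each staff id with its count so it can be looked up by key.
    return dict(zip(header, counts))


# Hypothetical attendance block; the column order differs from the payroll order.
header = ["S3", "S1", "S2"]
data = [
    ["Absent", "Present", "Absent"],
    ["Present", "Present", "Absent"],
    ["Absent", "Absent", "Absent"],
]
lookup = absent_counts(header, data)

# Step 6: look up by staff id, so the payroll order does not matter.
payroll_ids = ["S1", "S2", "S3"]
result = [lookup[staff_id] for staff_id in payroll_ids]
```

The dictionary lookup plays the role of VLOOKUP here: the counts are keyed by staff id, so the two lists never need to be in the same order.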
fireworks ... pat on my back :)
For this case and others, I've sent Google a feature request for "Named Formulas", akin to "Named Ranges", that could be used in standard formulas WITHOUT resorting to GAS. When formulas become large, this is NOT a luxury but a NECESSITY. If readers would find such a feature useful, please send feedback to Google.
eg: UnitMatrix($1) => TRANSPOSE(SIGN(ROW(INDIRECT($1))))
MMULT(UnitMatrix(AttUnitMatrix),IF(INDIRECT(AttData,FALSE)="Absent",1,0))
Thank you ahead of time for anyone who can help me with this, I think I am close, but it still isn't working.
I have a simple activity reporting sheet that I am asking staff to complete over the upcoming year. It has 5 columns:
Column A: Date -In format (4/4/2013 13:30:00)
Column B: Title -In format (text string)
Column C: Attendance -In format (Numbers)
Column D: Vol led - In format (text string)
Column E: Staff Led - In format (text string)
Using this data I am 90% positive that I can aggregate on a different summary sheet that contains some static data, like months (in the B column), to aggregate on. I am having trouble configuring the criteria in the filters, though, to produce the correct output to either sum or count.
Quantity of events led by either staff or vol (if neither box is checked the event should not be counted). Right now I am trying this but it is not working
=SUM(FILTER('Hostel Activities'!A:A,MONTH('Hostel Activities'!A:A)=$B3, NOT(AND(ISBLANK('Hostel Activities'!D:D),ISBLANK('Hostel Activities'!E:E)))
Total attendance in a month for activities led by staff or volunteers. Right now I am trying this but it is not working
=SUM(FILTER('Hostel Activities'!A:A,MONTH('Hostel Activities'!A:A)=$B3, NOT(AND(ISBLANK('Hostel Activities'!D:D),ISBLANK('Hostel Activities'!E:E)))
THIS WORKS! Total number of volunteer led activities in a month. Right now I am using this and it IS working
=COUNT(FILTER('Hostel Activities'!A:A,month('Hostel Activities'!A:A)=B3,not(isblank('Hostel Activities'!E:E))))
Thank you for any assistance and/or guidance
Danny
The first problem I see with your first two formulas is that you're calling SUM on your FILTER result. But the FILTER is returning column A, which holds dates. So you're basically summing dates, which will surely not yield the result you're looking for. Why are you not using COUNT, as you did in your last formula?
Second, the first two formulas you pasted are identical, how do you expect them to return different results?
It seems that for the first two you want to count on an OR condition. You can do this two ways (that I can think of now). The simpler to understand is just to add two COUNT(FILTER(... formulas, one for each criterion, e.g.
=COUNT(FILTER('Hostel Activities'!A:A,month('Hostel Activities'!A:A)=B3,not(isblank('Hostel Activities'!D:D)))) + B6
This assumes that B6 holds the other COUNT formula (the 3rd one, which already works).
Another option would be to use an OR function as criteria for the FILTER. Like this:
=COUNT(FILTER('Hostel Activities'!A:A,month('Hostel Activities'!A:A)=B3, OR(NOT(ISBLANK('Hostel Activities'!E:E)), NOT(ISBLANK('Hostel Activities'!D:D))) ))
I believe I have figured out a method that works by making some adjustments in the formulas and the source data.
Basically
IN THE SOURCE REPORTING DATA:
I combined columns D and E into a single column and added data validation so the coordinator has to enter whether the activity is led by staff, volunteer, or neither.
IN THE MONTHLY AGGREGATION REPORT:
To count the number of activities led by either staff or volunteers I used this :
=COUNT(FILTER('Hostel Activities'!A:A,month('Hostel Activities'!A:A)=B3,'Hostel Activities'!D:D="Staff"))+E3
*E3 is the count of volunteer led activities which is found using this formula:
=COUNT(FILTER('Hostel Activities'!A:A,month('Hostel Activities'!A:A)=B3,'Hostel Activities'!D:D="Volunteer"))
Adding up the number of participants in activities run by either staff or volunteers was a little more difficult, but I was able to do it by adding up two separate formulas. I would prefer using an OR statement in the filter criteria, but I just couldn't get that to work. This is how I was able to make it happen:
=SUM(FILTER('Hostel Activities'!C:C,month('Hostel Activities'!A:A)=B3,'Hostel Activities'!D:D="Staff")) + SUM(FILTER('Hostel Activities'!C:C,month('Hostel Activities'!A:A)=B3,'Hostel Activities'!D:D="Volunteer"))
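For comparison, the logic of these two aggregates (count and total attendance for a month, restricted to staff or volunteer led activities) can be sketched in Python with a single row-by-row OR condition. The function name `monthly_summary`, the dictionary keys, and the sample rows are all invented for illustration.

```python
from datetime import datetime


def monthly_summary(rows, month):
    """Return (number of led activities, total attendance) for the given month number."""
    led = [
        r for r in rows
        # Row-by-row OR: the activity counts if led by staff or by a volunteer.
        if r["date"].month == month and r["led_by"] in ("Staff", "Volunteer")
    ]
    return len(led), sum(r["attendance"] for r in led)


# Hypothetical rows in the combined-column layout described above.
rows = [
    {"date": datetime(2013, 4, 4, 13, 30), "attendance": 12, "led_by": "Staff"},
    {"date": datetime(2013, 4, 10, 18, 0), "attendance": 8, "led_by": "Volunteer"},
    {"date": datetime(2013, 4, 15, 9, 0), "attendance": 5, "led_by": "Neither"},
    {"date": datetime(2013, 5, 1, 10, 0), "attendance": 20, "led_by": "Staff"},
]
april = monthly_summary(rows, 4)
may = monthly_summary(rows, 5)
```

As with MONTH() in the formulas, this compares only the month number, so data spanning more than one year would need a year check as well.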
Thank you all for your assistance