I am looking for a solution in either Google Sheets or Apps Script to check for overlapping dates within the same account. There will be multiple accounts, and the dates won't be in any particular order. An example is below. I am trying to produce the right-hand column, "Check", with a formula or automation. Any suggestions would be greatly appreciated.
Start Date  End Date    Account No.  Check
2023-01-01  2023-01-02  123          ERROR
2023-01-02  2023-01-05  123          ERROR
2023-02-25  2023-02-27  456          OK
2023-01-11  2023-01-12  456          OK
2023-01-01  2023-01-15  789          ERROR
2023-01-04  2023-01-07  789          ERROR
2023-01-01  2023-01-10  012          OK
2023-01-15  2023-01-20  012          OK
I also found some similar past questions, but they lack the "for the same account" component and/or require some sort of chronological order, which my sheet will not have.
How to calculate the overlap between some Google Sheet time frames?
How to check if any of the time ranges overlap with each other in Google Sheets
Another approach (to be entered in D2):
=arrayformula(lambda(last_row,
lambda(acc_no,start_date,end_date,
if(isnumber(match(acc_no,unique(query(query(split(flatten(acc_no&"|"&split(map(start_date,end_date,lambda(start_date,end_date,join("|",sequence(1,end_date-(start_date-1),start_date)))),"|")),"|"),"select Col1,count(Col2) where Col2 is not null group by Col1,Col2",0),"select Col1 where Col2>1",1)),0)),"ERROR","OK"))(
C2:index(C2:C,last_row),A2:index(A2:A,last_row),B2:index(B2:B,last_row)))(
counta(A2:A)))
Briefly, we are creating a sequence of dateserial numbers between the start & end dates for each row, doing some string manipulation to turn it into a table of account number against each date, then QUERYing it to get each account number which has dateserials with count>1 (i.e. overlaps), using UNIQUE to get the distinct list of those account numbers, then finally matching this list against the original list of account numbers to give the ERROR/OK output.
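For readers who find the nested formula hard to follow, the same idea can be sketched outside Sheets. This is only an illustration in Python, not a translation of the formula: the function name check_by_expansion and the tuple layout are assumptions made for the example. It expands each row into its individual days, counts how often each (account, day) pair occurs, and marks every row of an account that has any day counted more than once.

```python
from collections import Counter
from datetime import date, timedelta

def check_by_expansion(rows):
    """rows: list of (start, end, account). Returns 'ERROR'/'OK' per row."""
    day_counts = Counter()
    for start, end, account in rows:
        # One entry per calendar day covered by the row (end date inclusive).
        for offset in range((end - start).days + 1):
            day_counts[(account, start + timedelta(days=offset))] += 1
    # Accounts where at least one day is covered by more than one row.
    bad_accounts = {account for (account, _), n in day_counts.items() if n > 1}
    return ["ERROR" if account in bad_accounts else "OK" for _, _, account in rows]

# Sample data copied from the question.
rows = [
    (date(2023, 1, 1), date(2023, 1, 2), "123"),
    (date(2023, 1, 2), date(2023, 1, 5), "123"),
    (date(2023, 2, 25), date(2023, 2, 27), "456"),
    (date(2023, 1, 11), date(2023, 1, 12), "456"),
    (date(2023, 1, 1), date(2023, 1, 15), "789"),
    (date(2023, 1, 4), date(2023, 1, 7), "789"),
    (date(2023, 1, 1), date(2023, 1, 10), "012"),
    (date(2023, 1, 15), date(2023, 1, 20), "012"),
]
print(check_by_expansion(rows))
```

Note that, like the formula, this flags the whole account: every row of an account with any overlap is marked ERROR, including rows that do not themselves overlap anything.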
(1) Here is one way, considering each case which could result in an overlap separately:
=ArrayFormula(if(A2:A="",,
if((countifs(A2:A,"<="&A2:A,B2:B,">="&A2:A,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
+countifs(A2:A,"<="&B2:B,B2:B,">="&B2:B,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
+countifs(A2:A,">="&A2:A,B2:B,"<="&B2:B,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
)>0,"ERROR","OK")
)
)
(2) Here is the method using the Overlap formula
min(end1,end2)-max(start1,start2)+1
which results in
=ArrayFormula(if(byrow(A2:index(C:C,counta(A:A)),lambda(r,sum(text(if(index(r,2)<B2:B,index(r,2),B2:B)-if(index(r,1)>A2:A,index(r,1),A2:A)+1,"0;\0;\0")*(C2:C=index(r,3))*(row(A2:A)<>row(r)))))>0,"ERROR","OK"))
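The overlap formula in (2) may be easier to see in isolation. Below is a minimal Python sketch, with overlap_days and check_pairwise as made-up names for the example: a row is flagged when its inclusive overlap with any other row of the same account is positive.

```python
from datetime import date

def overlap_days(start1, end1, start2, end2):
    """min(end1, end2) - max(start1, start2) + 1; zero or negative means no overlap."""
    return (min(end1, end2) - max(start1, start2)).days + 1

def check_pairwise(rows):
    """rows: list of (start, end, account). Returns 'ERROR'/'OK' per row."""
    result = []
    for i, (s1, e1, acc1) in enumerate(rows):
        clash = any(
            j != i and acc2 == acc1 and overlap_days(s1, e1, s2, e2) > 0
            for j, (s2, e2, acc2) in enumerate(rows)
        )
        result.append("ERROR" if clash else "OK")
    return result

sample = [
    (date(2023, 1, 1), date(2023, 1, 2), "123"),
    (date(2023, 1, 2), date(2023, 1, 5), "123"),
    (date(2023, 1, 1), date(2023, 1, 10), "012"),
    (date(2023, 1, 15), date(2023, 1, 20), "012"),
]
print(check_pairwise(sample))
```

This compares every row with every other row, so it scales quadratically with the number of rows, which is the cost the sorted approach tries to avoid.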
(3) Most efficient is to use the original method of comparing previous and next dates, but then you need to sort and sort back like this:
=lambda(data,sort(map(sequence(rows(data)),lambda(c,if(if(c=1,0,(index(data,c-1,2)>=index(data,c,1))*(index(data,c-1,3)=index(data,c,3)))+if(c=rows(data),0,(index(data,c+1,1)<=index(data,c,2))*(index(data,c+1,3)=index(data,c,3)))>0,"ERROR","OK"))),index(data,0,4),1))(SORT(filter({A2:C,row(A2:A)},A2:A<>""),3,1,1,1))
HOWEVER, this only checks for local overlaps, not global ones. You can see what I mean if you change the dataset slightly:
Clearly the first and third pair of dates have an overlap but G4 contains "OK". This is because each pair of dates is only checked against the adjacent pairs of dates. This also applies to the original reference cited by OP - here's an example where it would give a similar result:
The formula posted by @The God of Biscuits gives the correct (global) result :-)
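The local versus global difference can be reproduced with a small Python sketch. The day numbers below are invented for illustration and are not the dataset from the missing screenshot; both function names are hypothetical. The first function only compares neighbours after sorting, as in (3). The second keeps the largest end date seen so far, which is one possible way (an assumption of this sketch, not something from the formulas above) to catch overlaps with non-adjacent rows while still sorting only once.

```python
def check_adjacent(ranges):
    """ranges: (start, end) day numbers for ONE account; compares sorted neighbours only."""
    order = sorted(range(len(ranges)), key=lambda i: ranges[i])
    flags = [False] * len(ranges)
    for pos in range(1, len(order)):
        prev, cur = order[pos - 1], order[pos]
        if ranges[prev][1] >= ranges[cur][0]:
            flags[prev] = flags[cur] = True
    return ["ERROR" if f else "OK" for f in flags]

def check_running_max(ranges):
    """Same input, but each row is compared with the furthest end date seen so far."""
    order = sorted(range(len(ranges)), key=lambda i: ranges[i])
    flags = [False] * len(ranges)
    max_end, max_idx = None, None
    for idx in order:
        start, end = ranges[idx]
        if max_end is not None and start <= max_end:
            # Overlaps the earlier row that reaches furthest to the right.
            flags[idx] = flags[max_idx] = True
        if max_end is None or end > max_end:
            max_end, max_idx = end, idx
    return ["ERROR" if f else "OK" for f in flags]

# Days 1-15, 4-7 and 10-12: the first and third overlap but are not adjacent.
example = [(1, 15), (4, 7), (10, 12)]
print(check_adjacent(example))     # third row is missed
print(check_running_max(example))  # third row is caught
```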
I have a binary classification problem and need to prepare the data for model training. There are two classes: duplicate and nonduplicate. Assume two records in the data look like this:
Id  Name  Phone  Email    City
A1  Mick  12345  m@m.com  London
A2  Mick  12345  null     London
It seems that these two records are duplicates. I need to turn them into one record and assign each feature a binary value: 1 if the values match, 0 otherwise, as follows:
Id1  Id2  Name  Phone  Email  City  Label
A1   A2   1     1      ?      1     1
As the first table shows, we have a missing value for the email in the second row. I know I cannot compare a known value with a missing one. The question is, what is the best practice in this case?
Note: The number of missing values is high in my dataset, and I cannot drop them.
I tried to put 0, but I know it introduces bias in the dataset.
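To make the comparison step concrete, here is a minimal Python sketch of building the paired record. The function name compare_records and the dictionary layout are assumptions for illustration. It leaves the undecided cell as None (the "?" in the table) rather than forcing it to 0 or 1; how that None should be treated before training is exactly the open question and is not settled by this sketch.

```python
def compare_records(rec1, rec2, fields):
    """Return 1/0 per field, or None when either side of the comparison is missing."""
    row = {"Id1": rec1["Id"], "Id2": rec2["Id"]}
    for field in fields:
        a, b = rec1.get(field), rec2.get(field)
        row[field] = None if a is None or b is None else int(a == b)
    return row

a1 = {"Id": "A1", "Name": "Mick", "Phone": "12345", "Email": "m@m.com", "City": "London"}
a2 = {"Id": "A2", "Name": "Mick", "Phone": "12345", "Email": None, "City": "London"}

pair = compare_records(a1, a2, ["Name", "Phone", "Email", "City"])
print(pair)
```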
You can drop the records with the null values. To do this, use pandas dropna().
I have a call report with two different tabs.
TAB 1) "RAWDATA" - Range: A:AB -
A log of two employees' handled outbound and inbound calls, plus other columns that are not relevant here.
IMPORTANT COLUMNS:
"D" - SKILL NAME
The type of call. For us, INBOUND would be "RVL_US_Scheduling_PC" and OUTBOUND would be either "RVL_US_Scheduling_OB_PC" or "RVL_US_SchedulingExpress_OB_PC".
"I" - CONTACT AGENT NAME
The column with the employee's name in our system.
"W" - HANDLE TIME
The report shows how long each call lasted; we only need to count calls longer than 2 min.
ADDITIONAL COLUMN
For OUTBOUND, column "V" can also be considered.
TAB 2) "CALLS BY AGENT" - Range: A:E -
This is the tab where I would like to "query" the following:
NUMBER OF CALLS HANDLED BY AGENT AND BY TYPE (INBOUND AND OUTBOUND) separately on columns: Kevin: C3 and C4, and Sandra: D3 and D4.
So far I've partially come up with one formula for "inbound":
=QUERY(RAWDATA!A2:W, "select COUNT(D) where I='Aldana, Kevin, Mejia' AND D='RVL_US_Scheduling_PC'")
Unfortunately, when I try to add another "AND" condition as follows:
AND W>2:00:00
to limit the query to calls longer than 2 min, it shows a Value Error!
PLEASE HELP ME, TEAM!
I have tried everything, even converting the "duration" (handle time) column to every format, changing it to plain numbers, and changing the formula to use a numeric value, and it won't work!
Can anybody assist with the inbound and outbound call count formulas for this scenario? Here's the file (permissions are open)
Use FILTER instead, control the conditions with the dropdown lists highlighted in yellow, and COUNTA the output.
=COUNTA(FILTER(RAWDATA!$D$2:$D,RAWDATA!$D$2:$D=B3,RAWDATA!$X$2:$X>$F$3,RAWDATA!$I$2:$I=C$1))
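The filtering logic itself (agent, skill, and a handle time above a threshold) can be sketched in Python as a cross-check. The sample rows, the name "Sandra (hypothetical)", and the H:MM:SS text format are assumptions made for this example; only the skill names and Kevin's agent name come from the question.

```python
from datetime import timedelta

INBOUND = {"RVL_US_Scheduling_PC"}
OUTBOUND = {"RVL_US_Scheduling_OB_PC", "RVL_US_SchedulingExpress_OB_PC"}

def parse_duration(text):
    """Parse an 'H:MM:SS' string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def count_calls(rows, agent, skills, min_duration=timedelta(minutes=2)):
    """rows: (skill_name, agent_name, handle_time). Counts calls longer than min_duration."""
    return sum(
        1
        for skill, name, handle in rows
        if name == agent and skill in skills and parse_duration(handle) > min_duration
    )

# Hypothetical rows standing in for columns D, I and W of RAWDATA.
calls = [
    ("RVL_US_Scheduling_PC", "Aldana, Kevin, Mejia", "0:03:10"),
    ("RVL_US_Scheduling_PC", "Aldana, Kevin, Mejia", "0:01:45"),
    ("RVL_US_Scheduling_OB_PC", "Aldana, Kevin, Mejia", "0:02:30"),
    ("RVL_US_SchedulingExpress_OB_PC", "Aldana, Kevin, Mejia", "0:05:00"),
    ("RVL_US_Scheduling_PC", "Sandra (hypothetical)", "0:04:00"),
]
print(count_calls(calls, "Aldana, Kevin, Mejia", INBOUND))
print(count_calls(calls, "Aldana, Kevin, Mejia", OUTBOUND))
```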
I have a dataset in Google Sheets that records updates to projects over time:
Update_Date Project_Code Status
01/09/21 0001 Proposal
01/09/21 0002 Delivery
01/09/21 0003 Business Case
01/10/21 0001 Business Case
01/10/21 0002 Delivery
01/10/21 0003 Delivery
I am using this data as a Data Source in Google Data Studio. Is it possible to produce a count of the number of projects that have moved between Status values over time? For example, for the update on 01/10/21, there is one project that has moved from Proposal to Business Case (0001).
I have tried to do this by creating a field and using COUNT_DISTINCT(CASE WHEN Update_Date = 01/09/21 and Status="Proposal" and Update_Date=01/10/21 and Status="Business Case" THEN Project_Code ELSE NULL END) but I get an incorrect value of 0, which I suspect is because I am referencing the same two variables twice in the one formula.
This was solved by converting each combination of values to a number in a separate field x:
CASE
WHEN Update_Date=01/09/21 and Status="Proposal" THEN 1
WHEN Update_Date=01/10/21 and Status="Business Case" THEN 2
ELSE 0
END
then blending this data (using the automatic SUM aggregation) with a distinct count of the Project_Code field, and finally filtering the result for those Project_Code values where x is equal to 3 (1 + 2, i.e. the project matched both conditions).
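The encode-then-sum trick can be checked on the sample data with a short Python sketch. This is not Data Studio syntax; the function name count_transitions is made up for the example, and it assumes each project has at most one row per update date, otherwise sums other than 3 could appear.

```python
def count_transitions(updates, old, new):
    """updates: (update_date, project_code, status); old/new: (date, status) pairs."""
    totals = {}
    for update_date, code, status in updates:
        if (update_date, status) == old:
            x = 1
        elif (update_date, status) == new:
            x = 2
        else:
            x = 0
        # Equivalent of the SUM aggregation per Project_Code.
        totals[code] = totals.get(code, 0) + x
    # A total of 3 means the project matched both the old and the new condition.
    return sorted(code for code, total in totals.items() if total == 3)

updates = [
    ("01/09/21", "0001", "Proposal"),
    ("01/09/21", "0002", "Delivery"),
    ("01/09/21", "0003", "Business Case"),
    ("01/10/21", "0001", "Business Case"),
    ("01/10/21", "0002", "Delivery"),
    ("01/10/21", "0003", "Delivery"),
]
moved = count_transitions(updates, ("01/09/21", "Proposal"), ("01/10/21", "Business Case"))
print(moved, len(moved))
```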
I have a big Excel file containing all the data I have collected over the past 2 years. To allow comparisons, there is a field called IAP, a number that groups a set of rows according to the date they were collected.
For example, all data collected in December 2016 have an IAP of 34, data collected in January 2017 have 35, and so on.
In this table I need to compare points across different IAPs: points that were good and turned bad in the last IAP (in this case, IAP 37).
Again, if the Status of a point is good in IAP 34 and turns bad in IAP 37, it should be counted.
By creating some Excel filters and using VLOOKUPs to count by that criterion (again, points that were good and turned bad in the last added IAP), the result was 100, but in QlikView the result displayed in a text object is 60.
Here is the script using Left Join, in the script editor:
NewlyBad:
Load
Code As Code1 ,
Status as NewStatus ,
New_Sites as NewNew_Sites,
IAP as IAMP_NAME
Resident ALLIAP
where IAP_Version = $(vMaxIAP) ;
Left Join
Load Code as Code1 ,
Status as OldStatus,
IAP as LAST_IAP
Resident ALLIAP
where IAP_Version = $(vMaxIAMP)-3;
I used the line where IAP_Version = $(vMaxIAMP)-3; to compare data between the last IAP and IAP-3.
In the Variable Overview, I am getting the correct values: a comparison between 34 and 37.
Now, the expression of the text object is:
=(count ({< NewStatus={"Bad"}, OldStatus-={"Bad"} >} distinct Code1))
The result displayed is 60 instead of 100, which is wrong.
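For cross-checking the two numbers outside QlikView, the intended count can be sketched in Python. The data and the function name newly_bad are hypothetical. The sketch returns two counts: one that only counts codes present in both IAPs, and one that also counts codes that are Bad in the latest IAP but have no row in the older one. Whether the Excel filters and the set expression treat such unmatched codes the same way is an assumption worth checking; the sketch does not establish the cause of the 60 versus 100 gap.

```python
def newly_bad(rows, latest_iap, lookback=3):
    """rows: (code, iap, status). Returns (strict_count, loose_count)."""
    new = {code: status for code, iap, status in rows if iap == latest_iap}
    old = {code: status for code, iap, status in rows if iap == latest_iap - lookback}
    # Strict: the code must exist in the older IAP with a status other than Bad.
    strict = {c for c, s in new.items() if s == "Bad" and c in old and old[c] != "Bad"}
    # Loose: codes with no row in the older IAP are counted as well.
    loose = {c for c, s in new.items() if s == "Bad" and old.get(c) != "Bad"}
    return len(strict), len(loose)

# Hypothetical points: P3 has no row in IAP 34.
points = [
    ("P1", 34, "Good"), ("P1", 37, "Bad"),
    ("P2", 34, "Bad"), ("P2", 37, "Bad"),
    ("P3", 37, "Bad"),
    ("P4", 34, "Good"), ("P4", 37, "Good"),
]
print(newly_bad(points, latest_iap=37))
```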