I am looking for a solution in either Google Sheets or Apps Script to check for overlapping date ranges within the same account. There will be multiple accounts, and the dates won't be in any particular order. Here is an example. I am trying to produce the right-hand column, "Check", with a formula or some automation. Any suggestions would be greatly appreciated.
| Start Date | End Date | Account No. | Check |
|---|---|---|---|
| 2023-01-01 | 2023-01-02 | 123 | ERROR |
| 2023-01-02 | 2023-01-05 | 123 | ERROR |
| 2023-02-25 | 2023-02-27 | 456 | OK |
| 2023-01-11 | 2023-01-12 | 456 | OK |
| 2023-01-01 | 2023-01-15 | 789 | ERROR |
| 2023-01-04 | 2023-01-07 | 789 | ERROR |
| 2023-01-01 | 2023-01-10 | 012 | OK |
| 2023-01-15 | 2023-01-20 | 012 | OK |
I also found some similar past questions, but they lack the "for the same account" component and/or require some sort of chronological order, which my sheet will not have:
How to calculate the overlap between some Google Sheet time frames?
How to check if any of the time ranges overlap with each other in Google Sheets
Another approach (to be entered in D2):
=arrayformula(lambda(last_row,
lambda(acc_no,start_date,end_date,
if(isnumber(match(acc_no,unique(query(query(split(flatten(acc_no&"|"&split(map(start_date,end_date,lambda(start_date,end_date,join("|",sequence(1,end_date-(start_date-1),start_date)))),"|")),"|"),"select Col1,count(Col2) where Col2 is not null group by Col1,Col2",0),"select Col1 where Col2>1",1)),0)),"ERROR","OK"))(
C2:index(C2:C,last_row),A2:index(A2:A,last_row),B2:index(B2:B,last_row)))(
counta(A2:A)))
Briefly, we create a sequence of date serial numbers between the start and end dates for each row, then use some string manipulation to turn it into a table of account number against each date. We QUERY that table to get each account number that has a date serial with count>1 (i.e. an overlap), use UNIQUE to get the distinct list of those account numbers, and finally match this list against the original account numbers to give the ERROR/OK output.
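For readers who find the nested formula hard to follow, the same idea can be sketched outside Sheets. This is only an illustration in Python, not a translation of the formula: the function name `check_by_expansion` and the tuple layout of `rows` are assumptions made for the example. It expands each range into its individual days, counts how often each (account, day) pair occurs, and flags every row of any account that has a day counted more than once.

```python
from collections import Counter
from datetime import date, timedelta


def check_by_expansion(rows):
    """rows: list of (start, end, account) tuples with inclusive date ranges.

    Returns "ERROR" or "OK" per row. Every row of an account is flagged
    if any single day of that account is covered by more than one range.
    """
    day_counts = Counter()
    for start, end, account in rows:
        # Expand the range into one entry per covered day.
        for offset in range((end - start).days + 1):
            day_counts[(account, start + timedelta(days=offset))] += 1

    # Accounts with at least one day counted more than once have an overlap.
    bad_accounts = {account for (account, _), n in day_counts.items() if n > 1}
    return ["ERROR" if account in bad_accounts else "OK" for _, _, account in rows]


# Sample data from the question.
rows = [
    (date(2023, 1, 1), date(2023, 1, 2), "123"),
    (date(2023, 1, 2), date(2023, 1, 5), "123"),
    (date(2023, 2, 25), date(2023, 2, 27), "456"),
    (date(2023, 1, 11), date(2023, 1, 12), "456"),
    (date(2023, 1, 1), date(2023, 1, 15), "789"),
    (date(2023, 1, 4), date(2023, 1, 7), "789"),
    (date(2023, 1, 1), date(2023, 1, 10), "012"),
    (date(2023, 1, 15), date(2023, 1, 20), "012"),
]
checks = check_by_expansion(rows)
```

Note that, like the formula, this marks every row of an affected account, including rows that do not themselves overlap anything.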
(1) Here is one way, considering each case which could result in an overlap separately:
=ArrayFormula(if(A2:A="",,
if((countifs(A2:A,"<="&A2:A,B2:B,">="&A2:A,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
+countifs(A2:A,"<="&B2:B,B2:B,">="&B2:B,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
+countifs(A2:A,">="&A2:A,B2:B,"<="&B2:B,C2:C,C2:C,row(A2:A),"<>"&row(A2:A))
)>0,"ERROR","OK")
)
)
(2) Here is the method using the Overlap formula
min(end1,end2)-max(start1,start2)+1
which results in
=ArrayFormula(if(byrow(A2:index(C:C,counta(A:A)),lambda(r,sum(text(if(index(r,2)<B2:B,index(r,2),B2:B)-if(index(r,1)>A2:A,index(r,1),A2:A)+1,"0;\0;\0")*(C2:C=index(r,3))*(row(A2:A)<>row(r)))))>0,"ERROR","OK"))
(3) Most efficient is to use the original method of comparing previous and next dates, but then you need to sort and sort back like this:
=lambda(data,sort(map(sequence(rows(data)),lambda(c,if(if(c=1,0,(index(data,c-1,2)>=index(data,c,1))*(index(data,c-1,3)=index(data,c,3)))+if(c=rows(data),0,(index(data,c+1,1)<=index(data,c,2))*(index(data,c+1,3)=index(data,c,3)))>0,"ERROR","OK"))),index(data,0,4),1))(SORT(filter({A2:C,row(A2:A)},A2:A<>""),3,1,1,1))
However, this only checks for local overlaps, not global ones. You can see what I mean if you change the dataset slightly:
Clearly the first and third pairs of dates overlap, but G4 contains "OK". This is because each pair of dates is only checked against the adjacent pairs. The same applies to the original reference cited by the OP; here's an example where it would give a similar result:
The formula posted by @The God of Biscuits gives the correct (global) result :-)
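The difference between the local and the global check can be reproduced with a small Python sketch. The dataset below is hypothetical (the screenshots from the thread are not available), and the function names are made up for the illustration. `check_adjacent` mimics the sort-and-compare-neighbours idea, while `check_global` applies the overlap formula min(end1,end2)-max(start1,start2)+1 to every other row of the same account.

```python
from datetime import date


def overlap_days(start1, end1, start2, end2):
    """Inclusive overlap in days; zero or negative means no overlap."""
    return (min(end1, end2) - max(start1, start2)).days + 1


def check_adjacent(rows):
    """Sort by account, then start date; compare each row with its neighbours only."""
    order = sorted(range(len(rows)), key=lambda i: (rows[i][2], rows[i][0]))
    result = [""] * len(rows)
    for pos, i in enumerate(order):
        start, end, account = rows[i]
        neighbours = [order[p] for p in (pos - 1, pos + 1) if 0 <= p < len(order)]
        clash = any(
            rows[j][2] == account
            and overlap_days(start, end, rows[j][0], rows[j][1]) > 0
            for j in neighbours
        )
        result[i] = "ERROR" if clash else "OK"  # written back in original row order
    return result


def check_global(rows):
    """Compare each row with every other row of the same account."""
    return [
        "ERROR" if any(
            j != i and acc2 == acc and overlap_days(s, e, s2, e2) > 0
            for j, (s2, e2, acc2) in enumerate(rows)
        ) else "OK"
        for i, (s, e, acc) in enumerate(rows)
    ]


# Hypothetical data: the first range spans the other two,
# but the third is not adjacent to it once sorted by start date.
rows = [
    (date(2023, 1, 1), date(2023, 1, 20), "123"),
    (date(2023, 1, 2), date(2023, 1, 3), "123"),
    (date(2023, 1, 10), date(2023, 1, 12), "123"),
]
local_result = check_adjacent(rows)
global_result = check_global(rows)
```

On this data the neighbour-only check reports the third row as OK, while the pairwise check flags all three rows. The pairwise version does quadratic work, which is the price of the global result.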
I have a binary classification problem and need to prepare the data for model training. There are two classes, duplicate and non-duplicate. Assume two records in the data look like this:
| Id | Name | Phone | Email | City |
|---|---|---|---|---|
| A1 | Mick | 12345 | m@m.com | London |
| A2 | Mick | 12345 | null | London |
It seems that these two records are duplicates. I need to turn them into one record and assign each feature a binary value: 1 if the two values match, otherwise 0, as follows:
| Id1 | Id2 | Name | Phone | Email | City | Label |
|---|---|---|---|---|---|---|
| A1 | A2 | 1 | 1 | ? | 1 | 1 |
As the first table shows, we have a missing value for the email in the second row. I know I cannot compare a known value with a missing one. The question is, what is the best practice in this case?
Note: The number of missing values is high in my dataset, and I cannot drop them.
I tried to put 0, but I know it introduces bias in the dataset.
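To make the transformation concrete, here is a minimal Python sketch of the pairwise comparison described above. The function name `compare_records` and the dictionary layout are assumptions for the example. It emits 1 for a match, 0 for a mismatch, and leaves the feature as None when either value is missing, which corresponds to the "?" in the table. It does not decide how that None should be handled afterwards; that remains the open question.

```python
def compare_records(rec1, rec2, fields):
    """Return 1 for a match, 0 for a mismatch, None when either value is missing."""
    features = {}
    for field in fields:
        a, b = rec1.get(field), rec2.get(field)
        if a is None or b is None:
            features[field] = None  # unknown, deliberately not treated as a mismatch
        else:
            features[field] = 1 if a == b else 0
    return features


# The two example records; None stands for the null email.
a1 = {"Id": "A1", "Name": "Mick", "Phone": "12345", "Email": "m@m.com", "City": "London"}
a2 = {"Id": "A2", "Name": "Mick", "Phone": "12345", "Email": None, "City": "London"}

pair = compare_records(a1, a2, ["Name", "Phone", "Email", "City"])
```

Keeping the missing comparison distinct from 0 at this stage avoids baking the bias mentioned above into the features before a handling strategy has been chosen.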
You can drop the records with null values. To do this, use pandas dropna().
It's not hard to do this with a custom function, but I'm wondering if there is a way to do it with a formula, because the data won't update automatically when using a custom function.
I have a course list sheet, each course with a price, and I'm using a Google Form to let users choose which courses they will take. Users are allowed to take multiple courses, so the number they will take is unknown.
In the response sheet, I have data like this:
| Order ID | User ID | Courses | Total |
|---|---|---|---|
| 1001 | 38 | courseA, courseC | What formula to put here? |
| 1002 | 44 | courseB, courseC, courseD | What formula to put here? |
| 1003 | 55 | courseE | What formula to put here? |
and the course sheet is like
| course | Price |
|---|---|
| A | 23 |
| B | 33 |
| C | 44 |
| D | 23 |
| E | 55 |
I want to output the total for each order and am looking at using FILTER to do this. Firstly I can get a range of unknown length for the chosen courses
=SPLIT(courses, ",") // having named the Courses column as "courses"
Now I need to filter this range against the course sheet, but I'm not sure how to do it, or even whether it is possible. Any hint is appreciated.
try:
=ARRAYFORMULA(IF(A2:A="",,MMULT(IFERROR(
VLOOKUP(SPLIT(C2:C, ", "), {F1&F2:F, G2:G}, 2, 0))*1,
ROW(INDIRECT("1:"&COLUMNS(SPLIT(C2:C, ", "))))^0)))
demo spreadsheet
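The logic behind the formula (split the cell, look up each course price, sum per row) can be sketched in Python for clarity. This is an illustration only: `PRICES` and `order_total` are names made up for the example, and the "course" prefix on the keys mirrors the F1&F2:F concatenation in the formula. Unknown course names contribute 0, which is roughly what the IFERROR wrapper achieves.

```python
# Price list keyed the way the responses spell the courses ("courseA", ...).
PRICES = {"courseA": 23, "courseB": 33, "courseC": 44, "courseD": 23, "courseE": 55}


def order_total(courses_cell, prices):
    """Split a comma-separated cell, look up each course, and sum the prices."""
    names = [name.strip() for name in courses_cell.split(",") if name.strip()]
    # Courses missing from the price list are counted as 0.
    return sum(prices.get(name, 0) for name in names)


totals = [
    order_total("courseA, courseC", PRICES),
    order_total("courseB, courseC, courseD", PRICES),
    order_total("courseE", PRICES),
]
```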
As I need time to digest @player0's answer, I am doing this in a more intuitive way.
I created two sheets to store intermediate values.
The first one is named "chosen_courses":
| Order ID | User ID |
|---|---|
| 1001 | =IFERROR(ARRAYFORMULA(TRIM(SPLIT(index(courses,Row(),1),","))),"") |
| 1002 | =IFERROR(ARRAYFORMULA(TRIM(SPLIT(index(courses,Row(),1),","))),"") |
| 1003 | =IFERROR(ARRAYFORMULA(TRIM(SPLIT(index(courses,Row(),1),","))),"") |
In this sheet every row is a horizontal list of the chosen courses, and I created another sheet
| total | course price |
|---|---|
| =IF(isblank(order_id),"",SUM(B2:2)) | =IFERROR(VLOOKUP('chosen_courses'!B2,{course_Names,course_price},2,false),"") |
| =IF(isblank(order_id),"",SUM(C2:2)) | =IFERROR(VLOOKUP('chosen_courses'!B2,{course_Names,course_price},2,false),"") |
| =IF(isblank(order_id),"",SUM(D2:2)) | =IFERROR(VLOOKUP('chosen_courses'!B2,{course_Names,course_price},2,false),"") |
course_Names, order_id and course_price are named ranges.
This works well, at least for now.
But there is a problem:
I have 20 courses, so the second sheet has 21 columns, and I copied the formulas down 1000 rows because that is as far as Ctrl+Shift+↓ and Ctrl+D reach. Now, sometimes when I open the sheet, a progress bar shows formulas being calculated, which can take around 2 minutes even though I have only about 5 test orders in the sheet. I am afraid this will get worse when I have more data or when the sheet is opened on older computers.
Is it because I am using resource-intensive functions? Can it be improved?
I have a big Excel file containing all the data I have collected over the past 2 years. To make comparisons, there is a field called IAP, a number that groups a set of rows according to their collection date.
For example, all data collected in December 2016 has an IAP of 34, data collected in January 2017 has an IAP of 35, and so on.
In this table I need to compare points across different IAPs and find those that were good and turned bad in the last IAP (in this case, IAP 37).
That is, if the Status of a point is good in IAP 34 and turns bad in IAP 37, it should be counted.
Using Excel filters and VLOOKUPs to count by that criterion (again, points that were good and turned bad in the last added IAP), the result was 100, but in QlikView the result displayed in a text object is 60.
Here is the script, using Left Join, from the script editor:
NewlyBad:
Load
Code As Code1 ,
Status as NewStatus ,
New_Sites as NewNew_Sites,
IAP as IAMP_NAME
Resident ALLIAP
where IAP_Version = $(vMaxIAP) ;
Left Join
Load Code as Code1 ,
Status as OldStatus,
IAP as LAST_IAP
Resident ALLIAP
where IAP_Version = $(vMaxIAMP)-3;
I used the line where IAP_Version = $(vMaxIAMP)-3; to compare data between the last IAP and the one three before it (IAP-3).
In the Variable Overview, I am getting the correct values: a comparison between 34 and 37.
Now, the expression of the text object is:
=(count ({< NewStatus={"Bad"}, OldStatus-={"Bad"} >} distinct Code1))
The result displayed is 60 instead of 100, which is wrong.
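One way to cross-check either figure outside QlikView and Excel is a small script that applies the stated criterion directly. The sketch below is only an illustration on made-up records: the function name `newly_bad_codes`, the tuple layout, and the sample codes are all assumptions. It also assumes one row per code per IAP, and it excludes codes that have no row in the older IAP; how those codes should be treated is a choice the question does not settle, and it may be one place where two counts diverge.

```python
def newly_bad_codes(records, max_iap, lag=3):
    """records: iterable of (code, iap, status) tuples.

    Returns the set of codes that are "Bad" in max_iap and had a
    status other than "Bad" in max_iap - lag.
    """
    new_status = {code: status for code, iap, status in records if iap == max_iap}
    old_status = {code: status for code, iap, status in records if iap == max_iap - lag}
    return {
        code
        for code, status in new_status.items()
        # Assumption: codes absent from the older IAP are not counted.
        if status == "Bad" and code in old_status and old_status[code] != "Bad"
    }


# Hypothetical records for illustration only.
records = [
    ("P1", 34, "Good"), ("P1", 37, "Bad"),   # good then bad: counted
    ("P2", 34, "Bad"), ("P2", 37, "Bad"),    # already bad: not counted
    ("P3", 37, "Bad"),                       # no row in IAP 34: not counted here
    ("P4", 34, "Good"), ("P4", 37, "Good"),  # still good: not counted
]
result = newly_bad_codes(records, max_iap=37)
```

Running the same function with and without the `code in old_status` condition on the real data would show how many codes fall into the "no older row" group.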
I have a table in Tableau and I want to create a calculated field based on a filter. Can someone help me find a solution for the following logic?
I have a table that contains customers, some of whom are special customers, say 'members'. I want to show the total billing amount for member customers and the total amount for the remaining ones.
Member customers have a dedicated ID range: their IDs lie between 300 and 399, both inclusive.
Please help. Thanks in advance.
I assume that the measure you want to sum is the Customer Pay. Then, you can do this in three neat steps.
First Step: Create a calculated field called 'Customer Type'
IF [Customer Code] >= 300 AND [Customer Code] <= 399 THEN 'Member' ELSE 'Normal' END
Second Step: Create a calculated field called 'Member Total Billing Amount'
SUM(IF [Customer Type] = 'Member' THEN [Customer Pay] END)
Third Step: Create a calculated field called 'Non-member Total Billing Amount'
SUM(IF [Customer Type] = 'Normal' THEN [Customer Pay] END)
You can now drag Member Total Billing Amount and Non-member Total Billing Amount measures into the view as desired.
Note:
The calculated field called 'Customer Type' will be saved under Dimensions. So, go up there and look for it.
I have only named the calculated fields for illustration purposes. Feel free to change it to what is more intuitive for you.
Another thing to consider, depending on your needs, will be to add Customer Type on the Color Marks card and it will divide Customer Pay into 'Members' and 'Normal'.
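The logic of the three calculated fields can be sanity-checked with a short Python sketch. This is not Tableau syntax; the function names and the sample rows are made up for the illustration. It classifies each customer code with the same inclusive 300 to 399 range and sums Customer Pay per type.

```python
def customer_type(customer_code):
    """Mirror of the 'Customer Type' field: 300-399 inclusive is a member."""
    return "Member" if 300 <= customer_code <= 399 else "Normal"


def billing_totals(rows):
    """rows: iterable of (customer_code, customer_pay). Returns totals per type."""
    totals = {"Member": 0, "Normal": 0}
    for code, pay in rows:
        totals[customer_type(code)] += pay
    return totals


# Hypothetical rows, chosen to exercise both boundaries of the range.
rows = [(299, 10), (300, 20), (350, 30), (399, 40), (400, 50)]
totals = billing_totals(rows)
```

The boundary rows (299, 300, 399, 400) are the ones worth checking in the real view as well, since an off-by-one in the range condition would only show up there.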