TimeArray Dates Issue - time-series

I would like to ask if anyone knows how to get the dates from a TimeArray, e.g.:
36x1 TimeArray{Float64,1} 1980-12-31 to 2015-01-01
1980-12-31 | 0.94
1981-12-31 | 0.37
1982-12-31 | 0.12
1983-12-31 | 0.64
⋮
2012-12-31 | 0.43
2013-12-31 | 0.81
2014-12-31 | 0.88
2015-01-01 | 0.55

If you read this table into a matrix x where the dates are in the first column, then this follows a pattern from the manual http://docs.julialang.org/en/release-0.4/manual/dates/ :
df = Dates.DateFormat("y-m-d");
map(u -> Date(u,df), x[:,1])

I don't know where that TimeArray came from, but because you mentioned that the values are of Float64 type, I think the format might be Unix time. If that's true, you can convert them to Julia DateTime as follows:
juliadatetime=[Dates.unix2datetime(t) for t in timearray]
and then extract what you want
ymd=[Dates.yearmonthday(t) for t in juliadatetime]
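
For readers outside Julia, the same two steps (Unix seconds to a datetime, then year/month/day) can be sketched in Python. This is only an analogue under the answer's own assumption that the values are Unix timestamps in UTC; the helper name unix_to_ymd and the sample values are hypothetical and not taken from the question's data.

```python
from datetime import datetime, timezone


def unix_to_ymd(seconds):
    """Convert seconds since the Unix epoch (UTC) to a (year, month, day) tuple."""
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return (moment.year, moment.month, moment.day)


# Hypothetical sample values, not read from the TimeArray in the question.
unix_times = [0.0, 86400.0, 347068800.0]
ymd = [unix_to_ymd(t) for t in unix_times]
```

Passing tz=timezone.utc keeps the result independent of the machine's local time zone.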

Related

Calculate time differences and sum duration

I am trying to create a small "app" using Tasker on my Android phone that is supposed to track my work hours and over/undertime. I have managed to get Tasker to send timestamps at the start/end of each workday and am writing them to a Google Sheet so they get recorded like:
| A | B | C | D | E | F |
| 2020-01-29 | 07:24 | 16:33 | 00:09 | | -02:51 |
| 2020-01-30 | 07:00 | 12:00 | -03:00 | | |
Where the "D" column is the difference between ordinary work hours (8) and the hours actually registered.
The "F" column should summarize the "D" column and show the sum of all values.
The data in the first three columns is being sent correctly, but I can't figure out how to set up formulas so that the values in the "D" column are calculated, and the same goes for the cell in the "F" column. I have tried changing to different formats and creating my own formats too, but I do not understand how to get it to work.
I'm getting a different result than you in D1. I wonder if you're also accounting for a lunch hour (so subtract 9 instead of 8), but these formulas worked for me:
in Column D: =(C1-B1)-(8/24)
in Cell F1: =sum(D1:D2)
Column D and Cell F1 are formatted as Time > Duration.
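
The arithmetic behind those two formulas can be checked outside the sheet. Below is a minimal Python sketch, assuming an 8-hour standard day and the two rows from the question; the names daily_balance and format_duration are made up for illustration.

```python
from datetime import datetime, timedelta

STANDARD_DAY = timedelta(hours=8)  # assumption: 8 ordinary hours, no lunch break


def daily_balance(start, end, standard=STANDARD_DAY):
    """Return worked time minus the standard day, like =(C1-B1)-(8/24)."""
    fmt = "%H:%M"
    worked = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return worked - standard


def format_duration(delta):
    """Format a possibly negative timedelta as [-]HH:MM."""
    total_minutes = int(delta.total_seconds() // 60)
    sign = "-" if total_minutes < 0 else ""
    hours, minutes = divmod(abs(total_minutes), 60)
    return f"{sign}{hours:02d}:{minutes:02d}"


workdays = [("07:24", "16:33"), ("07:00", "12:00")]
balances = [daily_balance(start, end) for start, end in workdays]
total = sum(balances, timedelta())  # like =sum(D1:D2)
```

With an 8-hour day the first row comes out as 01:09 rather than the 00:09 shown in the question, which is consistent with the answer's guess that a lunch hour is being subtracted as well.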

Google Sheets Formula to calculate actual total duration of tasks with different start/end dates, overlaps, and gaps

I know how to do this using a custom function/script but I am wondering if it can be done with a built-in formula.
I have a list of tasks with a start date and end date. I want to calculate the actual # of working days (NETWORKDAYS) spent on all the tasks.
Task days may overlap so I can't just total the # of days spent on each task
There may be gaps between tasks so I can't just find the difference between the first start and last end.
For example, let's use these:
| Task Name | Start Date | End Date | NETWORKDAYS |
|:---------:|------------|------------|:-----------:|
| A | 2019-09-02 | 2019-09-04 | 3 |
| B | 2019-09-03 | 2019-09-09 | 5 |
| C | 2019-09-12 | 2019-09-13 | 2 |
| D | 2019-09-16 | 2019-09-17 | 2 |
| E | 2019-09-19 | 2019-09-23 | 3 |
Now:
If you total the NETWORKDAYS you'll get 15
If you calculate NETWORKDAYS between 2019-09-02 and 2019-09-23 you get 16
But the actual duration is 13:
A and B overlap a bit
There is a gap between B and C
There is a gap between D and E
If I was to write a custom function I would basically take all the dates, sort them, find overlaps and remove them, and account for gaps.
But I am wondering if there is a way to calculate the actual duration using built-in formulas?
sure, why not:
=ARRAYFORMULA(COUNTA(IFERROR(QUERY(UNIQUE(TRANSPOSE(SPLIT(CONCATENATE("×"&
SPLIT(REPT(INDIRECT("B1:B"&COUNTA(B1:B))&"×",
NETWORKDAYS(INDIRECT("B1:B"&COUNTA(B1:B)), INDIRECT("C1:C"&COUNTA(B1:B)))), "×")+
TRANSPOSE(ROW(INDIRECT("A1:A"&MAX(NETWORKDAYS(B1:B, C1:C))))-1)), "×"))),
"where Col1>4000", 0))))

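The idea the question describes (collect every working day covered by any task and count each one once) can be sketched in Python to cross-check the formula's result. This is an illustrative sketch, not the formula's exact mechanics: it treats Monday to Friday as working days and ignores holidays, and the function names are hypothetical.

```python
from datetime import date, timedelta


def working_days(start, end):
    """Yield each Monday-to-Friday date from start to end, inclusive."""
    day = start
    while day <= end:
        if day.weekday() < 5:  # 0-4 are Monday to Friday
            yield day
        day += timedelta(days=1)


def actual_duration(tasks):
    """Count distinct working days covered by at least one task."""
    covered = set()
    for start, end in tasks:
        covered.update(working_days(start, end))
    return len(covered)


tasks = [
    (date(2019, 9, 2), date(2019, 9, 4)),    # A
    (date(2019, 9, 3), date(2019, 9, 9)),    # B
    (date(2019, 9, 12), date(2019, 9, 13)),  # C
    (date(2019, 9, 16), date(2019, 9, 17)),  # D
    (date(2019, 9, 19), date(2019, 9, 23)),  # E
]
duration = actual_duration(tasks)
```

Using a set handles overlaps and gaps at the same time: overlapping days are stored once, and days in a gap are never added.
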
Find total duration of many overlapping times

I have a list of dates and times for employee time sheets. The times begin in column F, and end in column G. Sometimes there are overlapping times for projects. The employee does not get paid for overlapping projects, yet we need to track each project separately. I would like to be able to look at columns E, F and G and find any overlapping projects, and return a single time entry. In the example below, notice that line 1 does NOT overlap with the others, but that there is a series of overlapping entries in lines 2-6. They don't necessarily all overlap, but are more like a "chain." I want to write a formula (not a script) to solve this.
+---+------------+------------+----------+
| | E | F | G |
+---+------------+------------+----------+
| 1 | 10/11/2017 | 12:30 PM | 1:00 PM |
| 2 | 10/11/2017 | 1:00 PM | 3:00 PM |
| 3 | 10/11/2017 | 2:15 PM | 6:45 PM |
| 4 | 10/11/2017 | 2:30 PM | 3:00 PM |
| 5 | 10/11/2017 | 2:15 PM | 6:45 PM |
| 6 | 10/11/2017 | 3:00 PM | 6:45 PM |
+---+------------+------------+----------+
I would want to evaluate these columns and return the total duration of each "chain" on the final line of the series of overlaps. In my example below, we'll put that in column H. It finds 5.75 hours for the series that begins in row 2 and ends in row 6 (1 pm to 6:45 pm).
+---+------------+------------+----------+------------+
| | E | F | G | H |
+---+------------+------------+----------+------------+
| 1 | 10/11/2017 | 12:30 PM | 1:00 PM | 0.5 |
| 2 | 10/11/2017 | 1:00 PM | 3:00 PM | overlap |
| 3 | 10/11/2017 | 2:15 PM | 6:45 PM | overlap |
| 4 | 10/11/2017 | 2:30 PM | 3:00 PM | overlap |
| 5 | 10/11/2017 | 2:15 PM | 6:45 PM | overlap |
| 6 | 10/11/2017 | 3:00 PM | 6:45 PM | 5.75 |
+---+------------+------------+----------+------------+
I've tried writing queries, but keep finding myself back at the beginning. If anyone has a suggestion, I'd love to know it! Thank you in advance.
Neill
My Solution
To solve this I need 2 extra columns:
Step 1. Return "overlap" or "ok"
Two rows overlap when each one starts before the other one ends:
I made a query formula to check this:
=if(QUERY(ArrayFormula({value(E1:E+F1:F),VALUE(E1:E+G1:G)}),
"select count(Col1) where
Col1 < "&value(G1+E1-1/10^4)&"
and Col2 > "&value(F1+E1+1/10^4)&" label Count(Col1) ''",0)>1,"overlap","ok")
Drag the formula down. The result is column:
ok
overlap
overlap
overlap
overlap
ok
ok
overlap
overlap
overlap
overlap
ok
In the formula:
value is used so that numbers are compared. Each pair (date + time) must be compared.
-1/10^4 and +1/10^4 are used because of imprecision in query
Step 2. Get Time Chains
This part is tricky. My solution will only work if data is sorted like in the example.
Enter 1 in cell I1. In cell I2 enter the formula:
=if(or(and(H1=H2,H2="overlap"),and(H2="ok",H1="overlap")),I1,I1+1)
Drag the formula down. The result is column:
1
2
2
2
2
2
3
4
4
4
4
4
Step 3. Get Durations
In J1 paste and copy down the formula:
=if(H1="ok",
round(QUERY(ArrayFormula({value(E:E+F:F),VALUE(E:E+G:G),I:I}),
"select max(Col2) - min(Col1) where Col3 = "&I1
&" label max(Col2) - min(Col1) ''")*24,2),"")
The query gets the maximum duration for each group found in step 2.
round is used because of imprecision in query
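
For comparison, the three steps can be collapsed into a single pass once the rows are sorted by start time. The Python sketch below is a standard interval-merge, not a translation of the formulas above; it assumes the dates are month/day/year and, like the question's example, treats intervals that only touch (1:00 PM end, 1:00 PM start) as separate. The names parse and merge_chains are hypothetical.

```python
from datetime import datetime


def parse(day, clock):
    """Parse a date and a 12-hour clock time (assumed month/day/year)."""
    return datetime.strptime(f"{day} {clock}", "%m/%d/%Y %I:%M %p")


def merge_chains(entries):
    """Merge overlapping (start, end) pairs into chains; touching ones stay apart."""
    chains = []
    for start, end in sorted(entries):
        if chains and start < chains[-1][1]:
            # Starts before the current chain ends: extend the chain.
            chains[-1][1] = max(chains[-1][1], end)
        else:
            chains.append([start, end])
    return chains


rows = [
    ("10/11/2017", "12:30 PM", "1:00 PM"),
    ("10/11/2017", "1:00 PM", "3:00 PM"),
    ("10/11/2017", "2:15 PM", "6:45 PM"),
    ("10/11/2017", "2:30 PM", "3:00 PM"),
    ("10/11/2017", "2:15 PM", "6:45 PM"),
    ("10/11/2017", "3:00 PM", "6:45 PM"),
]
entries = [(parse(d, f), parse(d, g)) for d, f, g in rows]
chains = merge_chains(entries)
hours = [(end - start).total_seconds() / 3600 for start, end in chains]
```

Sorting first means each row only has to be compared with the chain currently being built, which is why the sheet solution also depends on sorted data.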

Measuring periodicity strength of a specific time on the time series data

I am trying to measure the periodicity strength of a specific time in time series data when a period (e.g., 1 day, 7 days) is given.
For example,
| AM 10:00 | 10:30 | 11:00 |
DAY 1 | A | A | B |
DAY 2 | A | B | B |
DAY 3 | A | B | B |
DAY 4 | A | A | B |
DAY 5 | A | A | B |
If the period is 1 day, AM 10:00 and 11:00 have the highest periodicity strength in this data, because the value is the same on every day at both times.
Is there a popular method or existing research for this?
There is a lot of existing research on finding periodic patterns in time series, but I can't find research on measuring the periodicity strength of a specific time when a period is given.
Please share your knowledge. Thanks.
What you are looking for is something called cyclic association rules. I've linked to the paper that was originally written by researchers at Bell Labs.
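
As a much simpler baseline than cyclic association rules, the intuition in the question can be expressed as the share of periods in which a time slot takes its most common value. The Python sketch below is that mode-frequency measure only, under the assumption that the data is already aligned by period; it is not the method from the linked paper, and the name slot_consistency is made up.

```python
from collections import Counter


def slot_consistency(observations):
    """Map each time slot to the fraction of periods showing its most common value."""
    scores = {}
    for slot, values in observations.items():
        most_common_count = Counter(values).most_common(1)[0][1]
        scores[slot] = most_common_count / len(values)
    return scores


# The example table from the question, one list entry per day.
observations = {
    "10:00": ["A", "A", "A", "A", "A"],
    "10:30": ["A", "B", "B", "A", "A"],
    "11:00": ["B", "B", "B", "B", "B"],
}
scores = slot_consistency(observations)
```

A score of 1.0 means the slot shows the same value in every period; lower scores mean weaker periodicity for that slot.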

Stata: Convert date, quarter to year

I have a time series dataset with quarterly observations, which I want to collapse to an annual series. For that, I need to transform my date variable first.
It looks like
. list date in 1/5
+--------+
| date |
|--------|
1. | 1991q1 |
2. | 1991q2 |
3. | 1991q3 |
4. | 1991q4 |
5. | 1992q1 |
+--------+
Hence, to collapse, I want date (or date2) to be 1991, 1991, 1991, 1991, 1992 etc.
Once I have that, I could use collapse or tscollapse to turn my dataset into annual data.
// create some example data
. clear all
. set obs 5
obs was 0, now 5
. gen date = 123 + _n
. format date %tq
// create the yearly date
. gen date2 = yofd(dofq(date))
// admire the result
. list
+----------------+
| date date2 |
|----------------|
1. | 1991q1 1991 |
2. | 1991q2 1991 |
3. | 1991q3 1991 |
4. | 1991q4 1991 |
5. | 1992q1 1992 |
+----------------+
Another way is just to remember that years and quarters are just integers. A little consultation of the documentation and a little fiddling around yield
. gen Y = 1960 + floor(Q/4)
as a conversion rule to get years from Stata quarterly dates. Formatting year as a yearly date is then permissible but superfluous.
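
The integer rule is easy to verify outside Stata. The following Python sketch assumes Stata's convention that quarterly dates count quarters since 1960q1 (so 0 is 1960q1); the function names are hypothetical.

```python
def stata_quarter_to_year(q):
    """Year of a Stata quarterly date, mirroring 1960 + floor(Q/4)."""
    return 1960 + q // 4  # // is floor division, so negative quarters work too


def stata_quarter_label(q):
    """Render a Stata quarterly date in the %tq style, e.g. 1991q1."""
    return f"{stata_quarter_to_year(q)}q{q % 4 + 1}"


quarters = [123 + n for n in range(1, 6)]  # same values as gen date = 123 + _n
years = [stata_quarter_to_year(q) for q in quarters]
labels = [stata_quarter_label(q) for q in quarters]
```

The labels reproduce the listing above (1991q1 through 1992q1), and the years match date2.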
