Stata: Convert date, quarter to year - time-series

I have a time series dataset with quarterly observations, which I want to collapse to an annual series. For that, I need to transform my date variable first.
It looks like this:
. list date in 1/5
+--------+
| date |
|--------|
1. | 1991q1 |
2. | 1991q2 |
3. | 1991q3 |
4. | 1991q4 |
5. | 1992q1 |
+--------+
Hence, to collapse, I want date (or date2) to be 1991, 1991, 1991, 1991, 1992 etc.
Once I have that, I could use collapse or tscollapse to turn my dataset into annual data.

// create some example data
. clear all
. set obs 5
obs was 0, now 5
. gen date = 123 + _n
. format date %tq
// create the yearly date
. gen date2 = yofd(dofq(date))
// admire the result
. list
+----------------+
| date date2 |
|----------------|
1. | 1991q1 1991 |
2. | 1991q2 1991 |
3. | 1991q3 1991 |
4. | 1991q4 1991 |
5. | 1992q1 1992 |
+----------------+
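For readers outside Stata, here is a minimal Python sketch of what the `yofd(dofq(date))` chain computes. It assumes Stata's conventions that quarterly date 0 is 1960q1 and daily date 0 is 1 January 1960. The helpers are hypothetical re-implementations that borrow the Stata names. `collapse_mean` and its sample values are an illustrative stand-in for the collapse step, not Stata's own code.

```python
from collections import defaultdict
from datetime import date, timedelta

# Stata's origin: daily date 0 is 1 January 1960, quarterly date 0 is 1960q1.
EPOCH = date(1960, 1, 1)


def dofq(q: int) -> int:
    """Daily date (days since 1960-01-01) of the first day of quarter q."""
    year = 1960 + q // 4          # floor division also handles pre-1960 quarters
    month = 3 * (q % 4) + 1       # 1, 4, 7 or 10
    return (date(year, month, 1) - EPOCH).days


def yofd(d: int) -> int:
    """Calendar year of daily date d."""
    return (EPOCH + timedelta(days=d)).year


def collapse_mean(quarters, values):
    """Average quarterly values within each calendar year (illustrative)."""
    groups = defaultdict(list)
    for q, v in zip(quarters, values):
        groups[yofd(dofq(q))].append(v)
    return {year: sum(vs) / len(vs) for year, vs in sorted(groups.items())}


quarters = [123 + n for n in range(1, 6)]   # same as `gen date = 123 + _n`
print([yofd(dofq(q)) for q in quarters])    # [1991, 1991, 1991, 1991, 1992]
print(collapse_mean(quarters, [1.0, 2.0, 3.0, 4.0, 10.0]))
```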

Another way is to remember that years and quarters are just integers in Stata (quarterly date 0 is 1960q1, 1 is 1960q2, and so on). A little consultation of the documentation and a little fiddling around yield
. gen Y = 1960 + floor(Q/4)
as a conversion rule to get years from Stata quarterly dates, where Q is the quarterly date variable (date in the example above). Formatting Y as a yearly date is then permissible but superfluous.
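The integer rule can be checked outside Stata. This Python sketch assumes, as Stata does, that quarterly date 0 is 1960q1. `year_of_quarter` and `quarter_label` are hypothetical helper names. Floor division keeps the rule correct for quarters before 1960, where truncation toward zero would give the wrong year.

```python
def year_of_quarter(q: int) -> int:
    """Year of Stata quarterly date q, i.e. 1960 + floor(q/4)."""
    return 1960 + q // 4  # // floors, so q = -1 (1959q4) gives 1959


def quarter_label(q: int) -> str:
    """Render q the way Stata's %tq format displays it, e.g. '1991q1'."""
    return f"{year_of_quarter(q)}q{q % 4 + 1}"


for q in (124, 127, 128, -1):
    print(q, year_of_quarter(q), quarter_label(q))
```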

Related

Automatic calculation based on date and tickbox state

I am trying to build a spreadsheet to track and automatically calculate my pay when I am called out for work.
Here are the conditions:
Monday, Tuesday, Wednesday, Thursday - Standby Rate: £21
Friday, Saturday, Sunday, Bank Holidays - Standby Rate: £26
Monday, Tuesday, Wednesday, Thursday - Callout Rate: (Hours Worked * Hourly Rate) * 1.25
Friday, Saturday, Sunday, Bank Holidays - Callout Rate: (Hours Worked * Hourly Rate) * 1.5
I have a spreadsheet containing the following information:
Column A - Date | Date
Column B - Called Out | Checkbox, tick if yes
Column C - Duration | If called out, how long for
Column D - Calculation | Shows the calculation used to determine payment
Column E - Payment | Shows the payment
The sheet looks like this:
+------------+-------------+----------+---------------------+---------+
| Date       | Called Out? | Duration | Calculation         | Payment |
+------------+-------------+----------+---------------------+---------+
| 01/02/2021 |             |          | 21                  | £21     |
| 02/02/2021 |             |          | 21                  | £21     |
| 03/02/2021 |             |          | 21                  | £21     |
| 04/02/2021 |             |          | 21                  | £21     |
| 05/02/2021 |             |          | 26                  | £26     |
| 06/02/2021 | TRUE        | 2        | 26+((2*50)*1.5)     | £176    |
| 07/02/2021 | TRUE        | 1        | 26+((1*50)*1.5)     | £101    |
| 15/02/2021 |             |          | 21                  | £21     |
| 16/02/2021 | TRUE        | 1.5      | 21+((1.5*50)*1.25)  | £114.75 |
| 17/02/2021 |             |          | 21                  | £21     |
| 18/02/2021 |             |          | 21                  | £21     |
| 19/02/2021 |             |          | 26                  | £26     |
| 20/02/2021 |             |          | 26                  | £26     |
| 21/02/2021 |             |          | 26                  | £26     |
+------------+-------------+----------+---------------------+---------+
I have had some success with the following formula to get the standby rates (K1 contains my actual hourly rate):
=SUM(IF(WEEKDAY(A2,2)>4,26,21),IF(WEEKDAY(A2,2)>4,(($K$1*C2)*1.5),(($K$1*C2)*1.25)))
But I need it to account for Bank Holidays and to check whether column B is TRUE; if it is, it should calculate the payment as described above.
Any ideas?
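The rules in the question can be written as a small function, which is handy for checking the expected figures in the table. This is a Python sketch, not a spreadsheet formula. The bank-holiday dates are passed in as a set that you would fill yourself; none are assumed here. The £50 hourly rate is the one used in the table.

```python
from datetime import date

HOURLY_RATE = 50.0  # the rate used in the example table (cell K1 in the question)


def daily_pay(day: date, called_out: bool, hours: float,
              bank_holidays: frozenset = frozenset()) -> float:
    """Standby pay plus any call-out pay for one day.

    Friday to Sunday and bank holidays use the higher standby rate and the
    1.5 multiplier; Monday to Thursday use 21 and 1.25.
    """
    # isoweekday(): Monday = 1 ... Sunday = 7, like WEEKDAY(date, 2)
    premium = day.isoweekday() > 4 or day in bank_holidays
    standby = 26.0 if premium else 21.0
    multiplier = 1.5 if premium else 1.25
    callout = hours * HOURLY_RATE * multiplier if called_out else 0.0
    return standby + callout


print(daily_pay(date(2021, 2, 6), True, 2))     # 176.0
print(daily_pay(date(2021, 2, 16), True, 1.5))  # 114.75
```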
Your constants are:
Standby Rate: £21 OR £26
Hourly Rate: £50
Always: Standby OR Called Out
Bank Holidays
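Change your table so that the checkbox in column B marks Bank Holidays (the formula below gives a ticked day the Friday to Sunday rates), then use this: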
=ArrayFormula(IF((WEEKDAY(A2:A22,2)>4)+(B2:B22=TRUE),26,21)+
IF((WEEKDAY(A2:A22,2)>4)+(B2:B22=TRUE),C2:C22*50*1.5,C2:C22*50*1.25))
My working spreadsheet is here
https://docs.google.com/spreadsheets/d/1N7d2-W7pRTqpO9L4DvSkmm4j7vaHpJZ2fMYAq-yVVPg/copy
I made it in a few stages and tried to make it as simple as possible.
First, we determine the day of the week from the date:
=WEEKDAY(A4,2)
I put it in column C for illustration only.
Then I make a table with the rate for each day of the week, shown in columns J and K. WEEKDAY with type 2 numbers the days from Monday = 1 to Sunday = 7, so the table must follow that numbering.
Then I look up the daily rate from the day of the week in the second column of that table:
=vlookup(weekday(A4,2),$J$1:$K$8,2,false)
I don't use ARRAYFORMULA here; I just copy the formula down, so that on a national holiday or similar you can change the rate manually.
Finally, I calculate the payment for each day.
I add the standby rate to the call-out pay (if there are no call-out hours, it's just the flat standby rate).
I multiply the hours by 1.5 for days with the 26 standby rate and by 1.25 for days with the 21 standby rate (here B4 holds the call-out hours and D4 the looked-up standby rate):
=D4+B4*50*(if(D4=26,1.5,1.25))
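The staged approach can be sketched in Python as below. It looks up a rate per weekday, allows a manual override for holidays, and then derives the multiplier from the rate. `RATES`, the override dictionary and the function names are illustrative assumptions. The weekday numbering follows `WEEKDAY(date, 2)`, where Monday is 1.

```python
from datetime import date

# Standby rate per weekday, Monday = 1 ... Sunday = 7 (like WEEKDAY(date, 2)),
# mirroring the lookup table in columns J:K.
RATES = {1: 21, 2: 21, 3: 21, 4: 21, 5: 26, 6: 26, 7: 26}


def standby_rate(day: date, overrides: dict) -> int:
    """VLOOKUP-style rate, unless the date has been overridden by hand."""
    return overrides.get(day, RATES[day.isoweekday()])


def payment(rate: int, hours: float, hourly: float = 50.0) -> float:
    """Like =D4+B4*50*(IF(D4=26,1.5,1.25)): the multiplier follows the rate."""
    return rate + hours * hourly * (1.5 if rate == 26 else 1.25)


holidays = {date(2021, 2, 15): 26}  # hypothetical manual override
rate = standby_rate(date(2021, 2, 15), holidays)
print(rate, payment(rate, 1))  # 26 101.0
```

Because the multiplier is inferred from the rate, a manually overridden holiday automatically gets the 1.5 call-out multiplier as well.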

How do I make a calculation field reference specific values from a pivot table in google sheets

So I'm making a punch in/out dashboard in Google Sheets. It uses a Google Form to populate a sheet with my employees' punches, like so:
Timestamp | Name | Punch Type | Time
6/2/2020 15:09:55 | Bob | 1. Start Shift | 7:30:00 AM
6/2/2020 15:10:45 | Bob | 2. Start Lunch | 11:00:00 AM
6/2/2020 15:11:08 | Bob | 3. End Lunch | 11:30:00 AM
6/2/2020 16:01:04 | Bob | 4. End Day | 4:00:00 PM
...
I then used this source data to make a pivot table that looks like this:
AVERAGE of Time | Punch Type
Name | 1. Start Shift | 2. Start Lunch | 3. End Lunch | 4. End Day
Bob | 7:30:00 AM | 11:00:00 AM | 11:30:00 AM | 4:00:00 PM
...
In this pivot table, I want to add a column at the end that is a calculated field of
("4. End Day" - "1. Start Shift") - ("3. End Lunch" - "2. Start Lunch").
I'm encountering two roadblocks here. The first is that when I add a calculated field in the pivot table editor panel, it creates 4 new columns instead of just one:
| Punch Type | Values
| 1. Start Shift | 2. Start Lunch | 3. End Lunch | 4. End Day
Name | AVERAGE of Time.. | AVERAGE of Time.. | AVERAGE of Time.. | AVERAGE of Time..
Bob | 7:30:00 AM | 0 | 11:00:00 AM | 0 | 11:30:00 AM | 0 | 4:00:00 PM | 0
...
The second issue is that I can't figure out how to reference the columns with the times to do this calculation.
Basically my end goal is a pivot table that looks like this:
AVERAGE of Time | Punch Type
Name | 1. Start Shift | 2. Start Lunch | 3. End Lunch | 4. End Day | Total Hours
Bob | 7:30:00 AM | 11:00:00 AM | 11:30:00 AM | 4:00:00 PM | 8.0
...
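The arithmetic behind the Total Hours column is simple once the four averaged times are available. This Python sketch only checks the formula ("4. End Day" - "1. Start Shift") - ("3. End Lunch" - "2. Start Lunch") on Bob's row. It does not address how to reference pivot-table columns in Google Sheets, and `total_hours` is a hypothetical name. In Sheets itself, times are stored as fractions of a day, so the same difference would need to be multiplied by 24 to give 8.0.

```python
from datetime import datetime


def total_hours(start_shift: str, start_lunch: str,
                end_lunch: str, end_day: str) -> float:
    """(End Day - Start Shift) - (End Lunch - Start Lunch), in hours."""
    t = [datetime.strptime(s, "%I:%M:%S %p")
         for s in (start_shift, start_lunch, end_lunch, end_day)]
    worked = (t[3] - t[0]) - (t[2] - t[1])   # a timedelta
    return worked.total_seconds() / 3600


print(total_hours("7:30:00 AM", "11:00:00 AM", "11:30:00 AM", "4:00:00 PM"))  # 8.0
```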
Displayed below is how I have my Pivot Table settings in the Pivot Table Editor Panel, before I attempt to add the calculated field

Find total duration of many overlapping times

I have a list of dates and times for employee time sheets. The times begin in column F, and end in column G. Sometimes there are overlapping times for projects. The employee does not get paid for overlapping projects, yet we need to track each project separately. I would like to be able to look at columns E, F and G and find any overlapping projects, and return a single time entry. In the example below, notice that line 1 does NOT overlap with the others, but that there is a series of overlapping entries in lines 2-6. They don't necessarily all overlap, but are more like a "chain." I want to write a formula (not a script) to solve this.
+---+------------+------------+----------+
|   | E          | F          | G        |
+---+------------+------------+----------+
| 1 | 10/11/2017 | 12:30 PM   | 1:00 PM  |
| 2 | 10/11/2017 | 1:00 PM    | 3:00 PM  |
| 3 | 10/11/2017 | 2:15 PM    | 6:45 PM  |
| 4 | 10/11/2017 | 2:30 PM    | 3:00 PM  |
| 5 | 10/11/2017 | 2:15 PM    | 6:45 PM  |
| 6 | 10/11/2017 | 3:00 PM    | 6:45 PM  |
+---+------------+------------+----------+
I would want to evaluate these columns and return the total duration of each "chain" on the final line of the series of overlaps. In my example below, we'll put that in column H. It finds 5.75 hours for the series that begins in row 2 and ends in row 6 (1 pm to 6:45 pm).
+---+------------+------------+----------+------------+
|   | E          | F          | G        | H          |
+---+------------+------------+----------+------------+
| 1 | 10/11/2017 | 12:30 PM   | 1:00 PM  | 0.5        |
| 2 | 10/11/2017 | 1:00 PM    | 3:00 PM  | overlap    |
| 3 | 10/11/2017 | 2:15 PM    | 6:45 PM  | overlap    |
| 4 | 10/11/2017 | 2:30 PM    | 3:00 PM  | overlap    |
| 5 | 10/11/2017 | 2:15 PM    | 6:45 PM  | overlap    |
| 6 | 10/11/2017 | 3:00 PM    | 6:45 PM  | 5.75       |
+---+------------+------------+----------+------------+
I've tried writing queries, but keep finding myself back at the beginning. If anyone has a suggestion, I'd love to know it! Thank you in advance.
Neill
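Outside a spreadsheet, this is the classic "merge overlapping intervals" problem. You sort by start, extend the current chain while the next entry starts before the chain ends, and record the chain's span when it breaks. The Python sketch below shows that general technique, not the formula-based solution that follows. It assumes times are given as minutes since midnight on a single date. To match the expected output, it keeps entries that only touch (a 1:00 PM end followed by a 1:00 PM start) in separate chains.

```python
def chain_durations(entries):
    """Merge overlapping (start, end) entries; return each chain's hours.

    entries: (start_minute, end_minute) pairs for a single date.
    Entries that merely touch are kept in separate chains.
    """
    chains = []
    for start, end in sorted(entries):
        if chains and start < chains[-1][1]:      # overlaps the current chain
            chains[-1][1] = max(chains[-1][1], end)
        else:                                     # starts a new chain
            chains.append([start, end])
    return [(e - s) / 60 for s, e in chains]


def m(h, mi=0):
    """Minutes since midnight for a 24-hour clock time."""
    return 60 * h + mi


rows = [(m(12, 30), m(13)), (m(13), m(15)), (m(14, 15), m(18, 45)),
        (m(14, 30), m(15)), (m(14, 15), m(18, 45)), (m(15), m(18, 45))]
print(chain_durations(rows))  # [0.5, 5.75]
```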
My Solution
To solve this, I need two helper columns (H and I), and the result goes in column J:
Step 1. Return "overlap" or "ok"
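Two entries overlap when each one starts before the other one ends (start1 < end2 and start2 < end1). Every row matches itself, so a count greater than 1 means an overlap.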
I made a query formula to check this:
=if(QUERY(ArrayFormula({value(E1:E+F1:F),VALUE(E1:E+G1:G)}),
"select count(Col1) where
Col1 < "&value(G1+E1-1/10^4)&"
and Col2 > "&value(F1+E1+1/10^4)&" label Count(Col1) ''",0)>1,"overlap","ok")
Drag the formula down. The resulting column is:
ok
overlap
overlap
overlap
overlap
ok
ok
overlap
overlap
overlap
overlap
ok
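In the formula:
VALUE makes the query compare plain numbers. Each date and time pair must be added together (date + time) before comparing.
The -1/10^4 and +1/10^4 offsets guard against imprecision in QUERY. They also mean that entries which merely touch (one ends exactly when the next starts, as in rows 1 and 2) are not counted as overlapping.
The ranges E1:E, F1:F and G1:G are relative, so when the formula is dragged down, each row is compared only with itself and the rows below it. The last row of a chain therefore shows "ok", which Steps 2 and 3 rely on.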
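The Step 1 test can be mirrored in Python to see what the dragged-down formula does. This sketch compares each row only with itself and the rows below it, as the relative ranges do. It uses whole minutes instead of date + time fractions, so the ±1/10^4 tolerance is unnecessary. The function name and the minute encoding are assumptions for illustration.

```python
def overlap_flags(rows):
    """Step 1: 'overlap' if a row overlaps any row at or below it, else 'ok'.

    rows: (start, end) pairs in absolute minutes, in sheet order.
    Mirrors the dragged-down QUERY, whose relative ranges (E1:E, ...) shift
    down one row at a time.
    """
    flags = []
    for i, (start, end) in enumerate(rows):
        # count rows j >= i with start_j < end_i and end_j > start_i;
        # the row always matches itself, hence the "> 1" test
        count = sum(1 for s, e in rows[i:] if s < end and e > start)
        flags.append("overlap" if count > 1 else "ok")
    return flags


rows = [(750, 780), (780, 900), (855, 1125), (870, 900), (855, 1125), (900, 1125)]
print(overlap_flags(rows))  # ['ok', 'overlap', 'overlap', 'overlap', 'overlap', 'ok']
```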
Step 2. Get Time Chains
This part is tricky. My solution only works if the data is sorted as in the example.
Enter 1 in cell I1. In cell I2 enter the formula:
=if(or(and(H1=H2,H2="overlap"),and(H2="ok",H1="overlap")),I1,I1+1)
Drag the formula down. The resulting column is:
1
2
2
2
2
2
3
4
4
4
4
4
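Step 2's numbering rule can be written as a loop over the Step 1 flags. This Python sketch takes the twelve flags listed in Step 1 as input; `chain_ids` is a hypothetical name. A row keeps the previous row's chain number if both rows are "overlap", or if it is an "ok" row closing a run of overlaps. Otherwise it starts a new chain.

```python
def chain_ids(flags):
    """Step 2: =IF(OR(AND(H1=H2,H2="overlap"),AND(H2="ok",H1="overlap")),I1,I1+1)."""
    ids = [1]  # I1 = 1
    for prev, cur in zip(flags, flags[1:]):
        same_chain = (prev == cur == "overlap") or (cur == "ok" and prev == "overlap")
        ids.append(ids[-1] if same_chain else ids[-1] + 1)
    return ids


flags = ["ok"] + ["overlap"] * 4 + ["ok", "ok"] + ["overlap"] * 4 + ["ok"]
print(chain_ids(flags))  # [1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4, 4]
```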
Step 3. Get Durations
In J1, paste the formula and copy it down:
=if(H1="ok",
round(QUERY(ArrayFormula({value(E:E+F:F),VALUE(E:E+G:G),I:I}),
"select max(Col2) - min(Col1) where Col3 = "&I1
&" label max(Col2) - min(Col1) ''")*24,2),"")
For each chain found in Step 2, the query returns the latest end minus the earliest start; multiplying by 24 converts days to hours.
ROUND is used because of imprecision in QUERY.
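Step 3 can be mirrored the same way. On each "ok" row, take the latest end minus the earliest start among the rows that share that row's chain number. This Python sketch uses absolute minutes, so it divides by 60 instead of multiplying by 24. The six example rows, their Step 1 flags and their Step 2 chain numbers are typed in directly, and `chain_totals` is a hypothetical name.

```python
def chain_totals(rows, flags, ids):
    """Step 3: hours for each chain, shown only on rows flagged 'ok'."""
    out = []
    for flag, cid in zip(flags, ids):
        if flag != "ok":
            out.append("")
            continue
        members = [r for r, c in zip(rows, ids) if c == cid]
        span = max(e for _, e in members) - min(s for s, _ in members)
        out.append(round(span / 60, 2))
    return out


rows = [(750, 780), (780, 900), (855, 1125), (870, 900), (855, 1125), (900, 1125)]
flags = ["ok", "overlap", "overlap", "overlap", "overlap", "ok"]
ids = [1, 2, 2, 2, 2, 2]
print(chain_totals(rows, flags, ids))  # [0.5, '', '', '', '', 5.75]
```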

Sum values in a range based on the date from another column above the cells

Here is my data:
+-------+------------+------------+------------+------------+------------+------------+
| Date  | 01/01/2017 | 02/01/2017 | 03/01/2017 | 01/02/2017 | 02/02/2017 | 03/02/2017 |
+-------+------------+------------+------------+------------+------------+------------+
| Value | 1          | 0,5        | 0          | 2          | 0,5        | 1          |
+-------+------------+------------+------------+------------+------------+------------+
I am trying to write a formula that sums the values for each month. With my example, I would get 1,5 for January and 3,5 for February.
I tried something with =SUMIF(), =OFFSET() and =MONTH() so that it would only sum the values whose date above them falls in the same month, but whatever I try, I always get a syntax error.
Does anybody have an idea? Is it even possible without scripts?
Thank you very much and have a good day.
OK, so I found a way with =FILTER():
=SUM(FILTER(2:2;MONTH(1:1)=MONTH(XXX)))
Where XXX is the month I want to calculate (a date within that month, since it is passed to MONTH()). In my case, I do it from another sheet:
+---+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
| | A | B | C |
+---+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
| 1 | Jan. 2017 | Feb. 2017 | Mar. 2017 |
| 2 | =SUM(FILTER('OtherSheet'!2:2;MONTH('OtherSheet'!1:1)=MONTH(A1))) | =SUM(FILTER('OtherSheet'!2:2;MONTH('OtherSheet'!1:1)=MONTH(B1))) | =SUM(FILTER('OtherSheet'!2:2;MONTH('OtherSheet'!1:1)=MONTH(C1))) |
+---+------------------------------------------------------------------+------------------------------------------------------------------+------------------------------------------------------------------+
Which gives me:
+-----------+-----------+-----------+
| Jan. 2017 | Feb. 2017 | Mar. 2017 |
+-----------+-----------+-----------+
| 1,5       | 3,5       | 2         |
+-----------+-----------+-----------+
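The same per-month total can be sketched in Python, which also makes one caveat visible. `MONTH()` alone ignores the year, so January 2017 and January 2018 would be summed together. This sketch groups by (year, month) instead. It assumes the dates are day/month/year strings as in the table, and `monthly_sums` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def monthly_sums(dates, values):
    """Sum values per (year, month); dates are 'dd/mm/yyyy' strings."""
    totals = defaultdict(float)
    for d, v in zip(dates, values):
        day = datetime.strptime(d, "%d/%m/%Y")
        totals[(day.year, day.month)] += v
    return dict(totals)


dates = ["01/01/2017", "02/01/2017", "03/01/2017",
         "01/02/2017", "02/02/2017", "03/02/2017"]
values = [1, 0.5, 0, 2, 0.5, 1]
print(monthly_sums(dates, values))  # {(2017, 1): 1.5, (2017, 2): 3.5}
```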

Measuring periodicity strength of a specific time on the time series data

I am trying to measure the periodicity strength of a specific time in time-series data when a period (e.g., 1 day, 7 days) is given.
For example,
| AM 10:00 | 10:30 | 11:00 |
DAY 1 | A | A | B |
DAY 2 | A | B | B |
DAY 3 | A | B | B |
DAY 4 | A | A | B |
DAY 5 | A | A | B |
If the period is 1 day, 10:00 AM and 11:00 AM have the highest periodicity strength in this data because their values are the same on every day.
Is there a popular method, or any research, for doing this?
There is a lot of existing research on finding periodic patterns in time series, but I can't find any that measures the periodicity strength of a specific time when the period is given.
Please share your knowledge. Thanks.
What you are looking for is something called cyclic association rules. I've linked to the paper that was originally written by researchers at Bell Labs.
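Cyclic association rules are the method suggested above. As a much simpler baseline for the example table, one could score each time slot by how often its most common value recurs across the days that share the same position in the period. The Python sketch below implements that mode-share score. It is an illustrative heuristic, not the cyclic-association-rules algorithm, and `periodicity_strength` is a hypothetical name.

```python
from collections import Counter, defaultdict


def periodicity_strength(table, period_days=1):
    """Score each (phase, slot) by the share of its most common value.

    table: {day_index: {slot: value}}, day_index starting at 0.
    Days whose index is equal modulo period_days are compared with each
    other; a score of 1.0 means the slot never changes within that phase.
    """
    buckets = defaultdict(list)
    for day, row in table.items():
        for slot, value in row.items():
            buckets[(day % period_days, slot)].append(value)
    return {key: Counter(vals).most_common(1)[0][1] / len(vals)
            for key, vals in buckets.items()}


rows = ["AAB", "ABB", "ABB", "AAB", "AAB"]      # DAY 1 .. DAY 5
slots = ["10:00", "10:30", "11:00"]
table = {d: dict(zip(slots, r)) for d, r in enumerate(rows)}
print(periodicity_strength(table))
# {(0, '10:00'): 1.0, (0, '10:30'): 0.6, (0, '11:00'): 1.0}
```

With a period of 7 days, only days 7 apart would be compared, so a real weekly analysis needs several weeks of data per slot.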
