Shuffling of time series data in pytorch-forecasting

I am using pytorch-forecasting for count time series. I have some date information such as hour of day, day of week, day of month, etc.
When I assign these as categorical variables in TimeSeriesDataSet using time_varying_known_categoricals, the training.data['categoricals'] values seem shuffled and are not in the same order as the target. Why is that?
The pandas dataframe looks like the one below before going through TimeSeriesDataSet
After the following code
why has the hour of day column changed to 0, 1, 12, 17?

Actually, the time_varying_known_categoricals are NOT shuffled. The integer codes assigned to the categories are simply not in natural order (1 for the 1st hour, 2 for the 2nd hour, etc.), which is why it looks as if the time series has been shuffled. I aligned the "hour_of_day" categorical variable over 3 days and noticed that the encoding for each hour matches correctly on each day, so there is no shuffling. This should at least be mentioned in the docstring. It would save a lot of time and confusion.
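One way to see why the codes look out of order is the minimal sketch below. It assumes the categories are stored as strings and that codes are assigned in sorted (lexicographic) order. This is an illustration, not pytorch-forecasting's actual encoder, and `code_of` and `encode` are hypothetical names. Under that assumption, hours 0 to 3 get exactly the codes 0, 1, 12, 17 seen in the question:

```python
# Hypothetical stand-in for a label encoder: string categories get integer
# codes in lexicographic order (an assumption, not the library's own code).
hours = [str(h) for h in range(24)]
code_of = {cat: i for i, cat in enumerate(sorted(hours))}


def encode(values):
    """Map each string category to its integer code."""
    return [code_of[v] for v in values]


# Three days of hourly data, kept in time order.
series = hours * 3
encoded = encode(series)

print(encoded[:4])  # [0, 1, 12, 17] because "10".."19" sort before "2"

# The same hour gets the same code on every day, so nothing is shuffled.
assert encoded[:24] == encoded[24:48] == encoded[48:72]
```

The codes are only labels for an embedding lookup, so their numeric order carries no meaning. What matters is that the mapping is consistent across the series.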

Related

Predict a future date based on an average

I am trying to create a formula that will help me predict a future date based on an average time per day.
For example, I have a range of dates [1/12/2022, 5/12/2022, 15/12/2022], and each date has a number of hours spent on that day [4, 2, 12]. At the moment I have a formula which works out the average per day by dividing the total by the number of days between the start date and the current date.
What I want is to then predict the date on which, at this average (say 4 hours per day), I will reach a goal of 2000.
An example sheet would look like this -
If the scenario below is your input data, then the following formula may help.
=C2+ROUNDUP(B2/A2,0)
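The same arithmetic can be sketched in Python. Since the example sheet is not shown here, the cell meanings are assumptions: A2 is taken to be the average hours per day, B2 the hours still remaining to the goal, and C2 the current date. `predict_goal_date` is a hypothetical helper name.

```python
from datetime import date, timedelta
from math import ceil


def predict_goal_date(current: date, remaining_hours: float,
                      avg_per_day: float) -> date:
    """Return the date on which the goal is reached at a constant daily average.

    Mirrors C2 + ROUNDUP(B2 / A2, 0) under the assumed cell meanings.
    """
    if avg_per_day <= 0:
        raise ValueError("average per day must be positive")
    days_needed = ceil(remaining_hours / avg_per_day)  # round up to whole days
    return current + timedelta(days=days_needed)


# 18 hours logged so far towards a goal of 2000, at 4 hours per day.
print(predict_goal_date(date(2022, 12, 15), 2000 - 18, 4))
```

Rounding up matters here: a remainder of even a fraction of a day still means the goal is reached on the following day.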

Sum in ARRAYFORMULA based on possible cell values of two rows, then subtract another sum based on two different date values in two different columns

I really couldn't put the title into words very well. I will link a template spreadsheet below.
I've been working on a formula for hours now however I keep hitting dead ends. I'm unable to effectively do what I believe should be feasible. I'd give my attempts however I believe it would be of zero help, instead I'll explain my desired outcome.
I have a page with my employees, the E column isn't populated right now as I'd like to create a formula (ARRAYFORMULA so I don't have to paste a formula into each cell) to calculate the output based on a few conditions and values.
Vacation days are calculated as follows: the CEO gets 5, managers get 3 and assistants get 1. Extra vacation days are based on the points employees receive: 30 points or above gives 5, 20 points or above gives 3 and 10 points or above gives 1.
Calculating the amount of vacation days employees have earned wasn't the hard part for me, it was having the formula subtract days based on how many vacation days have been used in the past 30 days.
We log vacations on the vacation page. The formula on the employees page needs to calculate how many vacation days each employee has used in the past 30 days only and subtract that from the total vacation days that employee has earned.
I'd like for the formula to use TODAY() to calculate 30 days in the past however for the sake of this example I'll use the date 06/09/2021 instead for continuity.
Sorry if I haven't explained this well or I'm asking too much in one go, I figured all the context is required.
Example sheet
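The rules described above can be written out as a rough Python sketch to make the intended calculation explicit. The sheet layout is not shown here, so the data shapes are assumptions: role names are taken to be the strings "CEO", "Manager" and "Assistant", and the vacation log is modelled as (name, date, days used) entries. All names are hypothetical.

```python
from datetime import date, timedelta

# Base allowance per role, as described in the question.
BASE_DAYS = {"CEO": 5, "Manager": 3, "Assistant": 1}


def bonus_days(points: int) -> int:
    """Extra vacation days earned from points (highest matching tier wins)."""
    if points >= 30:
        return 5
    if points >= 20:
        return 3
    if points >= 10:
        return 1
    return 0


def remaining_days(name, role, points, log, today):
    """Earned days minus days used in the 30 days up to and including today."""
    cutoff = today - timedelta(days=30)
    used = sum(days for who, when, days in log
               if who == name and cutoff <= when <= today)
    return BASE_DAYS[role] + bonus_days(points) - used


# Hypothetical data; 06/09/2021 is read as day/month/year (an assumption).
today = date(2021, 9, 6)
log = [("Ann", date(2021, 9, 1), 2), ("Ann", date(2021, 7, 1), 3)]
print(remaining_days("Ann", "Manager", 25, log, today))  # 3 + 3 - 2 = 4
```

In a sheet, the `used` sum corresponds to a conditional sum over the vacation page filtered by employee and by date window.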

How do I automatically append a value to an array at the end of each day?

I am working on a calorie tracking app where the user inputs values to keep a running calorie total throughout the day. I would like to automatically append this daily value to an array at the end of each day so I can present running averages for the last seven days, fourteen days, and thirty days from the array data.
This seems like a straightforward enough issue, but I've been having trouble finding an answer or relevant example on here or googling in general. Thanks in advance for any assistance or relevant links.
Don't append the value at the end of the day. Instead, append it the first time someone performs an action on the next day. You can use Date() to work out what day it is. If the day has changed since the last input, append the previous total to the array.
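The idea above can be sketched as follows, in Python rather than the app's own language. `CalorieLog` and its members are hypothetical names, and the sketch assumes that days with no input at all are simply skipped rather than recorded as zero.

```python
from datetime import date


class CalorieLog:
    """Keeps a running total for today and a list of finished daily totals."""

    def __init__(self):
        self.daily_totals = []    # one entry per completed day with input
        self.current_day = None   # day the running total belongs to
        self.running_total = 0

    def add(self, calories, today=None):
        """Record an input; roll over first if the day has changed."""
        today = today or date.today()
        if self.current_day is not None and today != self.current_day:
            # First action on a new day: archive the previous day's total.
            self.daily_totals.append(self.running_total)
            self.running_total = 0
        self.current_day = today
        self.running_total += calories

    def average(self, n):
        """Average of the last n completed days (fewer if not yet available)."""
        recent = self.daily_totals[-n:]
        return sum(recent) / len(recent) if recent else 0.0


log = CalorieLog()
log.add(500, date(2024, 1, 1))
log.add(700, date(2024, 1, 1))
log.add(300, date(2024, 1, 2))   # triggers the rollover for 1 January
print(log.daily_totals)          # [1200]
```

Doing the rollover lazily means no background timer is needed, and the same check can also run when the app is opened so the averages are current before any new input.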

Filling missing values with distribution

So I have two datasets.
In the first one I have values for each hour of a day. Example:
Date Value
05/07/2017 01:00 5
05/07/2017 02:00 10
05/07/2017 03:00 5
In the second dataset I only have the total for each day:
Date Value
05/07/2017 40
So I want to distribute the total from the second dataset according to the distribution in the first dataset. Something like this:
Date Value
05/07/2017 01:00 10
05/07/2017 02:00 20
05/07/2017 03:00 10
How can I do this? I'm using R and created a time series for the first dataset.
You may want to check the mice package for R, which specialises in missing data imputation. In your case, probably a kNN method, which would impute the missing values from samples that are similar attribute-wise (here, by time), might do the trick.
On a second look, a slightly more sophisticated procedure might be possible: bootstrap the values across the different times, then fill the missing values by finding a random combination of per-time draws (assuming you sample randomly from each time-specific pool or distribution) that adds up to the total you have.
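For the exact example in the question, a simpler alternative to imputation is plain proportional scaling: take each hour's share of the first dataset's total and apply it to the daily total from the second dataset. The sketch below is in Python rather than R, and `distribute_total` is a hypothetical helper name. It is not the mice or bootstrap approach described above.

```python
def distribute_total(hourly_values, daily_total):
    """Split daily_total across hours in proportion to hourly_values."""
    base = sum(hourly_values)
    if base == 0:
        raise ValueError("reference profile sums to zero; shares are undefined")
    return [daily_total * v / base for v in hourly_values]


# Reference profile 5, 10, 5 (shares 25%, 50%, 25%) applied to a total of 40.
print(distribute_total([5, 10, 5], 40))  # [10.0, 20.0, 10.0]
```

The scaled values always sum back to the daily total, which is the property the question asks for.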

Get difference of time and output to decimal format

I am trying to get the difference between two times, let's say 2:00PM and 12:00AM. I want to get how many hours are between those two times, in decimal format, which in this case would be 10.00 hours. I am not sure how to go about this. The furthest I got was subtracting the two times and multiplying the resulting decimal by 24, which works for 2PM and 11PM (giving 9.00 hours), but for 2PM and 12AM it shows -14 when it should show 10.00 hours.
Assuming your times are in ColumnA ("earlier") and ColumnB ("later") then:
=if(B1=0,(B1-A1+1)*24,(B1-A1)*24)
should work for you. The quotes are there because Google may associate a date with the times even when it is not displayed (this seems to depend upon how the time values are entered). Google treats noon as 12:00PM, which is wrong: it is noon, not after noon (post meridiem), although one minute later, 12:01PM, does make sense. So 12:00AM is midnight and a special case where a date is associated, because it is seen as midnight of the previous day: it counts as 0, not 24. Hence relative to 2PM today it is 14 hours earlier (your result), whereas midnight tonight is 10 hours ahead (the result you expected).
The formula above checks whether the later time is midnight and compensates for that being treated as the day before by +1 in the formula.
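The wrap-around can also be illustrated in Python. Note that this sketch is a variation, not a translation of the formula: instead of special-casing midnight, it wraps any "later" time that is numerically smaller than the "earlier" one into the next day using a modulo. `hours_between` is a hypothetical helper name, and seconds are ignored for simplicity.

```python
from datetime import time


def hours_between(earlier: time, later: time) -> float:
    """Decimal hours from earlier to later, wrapping past midnight if needed."""
    start = earlier.hour * 60 + earlier.minute   # minutes since midnight
    end = later.hour * 60 + later.minute
    # Modulo one day turns a negative difference into "later is tomorrow".
    return ((end - start) % (24 * 60)) / 60


print(hours_between(time(14, 0), time(23, 0)))  # 9.0
print(hours_between(time(14, 0), time(0, 0)))   # 10.0
```

The modulo version also covers cases such as 10PM to 2AM, which the midnight-only check would still report as negative.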
