A Time Series with a strange behaviour - machine-learning

I hope everyone is doing well.
I am working on a time series project to predict, hour by hour, the waiting time (idle time) of a zone.
The idle time of a zone at a given hour is the average idle time of the vehicles that start to wait at that hour in that zone, and the idle time of a vehicle is the amount of time the vehicle has to wait in that zone before being booked. For example, if at 16h00 we predict a value of 90 minutes for zone A, it means a vehicle that starts to wait in this zone between 16h00 and 17h00 will wait 90 minutes to be booked.
For our idle time (our ground truth) at a given hour B, we have to wait 2 days (48 hours) to establish the complete ground truth value for hour B, since we have to wait up to two days for the vehicles that start to wait at B and are not booked yet. So each time we want to make a prediction, the last 48 points are unstable. For example, if we want to make a prediction at time n, the ground truth of n-1 is partial and incomplete, and we have to wait 48-1 = 47 more hours to establish the final value of the waiting time at n-1.
We can summarize the problem as: the recent past data at prediction time is still changing, not fixed.
The following image illustrates what I explained above:
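To make the setup concrete, here is a minimal Python/pandas sketch of how the 48-hour maturation lag could be handled when building the training set (the series, the function name and the constant are illustrative assumptions, not the actual pipeline):

import pandas as pd

# At prediction time `now`, the labels for the most recent 48 hours are still
# "unstable": vehicles that started waiting in those hours may not be booked yet.
MATURATION_HOURS = 48  # illustrative constant matching the 2-day wait described above

def split_stable_unstable(idle_time: pd.Series, now: pd.Timestamp):
    """Split an hourly idle-time series (minutes) into stable (final)
    and unstable (still-changing) parts, given the maturation lag."""
    cutoff = now - pd.Timedelta(hours=MATURATION_HOURS)
    stable = idle_time[idle_time.index <= cutoff]
    unstable = idle_time[idle_time.index > cutoff]
    return stable, unstable

# Example with synthetic data: two weeks of hourly values.
index = pd.date_range("2023-01-01", periods=24 * 14, freq="h")
series = pd.Series(range(len(index)), index=index, dtype=float)

stable, unstable = split_stable_unstable(series, now=index[-1])
print(len(stable), "stable points,", len(unstable), "unstable points")

This only illustrates the bookkeeping; whether the unstable points are dropped, down-weighted, or used as inputs is exactly the modelling question asked below.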
My questions are:
1- Is this kind of behaviour known in the time series field? If that's the case, does it have a specific name?
2- How to mix stable and unstable points in order to make accurate predictions?
Any suggestions? And thank you ahead of time :)

Related

How can I calculate rate (events per minute) in Google Sheets?

I am working on creating a spreadsheet template for a video observation tool that my organization will use. Specifically, we will watch ~20-minute long videos, and record the rate (occurrences per minute) of certain behaviors within subsections of the video. For example, "in the clip from 2:06 to 4:30, the speaker asked the audience an average of 2.5 questions per minute."
I think it would be easiest for users to denote individual clips by providing start and end times (e.g. Start: 22:40 End: 23:02). Users should be able to input a count of certain occurrences, and then the spreadsheet will divide that number by the time elapsed and calculate a rate per minute. That is to say, if the speaker asked 8 questions between the timestamps 22:40 and 24:20, the spreadsheet should return a value of 8/(1.67 minutes) = 4.8 questions per minute.
I'm having trouble figuring out a way to enter time values in Google Sheets without it treating them as actual times in a 24-hour day. For example, 22:40 shouldn't refer to 00:22:40am nor to 10:40pm; I just mean 22 minutes and 40 seconds. I guess in theory, I would need it to treat the End Time as x-many minutes (or fractions of a minute) after a given Start Time, so it would need to calculate the total number of seconds elapsed between two mm:ss values and divide that sum by 60 to get the time elapsed in minutes. Then, I could simply divide the count of occurrences (e.g. 8 questions) by that number (1.67 minutes), and get my answer.
Does anyone have any tips about how this could be done? Thank you so much for your help!!
Current State:
Start Time: 22:40
End Time: 24:20
Questions Asked: 8
When I enter =8/(End Time - Start Time), I get 0:00 for some reason. I want it to return 4.8.
Format those durations as Format > Number > Duration. Enter durations complete with elapsed hours, minutes and seconds, as in 0:22:40 and 0:24:20.
You can then calculate events per minute like this:
=E2 / 24 / 60 / (T2 - S2)
...where E2 is the total number of events, S2 is the start moment, and T2 is the end moment.
Format the formula cell as Format > Number > Number.
See this answer for an explanation of how date and time values work in spreadsheets.
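If it helps to see the arithmetic behind that formula, here is a small Python sketch (purely illustrative, outside Sheets) that mimics how the spreadsheet stores a duration as a fraction of a day and arrives at 4.8 questions per minute for the example in the question:

# Spreadsheets store times and durations as serial day values: 1.0 == 24 hours.
def mmss_to_days(minutes: int, seconds: int) -> float:
    """Convert a mm:ss duration to the 'fraction of a day' a sheet stores."""
    return (minutes * 60 + seconds) / 86400  # 86400 seconds in a day

start = mmss_to_days(22, 40)   # entered in the sheet as 0:22:40
end = mmss_to_days(24, 20)     # entered in the sheet as 0:24:20
events = 8

# Equivalent of =E2 / 24 / 60 / (T2 - S2):
# (end - start) is in days, so 24 * 60 * (end - start) is the elapsed minutes.
rate_per_minute = events / (24 * 60 * (end - start))
print(round(rate_per_minute, 2))  # 4.8

The division by 24 and 60 in the sheet formula is doing the same day-to-minutes conversion.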

How to count the number of metrics sent to Datadog over a 24 hour period?

I have a situation where I'm trying to count the number of files loaded into the system I am monitoring. I'm sending a "load time" metric to Datadog each time a file is loaded, and I need to send an alert whenever an expected file does not appear. To do this, I was going to count up the number of "load time" metrics sent to Datadog in a 24 hour period, then use anomaly detection to see whether it was less than the normal number expected. However, I'm having some trouble finding a way to consistently pull out this count for use in the alert.
I can't use the count_nonzero function, as some of my files are empty and have a load time of 0. I do know about .as_count() and count:metric{tags}, but I haven't found a way to include an evaluation interval with either of these. I've tried using .rollup(count, time) to count up the metrics sent, but this call seems to return variable results based on the rollup interval. For instance, if I compare intervals of 2000 and 4000 seconds, I would expect each 4000 second interval to count up about the sum of two 2000 second intervals over the same time period. This does not seem to be what happens at all - the counts for the smaller intervals seem to add up to much more than the count for the larger one. Additionally some rollup intervals display decimal numbers as counts, which does not make any sense to me if this function is doing what I thought it was.
Does anyone have any thoughts on how to accomplish this? I'd really appreciate any new ideas.
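Not an answer, but one way to sanity-check what the rollups are doing is to pull the raw series for a 24-hour window through the API and count the returned points yourself. A rough sketch with the official datadogpy client (the metric name file.load_time, the tag scope and the keys are placeholders; note that Datadog may itself downsample the returned series for long windows, which could also explain why counts at different rollup intervals don't add up):

import time
from datadog import initialize, api

# Placeholder credentials and metric name -- substitute your own.
initialize(api_key="YOUR_API_KEY", app_key="YOUR_APP_KEY")

now = int(time.time())
one_day_ago = now - 24 * 60 * 60

# Pull the series for the last 24 hours and count points client-side,
# instead of relying on .rollup(count, interval) in the monitor itself.
result = api.Metric.query(start=one_day_ago, end=now,
                          query="avg:file.load_time{*}")

series = result.get("series", [])
points = series[0]["pointlist"] if series else []
print(f"{len(points)} load_time points returned for the last 24 hours")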

What is the meaning of OneMinuteRate in JMX?

I am trying to calculate the reads/second and writes/second in my Cassandra 2.1 cluster. After searching and reading, I came to know about the JMX bean
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
Here I can see OneMinuteRate. I have started a brand-new cluster and started collecting these metrics from 0.
When I inserted my first record, I can see
Count = 1
OneMinuteRate = 0.01599111...
Does it mean that my write/s is 0.0159911?
Or does it mean that based on 1 minute data, my write latency is 0.01599 where Write Latency refers to the response time for writing a record?
Please help me understand the value.
Thanks.
It means that in the last minute, your writes were occurring at a rate of .01599 writes per second. Think about it this way: the rate of writes in the last 60 seconds would be
WritesInLastMinute ÷ 60
So in your case
1 ÷ 60 = 0.0166
Or more precisely, .01599.
If you observed no further writes after that, the value would descend down to zero over the next minute.
OneMinuteRate, FiveMinuteRate, and FifteenMinuteRate are exponential moving averages. They are not simply a count divided by elapsed time; instead, as the name implies, they take an exponentially weighted series of averages, as below:
result(t) = (1 - w) * result(t - 1) + w * event_this_period
where w is the weighting factor and t is the ticking time. In other words, they take roughly 20% of the new reading and 80% of the old readings; it's the same way UNIX systems measure CPU load.
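To make that weighting concrete, here is a toy Python sketch of an exponentially weighted one-minute rate (the 5-second tick and the derived weight are assumptions for illustration, in the style of a Dropwizard-like meter; the real Cassandra/Dropwizard internals have more detail):

import math

TICK_SECONDS = 5                              # assumed update interval
W = 1 - math.exp(-TICK_SECONDS / 60.0)        # weight for a "one minute" average

class ToyOneMinuteRate:
    """result(t) = (1 - w) * result(t - 1) + w * instant_rate, in events/second."""

    def __init__(self):
        self.rate = 0.0        # events per second
        self.uncounted = 0     # events seen since the last tick

    def mark(self, n: int = 1):
        self.uncounted += n

    def tick(self):
        instant_rate = self.uncounted / TICK_SECONDS
        self.uncounted = 0
        self.rate = (1 - W) * self.rate + W * instant_rate

meter = ToyOneMinuteRate()
meter.mark()                    # one write
meter.tick()                    # first tick after the write
print(f"{meter.rate:.5f}")      # ~0.01599 events/second, close to the value in the question

for _ in range(12):             # a further minute with no writes
    meter.tick()
print(f"{meter.rate:.5f}")      # decays toward zero, as described above

With these particular (assumed) constants, a single event in the first tick happens to land near the 0.01599 figure; the point is simply that one write produces a small, slowly decaying rate rather than a plain count divided by time.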
However, if this applies to requests that the server receives, below is a chart from one request to a server, with measurements taken by Dropwizard.
As you can see, from one request a curve is drawn over time. It's really useful for determining trends, but I'm not sure it is great for monitoring live traffic, especially critical traffic.

HKUnit for Sleep Analysis?

I need to query HealthKit for HKCategoryTypeIdentifierSleepAnalysis data, but I can't find the compatible HKUnit for the quantity value. Apple's documentation is silent on units for Sleep Analysis. I'm hoping someone already knows the answer.
BTW, the iOS Health app shows hours & minutes on the Sleep chart, but the HKUnit reference doesn't include options for such composite units.
In Apple's documentation I found this:
By comparing the start and end times of these samples, apps can calculate a number of secondary statistics: the amount of time it took for the user to fall asleep, the percentage of time in bed that the user actually spent sleeping, the number of times the user woke while in bed, and the total amount of time spent both in bed and asleep.
This means that you have to use the startDate and endDate properties of your sample to calculate sleep durations.
Sleep samples are instances of HKCategorySample, which is unit-less. You should perform calculations for sleep samples using the startDate and endDate properties on the sample.

Do UNIX timestamps change across timezones?

As the subject asks: do UNIX timestamps change in each timezone?
For example, if I sent a request to a server on the other side of the world saying, "Send out an email when the time is 1397484936", would that server's timestamp be 12 hours behind my own?
The definition of UNIX timestamp is time zone independent. The UNIX timestamp is the number of seconds (or milliseconds) elapsed since an absolute point in time, midnight of Jan 1 1970 in UTC time. (UTC is Greenwich Mean Time without Daylight Savings time adjustments.)
Regardless of your time zone, the UNIX timestamp represents a moment that is the same everywhere. Of course you can convert back and forth to a local time zone representation (time 1397484936 is such-and-such local time in New York, or some other local time in Djakarta) if you want.
The article at http://en.wikipedia.org/wiki/Unix_time is pretty impressive if you'd like a longer read.
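A quick way to convince yourself of this in Python (the timestamp is the one from the question, and the two zone names are the examples mentioned above):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

ts = 1397484936  # the timestamp from the question

utc = datetime.fromtimestamp(ts, tz=timezone.utc)
new_york = datetime.fromtimestamp(ts, tz=ZoneInfo("America/New_York"))
jakarta = datetime.fromtimestamp(ts, tz=ZoneInfo("Asia/Jakarta"))

print(utc)        # 2014-04-14 14:15:36+00:00
print(new_york)   # 2014-04-14 10:15:36-04:00 -- different wall clock...
print(jakarta)    # 2014-04-14 21:15:36+07:00 -- ...but the same moment
print(new_york == jakarta)  # True: aware datetimes compare by instant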
Unix time is defined as the number of seconds that have elapsed since 00:00:00 Coordinated Universal Time (UTC), Thursday, 1 January 1970. So the answer is no.
Unix timestamps do not change across timezones; they were created precisely so there would be a standard time across the globe.
NOTE:
Timestamps are calculated on the basis of the current time on the computer, so do not rely on them unless you are very sure about the time settings of the participating machines.
Someone stated that "UTC is Greenwich Mean Time without Daylight Savings time adjustments." This is simply untrue. GMT does not have Daylight Savings Time. GMT is measured in Greenwich, England (at the Royal Observatory) [0 longitude, but not 0 latitude]. UTC is measured at the equator [0 longitude and 0 latitude, which happens to lie in the ocean off the coast of Africa].
What difference does it make? It doesn't make a difference in terms of "what time of day is it?" It does, however, make a difference in terms of calculating a year. Now you'd think a year would be measured based upon the location of the center (the core) of the earth, right? When the earth's core is back in the same location it was ~365 days ago, it has been a year. It isn't measured that way. It is measured by a specific location on the earth getting back to the same location (relative to the sun) that it was ~365 days ago. But the period of a day and a year don't divide evenly. Once the earth is back to about where it was a year ago, it isn't facing the same direction it was last year, so that spot on the earth isn't facing the same direction it was a year ago.
Being further north, Greenwich isn't going to get back to the same spot (relative to the sun) that it was last year at the same time that 0 lat / 0 long does. So if you base the definition on Greenwich vs. 0/0, you get a (albeit slightly) different answer to the question "how many days are in a year?" To put it another way: when a given spot on the earth gets back to where it was a year ago (relative to the sun), the core of the earth isn't in the same spot it was a year ago, so which spot you pick matters, because the core of the earth will be in a different spot (relative to the sun) than it was one year ago if you pick a different spot on the earth.
Neither UTC nor GMT has daylight savings time. Europe/London time, the timezone that Greenwich resides in, does. But GMT does not. GMT is what Americans would call a "Standard Time" - i.e. without DST.
Getting back to the question, Epoch time doesn't technically have a timezone. It is based on a particular point in time, which just so happens to line up with an "even" UTC time (at the exact beginning of a year and a decade, etc.). If that concept doesn't fit well in your brain, and if it helps to think of Epoch time as being in UTC, go right ahead. You're in good company, and in the grand scheme of things it really doesn't matter. Have you ever seen those lawsuits where someone is awarded $1? It's kind of a "you're right, but it doesn't really matter" type of verdict. If someone sued you for saying Epoch time is in the UTC timezone, they would win $1. That wouldn't buy them a cup of coffee at any Starbucks in any timezone on the planet.
IF both computers are set up correctly with their clocks set for the correct timezone and UTC values, they should return the same value.
Of course that's a big IF. There's almost certainly a difference of at least a second, more often minutes, between the time reported by two computers. And many computers are set up with incorrect timezone settings, and will report their local time when asked for a timestamp rather than UTC.
And in that lies the difference between theory and practice. In theory it's all the same, in practice you should not rely on it.
No, an epoch timestamp does not change, because it is anchored to a fixed timezone, UTC.
If you want to use a time object in another time zone, just look it up in the libraries of the language you use, but do NOT try to add/subtract a couple of hours from an epoch timestamp and assume it's now in another time zone. That will make things very confusing for other people, especially when you expose it in your API.
If you use C++, I recommend this library. I heard it will soon be added to the standard library.
I understand that a time object is sometimes hard to deal with, and it looks easier to add/subtract on the epoch timestamp. Please don't do it, and don't persuade others to do it. A time object is much easier once you get used to it, and it can take care of time zone conversion cleanly without getting tangled in historical time zone changes due to politics/law etc.
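For example, in Python the conversion recommended above looks like this (using the standard zoneinfo module instead of hand-shifting the epoch value; the zone name is just an example):

from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

ts = 1397484936  # an epoch timestamp has no timezone of its own

# Right: let a timezone-aware library render it in the zone you need.
local = datetime.fromtimestamp(ts, tz=ZoneInfo("Asia/Jakarta"))
print(local)                    # local wall-clock time, DST and offsets handled for you
print(int(local.timestamp()))   # 1397484936 -- the underlying instant is unchanged

# Wrong: "shifting" the number produces a genuinely different moment in time.
shifted = ts + 7 * 3600
print(datetime.fromtimestamp(shifted, tz=timezone.utc))  # 7 hours later, not "ts in Jakarta"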
