Time of the day dimension - data-warehouse

How can I build a time-of-day dimension table that holds granular information such as hours, minutes, and seconds?

Got it from here: http://wiki.postgresql.org/wiki/Date_and_Time_dimensions

I'm curious why you would want a TIME dimension when this information can easily be extracted from the time portion of a TIMESTAMP or TIME value itself. Having said that, you could create a table DIM_TIME (Time, Hour, Minute, Seconds). You would then have to populate it with 86,400 rows, one for each second within a 24-hour period.
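As a rough illustration of how those rows could be generated, here is a minimal Python sketch. The columns follow the DIM_TIME layout suggested above; the `time_key` column (seconds past midnight) and the function name `build_dim_time` are assumptions added for the example, not part of the original answer.

```python
from datetime import time

def build_dim_time():
    """Return one row per second of the day: (time_key, time, hour, minute, second)."""
    rows = []
    for time_key in range(24 * 60 * 60):
        # time_key is a hypothetical surrogate key: seconds past midnight.
        hour, remainder = divmod(time_key, 3600)
        minute, second = divmod(remainder, 60)
        rows.append((time_key, time(hour, minute, second), hour, minute, second))
    return rows

dim_time = build_dim_time()
print(len(dim_time))  # 86400
```

The resulting rows could then be bulk-loaded into the table with whatever loader the warehouse provides.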

Related

What is the time unit for "datadelay" attribute on GoogleFinance?

I'm using the GoogleFinance() functions on a Google spreadsheet to keep track of my stocks. With the "datadelay" attribute I can check how long ago the data was last updated. But it only returns a raw number, like "54000" for one ticker and "15" for another. What time unit is that supposed to be? Minutes? Seconds? Milliseconds?
When I checked the Google Finance documentation, I found a page explaining that data might be delayed by up to 20 minutes. It also mentions that market data is retrieved from different exchanges, and that these exchanges may have different data delays. That could explain the differences in the "datadelay" column.
As for the unit of this column, my assumption is that it is shorter than seconds, since 54000 seconds = 900 min, which is far higher than the maximum delay defined in the help page. But I am not sure what the value of this column would be when you query on non-trading days.
The page shows delays for each exchange.
The GoogleFinance() function updates every minute (if it is set to do so), but keep in mind that results may be delayed by up to 20 minutes, so the answer is between 1 and 20 minutes.

How to handle time-intervals with timezone in iOS/Swift

I am new to this iOS world, trying to learn how to handle dates and time.
Imagine I have a class Shop. The shop has time intervals which represent the opening and closing times for each day of the week.
Some context data (example string from database, GMT Timezone):
Monday: "08:00:00-13:00:00, 15:00:00-18:00:00"
Tuesday: "09:00:00-13:00:00, 15:00:00-19:00:00"
Wednesday: "15:00:00-23:59:59"
Thursday: "00:00:00-08:00:00"
etc.
Monday for example would have to store 2 time-intervals.
My question is how can I store this data (array of DateIntervals? TimeIntervals? or another more suitable class?) in a Class and get the current time to check if the store is opened or not.
The native date format for iOS (and Mac OS) is the Date object. A Date object represents an instant in time, independent of time zone. You then use a DateFormatter to convert a date to a string representation in a particular time zone.
In your case, though, you need to represent time ranges for days of the week on a variety of different dates.
You should read the Calendar class reference in the Xcode documentation. Of particular interest would be the date(bySetting:value:of:) method, which will let you start from a given date and calculate a new date by changing the value of various date components.
You have a set of time intervals for each day. So you need a way to store, for a given day of the week, one or more time intervals. Your time intervals have a start time and an end time. Each of those needs to be represented by an hour, minute, and optionally second.
With that information you can get the current date/time and split it into components. Get the weekday, hour, minute, and second. Using the weekday you can get the appropriate time intervals. Then you can iterate those intervals and see if the current hour, minute, second falls between one of the intervals.
This all assumes that for a given business, your time intervals (open times) are specified in local time for the given business.
When converting the current date/time into its components, you should ensure that you set the calendar's timezone to match the timezone of the business in question.
There is no need for any date comparisons for any of this. You want to compare hours/minutes/seconds of the current date with the hours/minutes/seconds of the open times.
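The component-based check described above is language independent. The following is a minimal sketch in Python rather than Swift, under stated assumptions: the interval strings are the sample data from the question, the shop's time zone is taken to be GMT/UTC as in that sample, and the names `parse_intervals`, `OPENING_HOURS`, and `is_open` are hypothetical.

```python
from datetime import datetime, time, timezone

def parse_intervals(spec):
    """Parse "08:00:00-13:00:00, 15:00:00-18:00:00" into (start, end) time pairs."""
    intervals = []
    for part in spec.split(","):
        start_text, end_text = part.strip().split("-")
        intervals.append((time.fromisoformat(start_text), time.fromisoformat(end_text)))
    return intervals

# Keyed by weekday as returned by datetime.weekday(): Monday == 0.
OPENING_HOURS = {
    0: parse_intervals("08:00:00-13:00:00, 15:00:00-18:00:00"),
    1: parse_intervals("09:00:00-13:00:00, 15:00:00-19:00:00"),
    2: parse_intervals("15:00:00-23:59:59"),
    3: parse_intervals("00:00:00-08:00:00"),
}

def is_open(moment, hours=OPENING_HOURS, shop_tz=timezone.utc):
    """Check a timezone-aware datetime against the shop's intervals.

    shop_tz is assumed to be UTC here because the sample data is in GMT.
    """
    local = moment.astimezone(shop_tz)          # convert to the shop's time zone
    now = local.time().replace(microsecond=0)   # compare whole seconds only
    return any(start <= now <= end for start, end in hours.get(local.weekday(), []))

# 2017-09-11 was a Monday.
print(is_open(datetime(2017, 9, 11, 9, 0, tzinfo=timezone.utc)))   # True
print(is_open(datetime(2017, 9, 11, 14, 0, tzinfo=timezone.utc)))  # False
```

In Swift the same structure would map onto Calendar and DateComponents, with the calendar's time zone set to the shop's time zone before extracting the weekday, hour, minute, and second.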

GSheets: Calculating difference in times in a 24 hour business

I'm trying to calculate the difference between a start time and a person's first action, and likewise between the end time and their last action.
My problem is that, because this is a 24-hour business, I can't figure out a single formula that copes with shift times and actions falling anywhere in the 24-hour time frame.
Example sheet
The start times are manually entered in "06:00" format and the Actions are taken from a Left("12/09/2017 19:08:25"),8 format.
The data entry is the problem. You need to enter start and end times together with their dates (you can still format the cells as time only for presentation purposes). The dates have to be entered and stored underneath the displayed time. For example, one of the finish times is 00:00:00, while the expected finish time is 23:00:00. We could assume that he worked one hour late and ended his shift at the start of the next day. But spreadsheets cannot assume. The sheet would think that he finished at 00:00:00 this morning instead of his actual finish time of 00:00:00 tomorrow morning, so it looks as if he left 23 hours early. Once that's fixed, you can simply take the difference between the two times. You do have to format that column as Number>Duration instead of Time.
To sum up,
Add dates to start and end times
Format the resulting difference column as Number>Duration.
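The 23-hour error described above can be reproduced outside the spreadsheet. This is a small Python sketch of the same arithmetic; the specific dates are made up for illustration, while the 23:00:00 expected finish and 00:00:00 actual finish come from the example in the answer.

```python
from datetime import date, datetime, time

# With dates attached: the shift was expected to end at 23:00 and the
# last action happened at 00:00 on the following day (dates are illustrative).
expected_finish = datetime(2017, 9, 12, 23, 0, 0)
last_action = datetime(2017, 9, 13, 0, 0, 0)
with_dates = last_action - expected_finish

# With times only: both values implicitly land on the same day.
same_day = date(2017, 9, 12)
time_only = (datetime.combine(same_day, time(0, 0, 0))
             - datetime.combine(same_day, time(23, 0, 0)))

print(with_dates)                        # 1:00:00
print(time_only.total_seconds() / 3600)  # -23.0
```

With the dates present the result is a plain duration of one hour, which is what the Number>Duration format displays in the sheet.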

Prepping Data For Usage Clustering

Dataset: I'm given the number of minutes individual customers use a product each day and am trying to cluster this data in order to find common usage patterns.
My question: How can I format the data so that, for example, a power user with high levels of use for a year looks the same as a different power user who has only been able to use the device for a month before I ended data collection?
So far I've turned each customer into an array where each cell is the number of minutes used that day. This array starts when the user first uses the product and ends after the user's first year of use. All entries in the cells must be double values (e.g. 200.0 minutes used) for the clustering model. I've considered setting all cells/days after the last day of data collection to either -1.0 or NULL. Is either of these a valid approach? If not, what would you suggest?
To make both users look the same (one who used the product a lot every day for a year, and another who used it a lot for one month), create a new entry whose values are:
avg_usage per time_bin
time_bin can be a month, a day or another time bin which best fits your needs.
This way, a user who uses the product, let's say, 200 minutes per day for one year will get:
200 * 30 * 12 / 12 = 6000 minutes per month
and the other user, who joined just last month with the exact same usage, will also get:
200 * 30 * 1 / 1 = 6000 minutes per month.
This way, it doesn't matter when a user started using the product; the only thing that matters is the usage rate.
An important thing to take into consideration is that products may go unused for some time. Take a computer, for example, while I'm away on vacation: the days I didn't use it (maybe) shouldn't affect my general usage of the product. So, based on your data, product, and intuition, you might consider removing gaps like this one and leaving them out of the calculation.
The length of time a user has been using your product could be a signal of something. But if he only started recently and is still using it today, that is something you need to take into consideration, and for that case this average binning technique may help.
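The arithmetic above can be sketched in a few lines of Python. The function name `average_usage_per_bin` and the 30-day bin are assumptions chosen to match the 30-day months used in the worked numbers.

```python
def average_usage_per_bin(daily_minutes, days_per_bin=30):
    """Average usage per time bin, computed only over the days actually observed."""
    if not daily_minutes:
        return 0.0
    mean_per_day = sum(daily_minutes) / len(daily_minutes)
    return mean_per_day * days_per_bin

year_user = [200.0] * 360   # 12 months of 30 days, as in the example above
month_user = [200.0] * 30   # joined one month before collection ended

print(average_usage_per_bin(year_user))   # 6000.0
print(average_usage_per_bin(month_user))  # 6000.0
```

Because the average is taken over observed days only, there is no need to pad the shorter history with -1.0 or NULL before computing this feature.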

Time and date dimension in data warehouse

I'm building a data warehouse. Each fact has its timestamp. I need to create reports by day, month, and quarter, but by hour too. Looking at the examples, I see that dates tend to be saved in dimension tables.
(source: etl-tools.info)
But I think that it makes no sense for time. The dimension table would grow and grow. On the other hand, a JOIN with a date dimension table is more efficient than using date/time functions in SQL.
What are your opinions/solutions?
(I'm using Infobright)
Kimball recommends having separate time and date dimensions:
design-tip-51-latest-thinking-on-time-dimension-tables
In previous Toolkit books, we have recommended building such a dimension with the minutes or seconds component of time as an offset from midnight of each day, but we have come to realize that the resulting end user applications became too difficult, especially when trying to compute time spans. Also, unlike the calendar day dimension, there are very few descriptive attributes for the specific minute or second within a day. If the enterprise has well defined attributes for time slices within a day, such as shift names, or advertising time slots, an additional time-of-day dimension can be added to the design where this dimension is defined as the number of minutes (or even seconds) past midnight. Thus this time-of-day dimension would either have 1440 records if the grain were minutes or 86,400 records if the grain were seconds.
My guess is that it depends on your reporting requirement.
If you need something like
WHERE "Hour" = 10
meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than
WHERE date_part('hour', TimeStamp) = 10
because the date_part() function will be evaluated for every row.
You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:
WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15'
which gets awkward when using dimension fields.
Usually, a time dimension has minute resolution, so 1440 rows.
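To make the trade-off concrete, here is a small in-memory Python sketch of the two query styles. It is only an illustration: the fact rows, the `amount` measure, and the `time_key` helper are invented for the example, and a real warehouse would do this with a join in SQL.

```python
from datetime import datetime

# Minute-grain time dimension: key = minutes past midnight, 1440 rows.
dim_time = {key: {"hour": key // 60, "minute": key % 60} for key in range(1440)}

def time_key(ts):
    """Hypothetical surrogate key: minutes past midnight."""
    return ts.hour * 60 + ts.minute

# Each fact keeps the full timestamp and also carries the time dimension key.
facts = [
    {"ts": datetime(2010, 3, 22, 10, 15), "amount": 5},
    {"ts": datetime(2010, 3, 22, 23, 45), "amount": 7},
    {"ts": datetime(2010, 3, 23, 10, 59), "amount": 11},
]
for fact in facts:
    fact["time_key"] = time_key(fact["ts"])

# "Hour = 10 on any day" is a lookup on the dimension attribute.
hour_10_total = sum(f["amount"] for f in facts
                    if dim_time[f["time_key"]]["hour"] == 10)

# A range that crosses midnight is simpler on the raw timestamp.
start, end = datetime(2010, 3, 22, 23, 30), datetime(2010, 3, 23, 11, 15)
range_total = sum(f["amount"] for f in facts if start <= f["ts"] <= end)

print(hour_10_total, range_total)  # 16 18
```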
Time should be a dimension in data warehouses, since you will frequently want to aggregate over it. You could use a snowflake schema to reduce the overhead. In general, as I pointed out in my comment, hours seem like an unusually high resolution. If you insist on them, making the hour of the day a separate dimension might help, but I cannot tell you if this is good design.
I would recommend having separate dimensions for date and time. The date dimension would have one record for each date within an identified valid range of dates, for example 01/01/1980 to 12/31/2025.
The separate time dimension would have 86,400 records, one per second, each identified by its time key.
In the fact records where you need both date and time, add both keys as references to these conformed dimensions.
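One way to derive both keys from a single timestamp is sketched below in Python. The key formats are assumptions made for the example (a YYYYMMDD integer for the date key and seconds past midnight for the time key); the answer itself does not prescribe them.

```python
from datetime import datetime

def dimension_keys(ts):
    """Split a timestamp into (date_key, time_key) for the two dimensions."""
    date_key = ts.year * 10000 + ts.month * 100 + ts.day     # e.g. 20100322
    time_key = ts.hour * 3600 + ts.minute * 60 + ts.second   # 0..86399
    return date_key, time_key

print(dimension_keys(datetime(2010, 3, 22, 23, 30, 0)))  # (20100322, 84600)
```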
