What is the format of the time field in this cypher? - neo4j

CALL ga.timetree.single({time: 1463659567468, create: true})
https://github.com/graphaware/neo4j-timetree
https://graphaware.com/neo4j/2014/08/20/graphaware-neo4j-timetree.html
The above link says that time is in long format YYYYMMDDHHmmss. But the time parameter doesn't make any sense and random nodes are getting generated in neo4j.
What does the time parameter hold and what is the meaning of it?

The time parameter is a millisecond timestamp: the number of milliseconds elapsed since the UNIX epoch. This is an extremely common way of storing time-related data, and you will find it in use in nearly every digital system.
The timestamp cited here represents "2016-05-19 12:06:07" (UTC). The time tree starts from a root (this is a modeling convenience); its child is the year (2016), followed by the month (5), then the day of the month (19). It looks like no nodes were created automatically for time resolutions finer than that.
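As a quick check of the conversion described above, here is a minimal sketch in Python (not part of the timetree library; the helper name from_epoch_millis is made up for illustration) that turns the millisecond timestamp into a UTC date and reads off the year, month and day levels:

```python
from datetime import datetime, timezone


def from_epoch_millis(millis: int) -> datetime:
    """Convert milliseconds since the UNIX epoch to a timezone-aware UTC datetime."""
    seconds, ms = divmod(millis, 1000)
    whole = datetime.fromtimestamp(seconds, tz=timezone.utc)
    # Re-attach the millisecond remainder as microseconds.
    return whole.replace(microsecond=ms * 1000)


moment = from_epoch_millis(1463659567468)
print(moment.isoformat())
# The levels of the time tree described above: year, month, day of month.
print(moment.year, moment.month, moment.day)
```

The result is expressed in UTC; the same instant falls at a different wall-clock time in other time zones.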
Keep in mind that Neo4j now has native temporal values that you can use in Cypher, store as properties, and index, so time trees are going to be less useful: you can always do index lookups on indexed temporal properties.
There are cases where time trees can still be very useful, however, such as when you're searching for events that happened within some unit of time regardless of its parent units: for example, finding events that happened on Mondays regardless of month, or in January regardless of year.

Related

Storing Date Components Instead of a Date

My app lets people log the movies they see (for example). Each logged movie usually (but not always) has a date and sometimes has a time. It's not unusual to have one but not the other. Occasionally the dates are only a year ("I watched Dumbo sometime in 1984"), but could realistically be any combination of day/month/year/time.
I am used to modeling dates as date objects in my app and my backend. But is it a viable approach to store each component separately? When I need to reference an actual date from the components (e.g. for sorting the log) this will be built client-side, or perhaps be stored as a derived property sortDate and updated whenever any of the components change.
My reservation is that the information the user is saving is truly a 'moment in time' and I will have to take care of some things myself - for example what time zone are my components stored relative to? This would be captured automatically as part of a real Date object.
The alternative seems to be assuming some sort of 'default' for missing components (e.g. year 0000 if no year, time 00:00 if no time). But those defaults have meaning and I won't be able to distinguish them from 'not provided'.
What are the limitations and/or pitfalls of this approach? Does anyone have experience modeling their dates this way?
If it's of any consequence, my app is for iOS written in Swift and uses a Parse Server backend.
I've successfully used question marks to represent ambiguous and unknown timestamp parts in legal systems. Try to keep in mind that you're really not modeling dates here ('1984' isn't a date); you're modeling facts about dates.
So, if one of your users saw a movie some time in 1984, you might record the value '1984-??-?? ??:??:??' in a text column in a database. Values like this sort sensibly.
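To illustrate why such values sort sensibly, here is a small sketch in Python. The helper partial_timestamp is hypothetical, and the example assumes plain ASCII string comparison, in which '?' sorts after every digit, so an unknown part lands after the known values that share the same prefix:

```python
def partial_timestamp(year=None, month=None, day=None,
                      hour=None, minute=None, second=None):
    """Format known parts as zero-padded digits and unknown parts as question marks."""
    def part(value, width):
        return "?" * width if value is None else str(value).zfill(width)

    return (f"{part(year, 4)}-{part(month, 2)}-{part(day, 2)} "
            f"{part(hour, 2)}:{part(minute, 2)}:{part(second, 2)}")


log = [
    partial_timestamp(1984),                      # "some time in 1984"
    partial_timestamp(1984, 3, 15, 20, 30, 0),    # fully known
    partial_timestamp(1983, 12),                  # month known, day unknown
    partial_timestamp(1984, 3),
]

# Plain string sorting is enough; no date parsing is involved.
ordered = sorted(log)
for entry in ordered:
    print(entry)
```

A database collation that does not compare by ASCII code point may order '?' differently, so it is worth checking on the actual backend.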
See also this answer on dba. Comments on that answer are also good to read.

Querying by timezone vs location

Moment-timezone has a method, moment.tz, that takes two params:
particular datetime
timezone name
It returns the time shift (to UTC) that was in effect in the given timezone at the given datetime. OK.
The question that bothers me:
Did all the locations that currently belong to a particular timezone also belong to that zone in the past?
Isn't it possible that two locations that currently belong to the same timezone had different timezones (shifts to UTC) in the past, even after 1970?
Is it possible in principle to query the tz-db for some specific kind of location, rather than for a timezone name?
Would be grateful if someone could eliminate my doubts.
Moment-timezone uses the data from the IANA time zone database (aka the TZDB, zoneinfo, or the Olson database). Most of your questions are addressed by that data, rather than by moment-timezone itself. You'll find that other implementations (for other languages, platforms, etc.) have similar behaviors.
There is a great deal of information about how the tzdb works in the theory file in the tzdb itself, and on Wikipedia, but I'll see if I can address your specific questions:
Did all the locations that currently belong to a particular timezone also belong to that zone in the past?
The TZDB assigns time zones based on cities (because they are less likely to change over time than other regional boundaries). Generally, one city within a given region whose clocks have been aligned since 1970 will be chosen to represent the time in that region.
When another part of that region changes their clocks differently than the rest of the region, a new time zone is created and a new city is chosen within that region to represent the zone. We call this a "zone split". Time before the split in both zones will match (except the LMT entry), and time at the split and forward will deviate. It doesn't matter if at some point in the future the time in these regions aligns again. There are now two zones and there will continue to be - because they deviated at some point in the past.
Isn't it possible that two locations that currently belong to the same timezone had different timezones (shifts to UTC) in the past, even after 1970?
If there is a distinct history of timekeeping in the region, then there will be two different time zone entries. So when you say "locations", if you mean two different cities with their own time zone names in the TZDB, then by definition they don't belong to the same time zone. For example, Europe/Moscow and Europe/Volgograd are both currently in UTC+3 year-round without DST. However at the start of 1992, Moscow was UTC+3 while Volgograd was UTC+4. Their histories before then deviate even further.
On the other hand, if you are talking about a location that is not specifically referenced in the TZDB, then there is a presumption of alignment. For example, Seattle is in the US Pacific time zone, all of which is represented by America/Los_Angeles. Because there is not a unique America/Seattle, the data is representing that Seattle does not have a time zone history distinct from that of Los Angeles.
That said - there have been a few very minor edge cases that have come up in the past where a small town that is on the boundary line between two time zones has to choose which zone to observe. It has also happened that a small town distinctly on one side of the boundary has chosen to unofficially follow the time zone of a neighboring larger city on the other side of the boundary. These changes are sometimes mentioned on the tzdb discussion list, but are rarely recorded in the data as a distinct zone.
With these edge cases, keep in mind that the TZDB only tracks cities - not regional boundaries that may divide cities or towns. For that, you'd have to go to a different data source. The best one I know of is Evan Siroky's timezone-boundary-builder project.
Is it possible in principle to query the tz-db for some specific kind of location, rather than for a timezone name?
You'll have to be more specific about what you mean by "location". If you mean latitude/longitude coordinates - then the timezone-boundary-builder data, and the projects that use it, are the route to go. They will help you resolve a tzdb identifier, which you can then use with moment-timezone or other libraries.

What is the opposite of an AoE expiry?

I'm speccing an application that displays time periods to the user. The goal is to present periods in a simple view (no time, no timezones) and detailed view (date and time, with timezone data). The simple view should be unambiguous, in other words the user can glance at it and their assumptions about what they see are correct (they are valid in the local timezone).
For the end of the global period, displaying the date in the AoE timezone [1] will solve this problem. For example, a submission deadline might display as 2018-04-03 (actually 2018-04-03 23:59:59 AoE). This means submissions are accepted as long as it is April 3 somewhere on the planet.
But I also want to indicate the start of a global period. For example, if submissions open on April 2 2018 00:01, they are accepted as soon as it is April 2 somewhere on the planet. (This would currently be at UTC+14, matching the Line Islands.)
I can't see a way to use AoE to derive a global start time. Is there an equivalent to AoE (a standardized semantic timezone) that tracks the global start time?
Notes:
Hardcoding UTC-12 and UTC+14 is the simple answer for the modern day. But I'm looking for semantic timezones that would be updated if the values changed (and not reference non-existent historical datetimes).
I thought I'd seen Etc/AoE in the tz database but this is not the case.
References:
AoE
UTC-12:00
UTC+14:00
[1] The Anywhere on Earth (AoE) timezone represents the moment a datetime expires "anywhere on Earth". It currently matches time at Howland Island (UTC-12). If a UTC-13 timezone were invented, it would be updated to track that.
As far as I could understand, AoE is not a timezone as defined by IANA (AFAIK, a list of all offsets from some geographic region during history).
It's more like a "concept", an idea of a specific date being valid in any place on earth. As you said, this notion of "being valid" will change if more timezones are created or removed.
I don't even know if date/time APIs can properly handle AoE automatically - maybe I should study more. But my conclusion is that the only way to achieve your goal is to check manually:
you could check all available timezones and see if the date is valid there, comparing to the current date/time at that zone
you could configure the UTC+14 as the offset to be compared, and make some scheduled job (daily/weekly/every-time-IANA-publishes-a-new-version?) to check all zones and set the correct one (with the biggest offset?). You must also take care if this zone has Daylight Saving changes, because the offset will change as well (and what to do with overlaps, when clocks shift 1 hour back and a local time may exist twice?)
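The second option can be sketched with fixed offsets. This is only an illustration in Python under the assumption, taken from the question, that UTC+14 and UTC-12 are the current extremes; the constants and function names are made up, and the offsets would have to be refreshed by the scheduled check described above if the extremes ever changed:

```python
from datetime import date, datetime, time, timedelta, timezone

# Assumption: hardcoded extremes, valid only while no zone goes beyond them.
EARLIEST_ZONE = timezone(timedelta(hours=14))   # where a calendar date begins first
AOE = timezone(timedelta(hours=-12))            # where a calendar date ends last


def global_start_utc(day: date) -> datetime:
    """UTC instant at which `day` first begins somewhere (midnight at UTC+14)."""
    local_midnight = datetime.combine(day, time.min, tzinfo=EARLIEST_ZONE)
    return local_midnight.astimezone(timezone.utc)


def global_end_utc(day: date) -> datetime:
    """UTC instant at which `day` is over everywhere (end of day at UTC-12)."""
    next_midnight = datetime.combine(day + timedelta(days=1), time.min, tzinfo=AOE)
    return next_midnight.astimezone(timezone.utc)


opens = global_start_utc(date(2018, 4, 2))
closes = global_end_utc(date(2018, 4, 3))
print(opens.isoformat(), closes.isoformat())
```

Storing the two UTC instants keeps the detailed view unambiguous, while the simple view can show just the two calendar dates.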

Time and date dimension in data warehouse

I'm building a data warehouse. Each fact has its timestamp. I need to create reports by day, month, quarter but by hours too. Looking at the examples I see that dates tend to be saved in dimension tables.
(source: etl-tools.info)
But I think that it makes no sense for time. The dimension table would grow and grow. On the other hand, a JOIN with a date dimension table is more efficient than using date/time functions in SQL.
What are your opinions/solutions ?
(I'm using Infobright)
Kimball recommends having separate time- and date dimensions:
design-tip-51-latest-thinking-on-time-dimension-tables
In previous Toolkit books, we have recommended building such a dimension with the minutes or seconds component of time as an offset from midnight of each day, but we have come to realize that the resulting end user applications became too difficult, especially when trying to compute time spans. Also, unlike the calendar day dimension, there are very few descriptive attributes for the specific minute or second within a day. If the enterprise has well defined attributes for time slices within a day, such as shift names, or advertising time slots, an additional time-of-day dimension can be added to the design where this dimension is defined as the number of minutes (or even seconds) past midnight. Thus this time-of-day dimension would either have 1440 records if the grain were minutes or 86,400 records if the grain were seconds.
My guess is that it depends on your reporting requirement.
If you need something like
WHERE "Hour" = 10
meaning every day between 10:00:00 and 10:59:59, then I would use the time dimension, because it is faster than
WHERE date_part('hour', TimeStamp) = 10
because the date_part() function will be evaluated for every row.
You should still keep the TimeStamp in the fact table in order to aggregate over boundaries of days, like in:
WHERE TimeStamp between '2010-03-22 23:30' and '2010-03-23 11:15'
which gets awkward when using dimension fields.
Usually, time dimension has a minute resolution, so 1440 rows.
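For a sense of how small that table is, here is a sketch in Python that generates the rows of a minute-grain time dimension. The column names (time_key, hour, minute, label) are assumptions chosen for illustration, not a prescribed schema:

```python
def build_time_dimension():
    """Return one row per minute of the day, keyed by minutes past midnight."""
    rows = []
    for key in range(24 * 60):
        hour, minute = divmod(key, 60)
        rows.append({
            "time_key": key,                       # minutes past midnight
            "hour": hour,
            "minute": minute,
            "label": f"{hour:02d}:{minute:02d}",
        })
    return rows


time_dim = build_time_dimension()

# The equivalent of WHERE "Hour" = 10: keys for 10:00 through 10:59.
ten_oclock_keys = [row["time_key"] for row in time_dim if row["hour"] == 10]
print(len(time_dim), ten_oclock_keys[0], ten_oclock_keys[-1])
```

The table is built once and never grows, which addresses the concern that a time dimension would "grow and grow".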
Time should be a dimension in data warehouses, since you will frequently want to aggregate over it. You could use the snowflake schema to reduce the overhead. In general, as I pointed out in my comment, hours seem like an unusually high resolution. If you insist on them, making the hour of the day a separate dimension might help, but I cannot tell you if this is good design.
I would recommend having separate dimensions for date and time. The date dimension would have 1 record for each date within an identified valid range of dates. For example: 01/01/1980 to 12/31/2025.
And a separate dimension for time would have 86400 records, with each second having a record identified by the time key.
In the fact records where you need both date and time, add both keys as references to these conformed dimensions.
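A minimal sketch in Python of how a fact's timestamp might be split into the two keys. The YYYYMMDD integer date key and the seconds-past-midnight time key are assumptions for illustration; the answer above does not prescribe a key format:

```python
from datetime import date, datetime

# The example range of the date dimension given above.
DATE_DIM_START = date(1980, 1, 1)
DATE_DIM_END = date(2025, 12, 31)


def dimension_keys(ts: datetime):
    """Split a timestamp into (date_key, time_key) for the two dimensions."""
    date_key = ts.year * 10000 + ts.month * 100 + ts.day      # e.g. 20100322
    time_key = ts.hour * 3600 + ts.minute * 60 + ts.second    # 0..86399
    return date_key, time_key


# Number of rows the date dimension needs for the example range (inclusive).
date_dim_rows = (DATE_DIM_END - DATE_DIM_START).days + 1

keys = dimension_keys(datetime(2010, 3, 22, 23, 30, 0))
print(date_dim_rows, keys)
```

Both dimensions stay small and fixed in size, while the fact table carries only two integer keys per row.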

How would you build this daily class schedule?

What I want to do is very simple but I'm trying to find the best or most elegant way to do this. The Rails application I'm building now will have a schedule of daily classes. For each class the fields relevant to this question are:
Day of the week
Starting time
Ending time
A single entry could be something such as:
day of week: Wednesday
starting time: 10:00 am
ending time: Noon
Also I must mention that it's a bilingual Rails 2.2 app and I'm using the native Rails i18n feature. I actually have several questions.
Regarding the day of the week, should I create an extra table with list of days, or is there a built-in way to create that list on the fly? Keep in mind these days of the week will have to be rendered in English or Spanish in the schedule view depending on the locale variable.
While querying the schedule I will need to group and order the results by weekday, from Monday to Sunday, and of course order the classes within each day by starting time.
Regarding the starting time and ending time of each class would you use datetime fields or integer fields? If the latter how would you implement this exactly?
Looking forward to reading the different suggestions you guys will come up with.
I would just store the day of the week as an integer. 0 => Monday ... 6 => Sunday (or any way you want. ie. 0 => Sunday). Then store the start time and end time as Time.
That would make grouping really easy. All you would have to do is sort by the day of the week and the start time.
You can display this in multiple ways, but here is what I would do.
Have functions like: @sunday_classes = DailyClass.find_sunday_classes that returns all the classes for Sunday sorted by start time. Then repeat for each day.
def self.find_sunday_classes
  # Sunday is 6 under the 0 => Monday mapping above
  find_all_by_day_of_week(6, :order => 'start_time')
end
Note: find_by probably should have id at the end but that's just preference in how you want to name the column.
If you want the full week then call all seven from the controller and loop through them in the view. You could even create detail pages for each day.
Translation is the only tricky part. You can create a helper function that takes an integer and returns the text for the appropriate day of the week based on the locale.
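The helper idea is language-neutral, so here is a rough sketch of it in Python rather than Ruby. It assumes the 0 => Monday numbering and a hand-written lookup table; the names DAY_NAMES and day_name are made up for illustration:

```python
# Assumption: 0 => Monday ... 6 => Sunday, matching the numbering suggested above.
DAY_NAMES = {
    "en": ["Monday", "Tuesday", "Wednesday", "Thursday",
           "Friday", "Saturday", "Sunday"],
    "es": ["lunes", "martes", "miércoles", "jueves",
           "viernes", "sábado", "domingo"],
}


def day_name(day_of_week: int, locale: str = "en") -> str:
    """Return the display name of a stored day-of-week integer for a locale."""
    if not 0 <= day_of_week <= 6:
        raise ValueError(f"day_of_week must be 0..6, got {day_of_week}")
    return DAY_NAMES[locale][day_of_week]


print(day_name(2, "en"), day_name(2, "es"))
```

Sorting by the stored integer gives the Monday-to-Sunday order in either language, because the names are only looked up at display time.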
That's very basic. Nothing complicated.
If your data is a Time then I would store that as a Time - otherwise you will always have to convert it out of the database when you do date and time related operations on it. The day is redundant data, as it will be part of the time object.
This should mean that you don't need to store a list of days.
If t is a time then
t.strftime('%A')
will always give you the day as a string in English. This could then be translated by i18n as required.
So you only need to store starting time and ending time, or starting time and duration. Both should be equivalent. I would be tempted to store ending time myself, in case you need to do data manipulations on ending times, which therefore won't have to be calculated.
I think most of the rest of what you describe should also fall out of storing time data as instances of Time.
Ordering by week day and time will just be a matter of ordering by your time column. i.e.
DailyClass.find(:all, :conditions => ['whatever'], :order => :starting_time)
Grouping by day is a little more tricky. However this is an excellent post on how to group by week. Grouping by day will be analogous.
If you are dealing with non-trivial volumes of data, it may be better to do it in the database, with a find_by_sql and that may depend on your database's time and date functionality, but again storing the data as a Time will also help you here. For example in Postgresql (which I use), getting the week of a class is
date_trunc('week', starting_time)
which you can use in a Group By clause, or as a value to use in some loop logic in rails.
Re days-of-week, if you need to have e.g. classes that meet 09:00-10:00 on MWF, then you could either use a separate table for days a class meets (keyed by both class ID and DOW) or be evil (i.e. non-normalized) and keep the equivalent of an array of DOW in each class. The classic argument is this:
The separate table can be indexed in a way to support either class-oriented or DOW-oriented selects, but takes a bit more glue to put the entire picture together for a class.
The array-of-DOW is simpler to visualize for beginning programmers and slightly simpler to code about, but means that reasoning about DOW requires looking at all classes.
If this is only for your personal class schedule, do what gets you the value you're looking for, and live with the consequences; if you're trying to build a real system for multiple users, I'd go with a separate table. All those normalization rules are there for a reason.
As far as (human-readable) DOW names, that's a presentation-layer issue, and shouldn't be in the core concept of DOW. (Suppose you decided to move to Montreal, and needed French? That should be another "face" and not a change to the core implementation.)
As for starting/ending times, again the issue is your requirements. If all classes begin and end at hour (x:00) boundaries, you could certainly use 0..23 as the hours of the day. But then your life would be miserable as soon as you had to accommodate that 45-minute seminar. As the old commercial said, "Pay me now or pay me later."
One approach would be to define your own ClassTime concept and partition all reasoning about times to that class. It could start with a simplistic representation (integral hours 0..23, or integral minutes after midnight 0..1439) and then "grow" as needed.
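A sketch of what such a ClassTime concept might look like, written in Python for illustration and assuming the minutes-after-midnight representation mentioned above. The method names at and plus are made up:

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class ClassTime:
    """A time of day stored as integral minutes after midnight (0..1439)."""
    minutes: int

    def __post_init__(self):
        if not 0 <= self.minutes <= 1439:
            raise ValueError(f"minutes must be 0..1439, got {self.minutes}")

    @classmethod
    def at(cls, hour: int, minute: int = 0) -> "ClassTime":
        """Build a ClassTime from an hour and minute of the day."""
        return cls(hour * 60 + minute)

    def plus(self, minutes: int) -> "ClassTime":
        """Return the time `minutes` later on the same day."""
        return ClassTime(self.minutes + minutes)

    def __str__(self) -> str:
        hour, minute = divmod(self.minutes, 60)
        return f"{hour:02d}:{minute:02d}"


start = ClassTime.at(10)
end = start.plus(45)          # the 45-minute seminar
print(start, end, start < end)
```

Because all reasoning about times goes through this one class, the internal representation can change later without touching the code that uses it.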

Resources