My app lets people log the movies they see (for example). Each logged movie usually (but not always) has a date and sometimes has a time. It's not unusual to have one but not the other. Occasionally the dates are only a year ("I watched a Dumbo sometime in 1984"), but could realistically be any combination of day/month/year/time.
I am used to modeling dates as date objects in my app and my backend. But is it a viable approach to store each component separately? When I need to reference an actual date from the components (e.g. for sorting the log) this will be built client-side, or perhaps be stored as a derived property sortDate and updated whenever any of the components change.
My reservation is that the information the user is saving is truly a 'moment in time' and I will have to take care of some things myself - for example what time zone are my components stored relative to? This would be captured automatically as part of a real Date object.
The alternative seems to be assuming some sort of 'default' for missing components (e.g. year 0000 if no year, time 00:00 if no time). But those defaults have meaning and I won't be able to distinguish them from 'not provided'.
What are the limitations and/or pitfalls of this approach? Does anyone have experience modeling their dates this way?
If it's of any consequence, my app is for iOS written in Swift and uses a Parse Server backend.
I've successfully used question marks to represent ambiguous and unknown timestamp parts in legal systems. Try to keep in mind that you're really not modeling dates here ('1984' isn't a date); you're modeling facts about dates.
So, if one of your users saw a movie some time in 1984, you might record the value '1984-??-?? ??:??:??' in a text column in a database. Values like this sort sensibly.
See also this answer on dba. Comments on that answer are also good to read.
Is it possible to aggregate measurements or create custom queries beyond the standard dateFrom dateTo queries?
As an example, I have measurements which have a time delta of 1 minute (2015-01-01T05:05:00, 2015-01-01T05:05:00, 2015-01-01T05:05:00, ...) and I would like to query the measurements at 15 minute intervals (2015-01-01T05:15:00, 2015-01-01T05:30:00, 2015-01-01T05:45:00, ...)
So far I have only come up with these solutions:
Using the standard api request as in
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05
and then throwing away most of the data will use a massive amount of time loading the data.
Using cep (cumulocity event language) to generate a new measurement every 15 minutes using the nearest 1 minute measurement seems like a bit of overkill and not very elegant.
Batch requesting the exact minute
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-11-05T05:15:00%2B01:00&dateTo=2015-11-05T05:16:00%2B01:00
which will in a massive amount of API requests and also does not seem very efficient.
Use the /measurements/series endpoint which will only give me all series, even those I do not want, as well as only having the aggregation options hourly and daily (as far as I can tell).
Is there a better way of doing this?
you have captured nearly all of the mechanisms that are currently available. There is one more possibility -- not sure if this is an option for you:
Mark the fifteenth measurement when sending it from the device, using e.g. a different type.
I would normally use 2. It's actually quite efficient, it's similar to a materialized view in traditional SQL, plus you can use the data everywhere and in all widgets.
Good luck :-)
Cheers,
André
I would prefer the CEP solution. The rule wouldn't be that complicated. You would of course then store these measurements twice which is not that nice but having your desired measurement with a specific type or fragment will give you the fastest way to query it.
Instead of copying the measurement you could just add a special fragment to the measurement every 15 min in the CEP rule. You cannot update measurements so you would have to delete the measurement incoming every 15 min and then create a new measurement with exactly the same values but add a fragement (e.g. "aggregatedMeasurement": {}).
Your query then looks like this:
https://tenant.cumulocity.com/measurement/measurements?dateFrom=2015-10-01&dateTo=2015-11-05&fragmentType=aggregatedMeasurement
One more idea for point 3:
You could use SmartREST to create a template with the query string and leave the dateFrom and dateTo as placeholders.
From the client side you then would have to make only one request using the bulking feature in SmartREST.
On the server side this would still be transformed into the single requests so you wouldn't gain anything in speed.
How do you compare dates with lua? What is the best string format for dates? Should I store dates in epoch? I am looking for performance ...
Is the best way os.difftime?
You are asking several things, so here are my answers:
Should I store dates in epoch?
In general yes, the best way to store the dates is by using epochs, as returned by os.time
How do you compare dates with lua?
It depends on how you want to "compare" them.
If you just want to know which one is newer/older, then the easiest fastest thing is storing them as "epochs" and then doing date1 < date2; since both dates are just numbers, this is both performant and clean.
If you want to know how many months/days/years have passed between two given dates, that's a bit more complex. You will need a code similar to this:
diff = os.date("*t", os.difftime(date1, date2))
On that example, the returned diff is a table similar to {year=1, month=5, day=1, hour=2, min=3, sec=40 ...}
I am looking for performance ...
If you are using os.date() too often to transform epochs into dates (for example, for printing) then you might want to "cache" the year, month, etc information in a table, so you don't have to call it again and again. But do this only if you experience a bad performance; don't pre-optimize.
What is the best string format for dates?
That completely depends on how you want to use them. For example, if your app interacts with another service that expects a certain date format, it makes sense to use that format in all your app.
If you have no particular need to use a format, then one candidate is (%x):
os.date("%x", date) -- 09/16/1998 (for example)
The string that gives you depends on the computer's locale. This might or might not be desirable.
If you want the representation to be the same across all computers, independently of their locale, you might want to try a standard format, like ISO 8601:
os.date("%Y-%m-%d", date) -- returns "1998-09-16" in all computers
This format has lots of advantages; the most obvious one is that dates sorted out alphabetically are also sorted out chronologically. But the most important one is that a lot of software out there is prepared to read/use it.
You can find more information about dates in Programming in Lua, Section 22.1 - Date and Time and in the lua-users wiki.
I have a requirement to store dates and durations arising from multiple different calendars. In particular I need to store dates that:
Span the change to Gregorian calendars in different countries at different times
Cover a historic period of at least 500 years
Deal with multiple types of calendar - lunar, solar, Chinese, Financial, Christian, UTC, Muslim.
Deal with the change, in the UK, of the year end from 31st March to 31st December, and comparable changes in other countries.
I also need to store durations which I have defined as the difference between two timestamps (date and time). This implies the need to be able to store a "zero" date - so I can store durations of, say, three and a half hours; or 10 minutes.
I have details of the computations needed. Firebird's timestamp is based on a date function that starts at January 1st, 100 CE, so is not capable of being used for durations in the way I need to record them. In addition this data type is geared up (like most timestamp functions) to record the number of days since a base date; it is not geared up to record calendar dates.
Could anyone suggest:
A data structure to store dates and durations that meet the above requirements OR
A reference to such a data structure OR
Offer guidelines to approach the structuring of such storage OR
Any points that may help me to a solution.
EDIT:
#Warren P has provided some excellent work in his responses. I obviously have not explained what I am seeking clearly enough, as his work concentrates on the computations and how to go about calculating these. All valuable and useful stuff, but not what I intended my question to convey.
I do have details of all the computations needed to convert between various representations of dates, and I have a fairly good idea of how to implement them (using elements such as Warren suggests). However, my requirement is to STORE dates which meet the various criteria listed above. Example: date to be stored - 'Third June 13 Charles II'. I am trying to determine an appropriate structure within which to store such dates.
EDIT:
I have amended my proposed schema. I have listed the attributes on each table, and defined the tables and attributes by examples, given in the third section of the entity box. I have used the example given in this question and answer in my definition by example, and have amended the example in my question to correspond. Although I have proved my schema by describing somebody else's example, this schema may still be over complicated; over analysed; miss some obvious simplification and may prove very difficult to implement (Indeed, it may be plain wrong). Any comments or suggestions would be most welcome.
If you are writing your own, as I assume you intend to, I would make a class that contains a TDateTime, and other fields, and I would base it on the functionality in the very nicely written mxDateTime extension for Python, which is very easily readable, open source, C code, that you could use to extract the gregorian calendar logic you are going to need.
Within certain limits, TDateTime is always right. It's epoch value (0) is December 30, 1899 at midnight. From there, you can calculate other julian day numbers. It supports negative values, and thus it will support more than 400 years. I believe you will start having to do corrections, at the time of the last Gregorian calendar reforms. If you go from Friday, 15 October 1582, and figure out its julian day number, and the reforms before and after that, you should be able to do all that you require. Be aware that the time of day runs "backwards" before 1899, but that this is purely a problem in human heads, the computer will be accurate, and will calculate the number of minutes and seconds, up to the limit of double precision floating point math for you. Stick with TDateTime as your base.
I found some really old BorlandPascal/TurboPascal code that handles a really wide range of dates here.
If you need to handle arabic, jewish, and other calendars, again, I refer you to Python as a great source of working examples. Not just the mxdatetime extension, but stuff like this.
For database persistence, you might want to base your date storage around julian day numbers, and your time as C-like seconds since midnight, if the maximum resolution you need is 1 second.
Here's a snippet I would start with, and do code completion on:
TCalendarDisplaySubtype = ( cdsGregorian,cdsHebrew,cdsArabic,cdsAztec,
cdsValveSoftwareCompany, cdsWhoTheHeckKnows );
TDateInformation = class
private
FBaseDateTime:TDateTime;
FYear,FMonth,FDay:Integer; // if -1 then not calculated yet.
FCalendarDisplaySubtype:TCalendarDisplaySubtype;
public
function SetByDateInCE(Y,M,D,h,m,s:Integer):Boolean;
function GetAsDateInCE(var Y,M,D,h,m,s:Integer):Boolean;
function DisplayStr:String;
function SetByDateInJewishCalendar( ... );
property BaseDateTime:TDateTime read FDateTime write FDateTime;
property JulianDayNumber:Integer read GetJulianDayNumber write SetJulianDayNumber;
property CalendarDisplaySubType:TCalendarDisplaySubtype;
end;
I see no reason to STORE both the julian day number, and the TDateTime, just use a constant, subtract/add from the Trunc(FBaseDateTime) value, and return that, in the GetJulianDayNumber,SetJulianDayNumber functions. It might be worth having fields where you calculate the year, month, day, for the given calendar, once, and store them, making the display as string function much simpler and faster.
Update: It looks like you're better at ER Modelling than me, so if you posted that diagram, I'd upvote it, and that would be it. As for me, I'd be storing three fields; A Datetime field that is normalized to modern calendar standards, a text field (free form) containing the original scholarly date in whatever form, and a few other fields, that are subtype lookup table Foreign keys, to help me organize, and search on dates by the date and subtype. That would be IT for me.
Only a partial answer but an important piece.
Since you are going to store dates in a very broad range where a lot of things happened to calendars, you need to accommodate for those changes.
The timezone database TZ-database and the Delphi TZDB wrapper around the TZ-database will be of big help.
It has a database with rules how timezones historically behave.
I know they are based on the current calendar schemes, and you need to convert to UTC first.
You need to devise something similar for the other calendar schemes you want to support.
Edit:
The scheme I'd use would be like this:
find ways for all your calendars to convert to/from UTC
store the calendar type
store the dates in their original format, and the source of the date (just in case your source screwed up, and you need to recalculate).
use the UTC conversions to go from your original through UTC to the calendar types in your UI
--jeroen
I have a lot of logfile data that I want to display dynamic graphs from, for basically arbitrary time periods, optionally filtered or aggregated by different columns (that I could pregenerate). I'm wondering about the best way to store the data in a database and access it for displaying charts, when:
the time resolution should be variable from one second to a year
there are entries that span several 'time buckets', e.g. a connection might have been open for a few days and I want to count and display the user for every hour she was connected, not just in the hour 'slot' the connection was created or finished
Are there best practices, or tools/plugins for rails that help handle this kind and amount of data? Are there maybe database engines specifically tailored towards this, or having helpful functions (e.g. CouchDB indexes)?
EDIT: I'm looking for a scalable way to handle this data and access pattern. Things we considered: Run a query for each bucket, merge in app - probably way too slow. GROUP BY timestamp/granularity - does not count connections correctly. Preprocessing data into rows by smallest granularity and downsampling on query - probably the best way.
I think you can use mysql timestamps for this.
The way I solved it in the end was to pre-process the data into per-minute buckets, so there's one row for every event and minute. That makes it easy and fast enough to select and yields correct results. To get different granularity, you can do integer arithmetic on the timestamp columns - select abs(timestamp/factor)*factor and group by abs(timestamp/factor)*factor.