Warning about "xsd:date" - jena

I am using Jena to parse a Turtle ("TTL") formatted file. I see this warning in the console:
Lexical form '1896-13-04' not valid for datatype http://www.w3.org/2001/XMLSchema#date
I want to know why this warning happens.

Per the XML schema specification for xsd:date:
The ·lexical space· of date consists of finite-length sequences of characters of the form: '-'? yyyy '-' mm '-' dd zzzzzz? where the date and optional timezone are represented exactly the same way as they are for dateTime
i.e. dates must follow the International Convention of having year then month then day.
From the example given, your data appears to have the day and month swapped, i.e. year, then day, then month. Since 13 is not a valid month, you receive a warning.
Your input data is not valid according to the specification and therefore may not be processed correctly when you run queries against it, e.g. finding items with dates before or after a specific date of interest. Dates for which you are not receiving a warning may still be interpreted incorrectly, with day and month interchanged.
You need to correct the data, as otherwise this will cause you issues later. If the data is from a public data source, you should let them know that they have a data quality issue. If you are creating the data yourself, you need to correct your data generation so that the dates follow the specification.
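To find the offending values before loading the file, you could pre-check the date literals yourself. The following is a minimal Python sketch, not Jena's own validation: `is_valid_xsd_date` is a hypothetical helper, the pattern only approximates the lexical form quoted above, and the timezone part is matched by shape only.

```python
import calendar
import re

# Approximation of the quoted lexical form: '-'? yyyy '-' mm '-' dd zzzzzz?
XSD_DATE = re.compile(r"^(-?)(\d{4,})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?$")

DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_valid_xsd_date(lexical: str) -> bool:
    """Simplified check that a string is a plausible xsd:date literal."""
    match = XSD_DATE.match(lexical)
    if match is None:
        return False
    year = int(match.group(2))
    month = int(match.group(3))
    day = int(match.group(4))
    if not 1 <= month <= 12:
        return False
    days = DAYS_IN_MONTH[month - 1]
    # February gets a 29th day in Gregorian leap years.
    if month == 2 and calendar.isleap(year):
        days = 29
    return 1 <= day <= days
```

With this check, '1896-13-04' is rejected because 13 is not a month, while '1896-04-13' passes.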

Related

NLP approaches to identify dates/time expressions in text

I need to develop an application which identifies the dates inside a given text using some NLP approach. Let's assume I have data in a DB with date columns "from" and "to", and the text is as below:
Get data between 1st August and 15th August
I need to identify the dates and form the query to retrieve the data. I used Natty NLP and I was able to identify the dates. But I'm stuck for more complex time expressions like:
Get data uploaded next week
Get data uploaded last week
Here for the first one I need to identify next week Monday's date and Sunday's date and form the query same for the 2nd one. But with Natty it gives me next week from today's date. What other solutions exist? Or do I need to manipulate the expression by coding? I am using Java.
Your question is a bit confusing, but I guess you want to achieve two things:
Identify words that represent a time expression
Map these words to a formal machine-readable representation
If that is what you need, check the duckling framework. It identifies time expressions and normalises them into a single, unique, formal date representation.
Note that you need to pass a reference date for ambiguous time expressions.
You can run it as a service and call it from your code.
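If you end up adjusting the result in your own code, the Monday-to-Sunday range is simple date arithmetic on a reference date. This is a minimal Python sketch of that arithmetic only. It uses neither Duckling nor Natty, `week_range` is a hypothetical helper name, and the same logic ports directly to Java.

```python
from datetime import date, timedelta
from typing import Tuple


def week_range(reference: date, offset_weeks: int) -> Tuple[date, date]:
    """Return (Monday, Sunday) of the week offset_weeks away from the reference week."""
    # date.weekday() is 0 for Monday, so this steps back to the current week's Monday.
    monday_this_week = reference - timedelta(days=reference.weekday())
    monday = monday_this_week + timedelta(weeks=offset_weeks)
    return monday, monday + timedelta(days=6)


reference_date = date(2024, 5, 15)            # a Wednesday
next_week = week_range(reference_date, 1)     # "next week"
last_week = week_range(reference_date, -1)    # "last week"
```

The two dates in each tuple can then be bound to the "from" and "to" columns of the query.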

Swift CoreData : Check Date is Available Control

I am using CoreData. I'm saving a date along with some other data, and I need an if statement that works like this:
"if this date is available in CoreData database, user won't add any data."
I used this:
if newuser.valueForKey(NSDate) as NSDate == NSDate()
This is absolutely wrong. I'm new to this and can't work out how to write this if statement. How can I do this?
Thanks already !
To compare the dates you should be using isEqualToDate:
if (newuser.valueForKey("date") as NSDate).isEqualToDate(NSDate()) // "date" is a placeholder for your attribute's key
But, dates are very accurate, so the current date would need to match the saved date, and that's never going to happen - at least a part of a second will have passed before you make the comparison.
So, what you really need to do is to find out what day, month and year the date is and compare those.
In a number of ways it would be best to store these values explicitly in Core Data instead of using a date object, though both would work. In either case you need to get the date components from the date in order to find out the day, month and year and then you need to compare them (possibly creating a date with only day, month and year and no time so you can compare it to the stored date which should also have no time set).
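The idea of comparing only the calendar components is language-independent. Here is a minimal sketch in Python rather than Swift, just to show the logic. `same_calendar_day` is a hypothetical helper, and it assumes both values are already in the same timezone.

```python
from datetime import datetime


def same_calendar_day(a: datetime, b: datetime) -> bool:
    """Compare only year, month and day, ignoring the time of day."""
    return (a.year, a.month, a.day) == (b.year, b.month, b.day)


saved = datetime(2015, 1, 2, 8, 0, 0)
later_same_day = datetime(2015, 1, 2, 23, 59, 59)
next_day = datetime(2015, 1, 3, 0, 0, 0)

matches = same_calendar_day(saved, later_same_day)   # same day, different time
differs = same_calendar_day(saved, next_day)         # one second later, new day
```

A direct equality test on the two full timestamps would fail in both cases, which is the problem described above.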

change date format in sqlite

I want to change a date stored in an sqlite db from 12/10/11 to 12-10-11 (mm-dd-yy), but I am unable to do so. I am a noob in sqlite. I tried SELECT strftime('%d-%m-%Y',Date) from report, but I am getting null, as the sqlite db expects the value in mm-dd-yy. So how do I convert the format 12/10/11 to 12-10-11 (mm-dd-yy)? Thanks in advance, I really appreciate the help.
The short answer:
If you have a text string stored as "12/10/11" that you want reported as "12-10-11", you should use the replace(X,Y,Z) function, to replace occurrences of Y in X with Z. Thus:
SELECT replace('12/10/11','/','-');
will return:
12-10-11
The long answer:
First, dates do not actually exist as a proper datatype in SQLite. They're stored as either TEXT, REAL, or INTEGER values. See date and time datatype in SQLite. So it depends upon how your date was stored in the database.
Second, you seem to be implying that you stored the date in a "mm/dd/yy" format. That's not a valid or useful TEXT format for storing date/time values (the dates cannot be sorted, cannot be used in "greater than" and "less than" operations, cannot be used in SQLite date functions, etc.). You really want to store datetime values in one of the formats listed in the "Time strings" section of the date and time functions document.
So, generally, you should store your date/time values in one of those formats and use NSDateFormatter to convert the value to an NSDate when you retrieve it from the database. And when you want to display the date value in your app, use whatever format you want for output.
But, if you don't care that the dates are stored as text strings and are not effectively usable as dates in SQLite, then just treat it as a plain old TEXT string and use TEXT functions, such as replace(X,Y,Z) to replace occurrences of "/" with "-", as outlined above.
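Both routes can be tried out against an in-memory database. This is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the query in the question, and prefixing '20' to the two-digit year is an assumption about the century.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE report (Date TEXT)")
conn.execute("INSERT INTO report VALUES ('12/10/11')")

# strftime cannot parse mm/dd/yy text, so it yields NULL (None in Python).
unparsed = conn.execute(
    "SELECT strftime('%d-%m-%Y', Date) FROM report"
).fetchone()[0]

# Short answer: plain text substitution.
replaced = conn.execute(
    "SELECT replace(Date, '/', '-') FROM report"
).fetchone()[0]

# Long answer: rebuild a YYYY-MM-DD string that SQLite's date functions accept.
iso = conn.execute(
    "SELECT '20' || substr(Date, 7, 2) || '-' || substr(Date, 1, 2)"
    " || '-' || substr(Date, 4, 2) FROM report"
).fetchone()[0]

# Once the text is in ISO form, strftime works as expected.
reformatted = conn.execute(
    "SELECT strftime('%d-%m-%Y', ?)", (iso,)
).fetchone()[0]

conn.close()
```

Here `replaced` is the text '12-10-11', while `iso` is a value you can sort, compare and feed to the date functions.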

What data structure is recommended for multiple calendars, dates and durations?

I have a requirement to store dates and durations arising from multiple different calendars. In particular I need to store dates that:
Span the change to Gregorian calendars in different countries at different times
Cover a historic period of at least 500 years
Deal with multiple types of calendar - lunar, solar, Chinese, Financial, Christian, UTC, Muslim.
Deal with the change, in the UK, of the year end from 31st March to 31st December, and comparable changes in other countries.
I also need to store durations which I have defined as the difference between two timestamps (date and time). This implies the need to be able to store a "zero" date - so I can store durations of, say, three and a half hours; or 10 minutes.
I have details of the computations needed. Firebird's timestamp is based on a date function that starts at January 1st, 100 CE, so is not capable of being used for durations in the way I need to record them. In addition this data type is geared up (like most timestamp functions) to record the number of days since a base date; it is not geared up to record calendar dates.
Could anyone suggest:
A data structure to store dates and durations that meet the above requirements OR
A reference to such a data structure OR
Offer guidelines to approach the structuring of such storage OR
Any points that may help me to a solution.
EDIT:
@Warren P has provided some excellent work in his responses. I obviously have not explained what I am seeking clearly enough, as his work concentrates on the computations and how to go about calculating these. All valuable and useful stuff, but not what I intended my question to convey.
I do have details of all the computations needed to convert between various representations of dates, and I have a fairly good idea of how to implement them (using elements such as Warren suggests). However, my requirement is to STORE dates which meet the various criteria listed above. Example: date to be stored - 'Third June 13 Charles II'. I am trying to determine an appropriate structure within which to store such dates.
EDIT:
I have amended my proposed schema. I have listed the attributes on each table, and defined the tables and attributes by examples, given in the third section of the entity box. I have used the example given in this question and answer in my definition by example, and have amended the example in my question to correspond. Although I have proved my schema by describing somebody else's example, this schema may still be over complicated; over analysed; miss some obvious simplification and may prove very difficult to implement (Indeed, it may be plain wrong). Any comments or suggestions would be most welcome.
If you are writing your own, as I assume you intend to, I would make a class that contains a TDateTime, and other fields, and I would base it on the functionality in the very nicely written mxDateTime extension for Python, which is very easily readable, open source, C code, that you could use to extract the gregorian calendar logic you are going to need.
Within certain limits, TDateTime is always right. Its epoch value (0) is December 30, 1899 at midnight. From there, you can calculate other Julian day numbers. It supports negative values, and thus it will support more than 400 years. I believe you will start having to do corrections at the time of the last Gregorian calendar reforms. If you go from Friday, 15 October 1582, and figure out its Julian day number, and the reforms before and after that, you should be able to do all that you require. Be aware that the time of day runs "backwards" before 1899, but this is purely a problem in human heads; the computer will be accurate, and will calculate the number of minutes and seconds, up to the limit of double precision floating point math, for you. Stick with TDateTime as your base.
I found some really old BorlandPascal/TurboPascal code that handles a really wide range of dates here.
If you need to handle arabic, jewish, and other calendars, again, I refer you to Python as a great source of working examples. Not just the mxdatetime extension, but stuff like this.
For database persistence, you might want to base your date storage around julian day numbers, and your time as C-like seconds since midnight, if the maximum resolution you need is 1 second.
Here's a snippet I would start with, and do code completion on:
type
  TCalendarDisplaySubtype = (cdsGregorian, cdsHebrew, cdsArabic, cdsAztec,
    cdsValveSoftwareCompany, cdsWhoTheHeckKnows);

  TDateInformation = class
  private
    FBaseDateTime: TDateTime;
    FYear, FMonth, FDay: Integer; // if -1 then not calculated yet.
    FCalendarDisplaySubtype: TCalendarDisplaySubtype;
    function GetJulianDayNumber: Integer;
    procedure SetJulianDayNumber(Value: Integer);
  public
    // Pascal identifiers are case-insensitive, so M (month) and m (minute)
    // would clash; N is used for minutes instead.
    function SetByDateInCE(Y, M, D, H, N, S: Integer): Boolean;
    function GetAsDateInCE(var Y, M, D, H, N, S: Integer): Boolean;
    function DisplayStr: String;
    function SetByDateInJewishCalendar( ... );
    property BaseDateTime: TDateTime read FBaseDateTime write FBaseDateTime;
    property JulianDayNumber: Integer read GetJulianDayNumber write SetJulianDayNumber;
    property CalendarDisplaySubType: TCalendarDisplaySubtype
      read FCalendarDisplaySubtype write FCalendarDisplaySubtype;
  end;
I see no reason to STORE both the julian day number, and the TDateTime, just use a constant, subtract/add from the Trunc(FBaseDateTime) value, and return that, in the GetJulianDayNumber,SetJulianDayNumber functions. It might be worth having fields where you calculate the year, month, day, for the given calendar, once, and store them, making the display as string function much simpler and faster.
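The constant-offset conversion can be sketched outside Delphi too. This is a minimal Python illustration of the arithmetic, assuming the TDateTime epoch stated above (December 30, 1899) and Python's proleptic Gregorian `date` type. The function names are hypothetical.

```python
import math
from datetime import date

DELPHI_EPOCH = date(1899, 12, 30)   # TDateTime value 0
TDATETIME_EPOCH_JDN = 2415019       # Julian day number of 1899-12-30


def julian_day_number(tdatetime: float) -> int:
    """Truncate the day part (like Delphi's Trunc) and add the constant offset."""
    return math.trunc(tdatetime) + TDATETIME_EPOCH_JDN


def tdatetime_from_date(d: date) -> float:
    """Whole days since the TDateTime epoch, negative before 1899-12-30."""
    return float((d - DELPHI_EPOCH).days)


jdn_y2k = julian_day_number(tdatetime_from_date(date(2000, 1, 1)))
jdn_reform = julian_day_number(tdatetime_from_date(date(1582, 10, 15)))
```

Truncating toward zero rather than flooring matters for negative values, because the fractional part of a TDateTime holds the time of day regardless of sign.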
Update: It looks like you're better at ER Modelling than me, so if you posted that diagram, I'd upvote it, and that would be it. As for me, I'd be storing three fields; A Datetime field that is normalized to modern calendar standards, a text field (free form) containing the original scholarly date in whatever form, and a few other fields, that are subtype lookup table Foreign keys, to help me organize, and search on dates by the date and subtype. That would be IT for me.
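As a rough picture of that three-part record, here is a minimal Python sketch. The field names and the "regnal" subtype value are hypothetical, and the normalized value is held as a Julian day number, which is one option among several. It is left empty here because filling it in requires the conversion rules for that calendar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredDate:
    original_text: str                     # the scholarly date exactly as given
    calendar_subtype: str                  # would be a foreign key to a lookup table
    normalized_jdn: Optional[int] = None   # normalized value; None until computed


record = StoredDate(
    original_text="Third June 13 Charles II",
    calendar_subtype="regnal",
)
```

Keeping the original text separate means the normalized value can be recomputed later if a conversion rule turns out to be wrong.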
Only a partial answer but an important piece.
Since you are going to store dates in a very broad range where a lot of things happened to calendars, you need to accommodate for those changes.
The timezone database TZ-database and the Delphi TZDB wrapper around the TZ-database will be of big help.
It has a database with rules how timezones historically behave.
I know they are based on the current calendar schemes, and you need to convert to UTC first.
You need to devise something similar for the other calendar schemes you want to support.
Edit:
The scheme I'd use would be like this:
find ways for all your calendars to convert to/from UTC
store the calendar type
store the dates in their original format, and the source of the date (just in case your source screwed up, and you need to recalculate).
use the UTC conversions to go from your original through UTC to the calendar types in your UI
--jeroen

How would I store a date that can be partial (i.e. just the year, maybe the month too) and output it later with the same specificity?

I want to let users specify a date that may or may not include a day and month (but will have at least the year.) The problem is when it is stored as a datetime in the DB; the missing day/month will be saved as default values and I'll lose the original format and meaning of the date.
My idea was to store the real format in a column as a string in addition to the datetime column. Then I could use the string column whenever I have to display the date and the datetime for everything else. The downside is an extra column for every date column in the table I want to display, and printing localized dates won't be as easy since I can't rely on the datetime value... I'll probably have to parse the string.
I'm hoping I've overlooked something and there might be an easier way.
(Note I'm using Rails if it matters for a solution.)
As proposed by Jhenzie, create a bitmask to show which parts of the date have been specified. 1 = Year, 2 = Month, 4 = Day, 8 = Hour (if you decide to get more specific) and then store that into another field.
The only way that I could think of doing it without requiring extra columns in your table would be to use jhenzie's method of using a bitmask, and then store that bitmask into the seconds part of your datetime column.
In your model, only pay attention to the parts you care about. So you can store the entire date in your db, but you coalesce it before displaying it to the user.
The additional column could simply be used for specifying which parts of the date time have been specified:
1 = day
2 = month
4 = year
So 3 is day and month, 6 is month and year, 7 is all three. It's a simple int at that point.
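The flag column described above (1 = day, 2 = month, 4 = year) could drive the display along these lines. This is a minimal sketch in Python rather than Rails, and `DateParts` and `format_partial` are hypothetical names.

```python
from datetime import datetime
from enum import IntFlag


class DateParts(IntFlag):
    DAY = 1
    MONTH = 2
    YEAR = 4


def format_partial(value: datetime, parts: DateParts) -> str:
    """Render only the components the user actually supplied."""
    pieces = []
    if DateParts.YEAR in parts:
        pieces.append(f"{value.year:04d}")
    if DateParts.MONTH in parts:
        pieces.append(f"{value.month:02d}")
    if DateParts.DAY in parts:
        pieces.append(f"{value.day:02d}")
    return "-".join(pieces)


stored = datetime(2008, 8, 1)      # full datetime as saved in the DB
flags = DateParts(6)               # integer column value: month and year only
label = format_partial(stored, flags)
```

The datetime column stays usable for sorting and range queries, and the int column only affects how the value is shown.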
If you store a string, don't partially reinvent the ISO 8601 standard, which covers the case you describe and more:
http://en.wikipedia.org/wiki/ISO_8601
Is it really necessary to store it as a datetime at all? If not, store it as a string such as 2008, 2008-8 or 2008-8-1. Split the string on hyphens when you pull it out and you're able to establish how specific the original input was.
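Splitting on hyphens to recover the precision might look like this. It is a minimal Python sketch, `parse_partial` is a hypothetical helper, and it does not check that the month and day are in range.

```python
from typing import Optional, Tuple

PRECISIONS = ("year", "month", "day")


def parse_partial(text: str) -> Tuple[int, Optional[int], Optional[int], str]:
    """Split 'YYYY', 'YYYY-M' or 'YYYY-M-D' and report how specific it is."""
    fields = [int(part) for part in text.split("-")]
    if not 1 <= len(fields) <= 3:
        raise ValueError(f"expected 1 to 3 date fields, got {len(fields)}")
    year = fields[0]
    month = fields[1] if len(fields) > 1 else None
    day = fields[2] if len(fields) > 2 else None
    # The number of fields present tells us the precision of the input.
    return year, month, day, PRECISIONS[len(fields) - 1]
```

For example, parsing "2008-8" gives the year 2008, the month 8, no day, and a precision of "month".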
I'd probably store the datetime and an additional "precision" column to determine how to output it. For output, the precision column can map to a column that contains the corresponding formatting string ("YYYY-mm", etc) or it can contain the formatting string itself.
I don't know a lot about DB design, but I think a clean way to do it would be with boolean columns indicating if the user has input month and day (one column for each). Then, to save the given date, you would:
Store the date that the user input in a datetime column;
Set the boolean month column if the user has picked a month;
Set the boolean day column if the user has picked a day.
This way you know which parts of the datetime you can trust (i.e. what was input by the user).
Edit: it also would be much easier to understand than having an int field with cryptic values!
The informix database has this facility. When you define a date field you also specify a mask of the desired time & date attributes. Only these fields count when doing comparisons.
With varying levels of specificity, your best bet is to store them as simple nullable ints. Year, Month, Day. You can encapsulate the display logic in your presentation model or a Value Object in your domain.
Built-in time types represent an instant in time. You can use the built in types and create a column for precision (Year, Month, Day, Hour, Etc.) or you can create your own date structure and use nulls (or another invalid value) for empty portions.
For ruby at least - you could use this gem - partial-date
https://github.com/58bits/partial-date
