I'm having trouble with semantic predicates in ANTLR 4. My grammar is syntactically ambiguous, and needs to look ahead one token to resolve the ambiguity.
As an example, I want to parse "Jan 19, 2012 until 9 pm" as the date "Jan 19, 2012" leaving parser's next token at "until". And I want to parse "Jan 19, 7 until 9 pm" as the date "Jan. 19" with parser's next token at "7".
So I need to look at the 3rd token and either take it or leave it.
My grammar fragment is:
date
: month d=INTEGER { isYear(getCurrentToken().getText())}? y=INTEGER
{//handle date, use $y for year}
| month d=INTEGER {//handle date, use 2013 for year}
;
When the parser runs on either sample input, I get this message:
line 1:9 rule date failed predicate: { isYear(getCurrentToken().getText())}?
It never gets to the 2nd rule alternative, because (I'm guessing) it's already read one extra token.
Can someone show me how to accomplish this?
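In parser rules, ANTLR 4 only uses predicates on the left edge of an alternative when making a decision. Inline predicates like the one you showed above are only validated after the decision has already been made, so they cannot steer the parser toward the other alternative.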
The following modification will cause ANTLR to evaluate the predicate while it makes the decision, but obviously you'll need to modify it to use the correct lookahead token instead of calling getCurrentToken().
date
    : {isYear(getCurrentToken().getText())}? month d=INTEGER y=INTEGER
      { /* handle date, use $y for year */ }
    | month d=INTEGER { /* handle date, use 2013 for year */ }
    ;
PS: If month is always exactly one token long, then _input.LT(3) should provide the token you want.
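The "look at the 3rd token and either take it or leave it" decision can also be sketched outside ANTLR. The Python below is only an illustration, not ANTLR code: it assumes the input is already split into tokens with commas dropped, and `is_year` is a hypothetical stand-in for the question's `isYear` (here, any four-digit number).

```python
DEFAULT_YEAR = 2013  # fallback year used by the question's second alternative


def is_year(text):
    """Hypothetical stand-in for isYear(): treat any four-digit number as a year."""
    return len(text) == 4 and text.isdigit()


def parse_date(tokens, pos=0):
    """Return ((month, day, year), next_pos) for a date starting at tokens[pos]."""
    month = tokens[pos]
    day = int(tokens[pos + 1])
    # Peek at the third token before deciding which alternative to take.
    third = tokens[pos + 2] if pos + 2 < len(tokens) else None
    if third is not None and is_year(third):
        return (month, day, int(third)), pos + 3  # consume the year
    return (month, day, DEFAULT_YEAR), pos + 2  # leave the third token unread
```

With the two sample inputs, the first call stops at "until" and the second stops at "7", which is the behaviour the question asks for.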
So I am working with an ANTLR grammar for parsing dates, and I want to be able to recognize not just individual date-units, but also pairs of date-units.
For the purposes of this question, I think it might be helpful to divide the kinds of questions I want to be able to recognize into 3 classes:
1. What was the temperature in August 2019? - Straightforward. Single date-unit (August 2019).
2. Which was hotter between June 3rd 2019 and yesterday? - Still straightforward. Two date-units (June 3, 2019 and yesterday).
3. Between August 2018 and 2019, which was hotter? - Tricky. The natural expectation of the user in this case would be to compare August 2018 and August 2019 (implicitly). To handle such cases, I want 2018 and 2019 to be parsed as a single year_pair rule and August to be parsed as a month.
I am currently handling only cases 1 and 2. Case 1 is handled in a straightforward way. Case 2 is handled by having a date_unit AND date_unit rule. To handle Case 3, I also tried adding a year AND year rule, so that 2018 and 2019 would be picked up as a year_pair first, but due to the top-down nature of ANTLR, it still parses the input into August 2018 and 2019.
How can I change this so that it parses August 2018 and 2019 into August and 2018 and 2019 instead (while also retaining the general date_unit AND date_unit rule)?
You are trying to add semantics to a syntax. From a language standpoint the implicit user expectation doesn't matter at all. The parser (as a syntax tool) can only determine whether input conforms to a language, not whether the input also matches semantic rules.
Instead, use ANTLR 4 to parse your input and create the parse tree. Then, in a second step, do the semantic analysis, where you can apply your special date rules (e.g. automatically filling in implicit date parts).
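A minimal sketch of such a second step, assuming the parse tree has already been flattened into a list of dictionaries with an optional "month" and a "year" (this shape and the function name `fill_implicit_months` are made up for illustration, not something ANTLR produces):

```python
def fill_implicit_months(units):
    """Give each unit without a month the month of the nearest earlier unit."""
    filled = []
    inherited = None  # most recent explicit month seen so far
    for unit in units:
        month = unit.get("month")
        if month is None:
            month = inherited  # stays None if no earlier unit had a month
        else:
            inherited = month
        filled.append({"month": month, "year": unit["year"]})
    return filled


# "Between August 2018 and 2019" parsed as two date units:
pair = fill_implicit_months([{"month": "August", "year": 2018}, {"year": 2019}])
```

Here the second unit ends up with the month "August", while a pair of bare years is left without a month.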
"Bottom-up" is a term that's synonymous for LR parsing for decades and has nothing to do with ANTLR nor the problem. It's the wrong term.
Mike's solution above is what most people would do because a date_range corresponds to just a Tuple<date_unit, date_unit>, and one would just be creating that type in the semantic analyzer. You want to describe a different range, something like Tuple<month, Tuple<year, year>> and other variations syntactically. Here is a grammar that does that. It produces the trees you are looking for, for all three of your examples.
grammar Dates;
MONTH : 'January' | 'February' | 'March' | 'April' | 'May' | 'June' | 'July' | 'August' | 'September' | 'October' | 'November' | 'December' ;
YESTERDAY : 'yesterday' ;
FIRST : 'First';
SECOND : 'Second';
THIRD : 'Third';
AND : 'and' ;
BETWEEN : 'between';
ORDINAL: [1-9][0-9]* ('rd' | 'th');
CARDINAL : [0-9]+ ;
WS: [ \t\r\n]+ -> skip;
// NB: Note order here.
range
: BETWEEN month year_group
| BETWEEN date_unit AND date_unit
;
input: ( date_unit | range ) EOF ;
year_group : year AND year ;
date_unit : month day year | month year | year | yesterday ;
day : ordinal | CARDINAL ;
ordinal : ORDINAL | FIRST | SECOND | THIRD ;
month : MONTH ;
year : CARDINAL ;
yesterday : YESTERDAY ;
I found in some code I maintain they used this format for an update query
UPDATE X=to_date('$var','%iY-%m-%d %H:%M:%S.%F3') ...
But I can't find anywhere in the Informix documentation what the i is for. Running this next query returns the same values with and without it.
SELECT TO_CHAR(CURRENT, '%Y-%m-%d %H:%M:%S%F3') as wo_I,
TO_CHAR(CURRENT, '%iY-%m-%d %H:%M:%S%F3') as with_I FROM X;
wo_i | with_i
------------------------|------------------------
2017-06-20 16:49:44.712 | 2017-06-20 16:49:44.712
So what am I missing?
Resources I looked into:
https://www.ibm.com/support/knowledgecenter/SSGU8G_11.70.0/com.ibm.sqlt.doc/ids_sqt_130.htm
https://www.ibm.com/support/knowledgecenter/SSGU8G_11.70.0/com.ibm.sqlt.doc/ids_sqt_129.htm
http://www.sqlines.com/informix-to-oracle/to_char_datetime
It's a trifle hard to find, but one location for the information you need (assuming you use Informix 11.70 rather than 12.10, though it probably hasn't changed much) is:
Client APIs and Tools — GLS User's Guide — GLS Environment Variables
In particular, it says:
%iy — Is replaced by the year as a two-digit number (00 - 99) for both reading and printing. It is the formatting directive specific to IBM Informix for %y.
%iY — Is replaced by the year as a four-digit number (0000 - 9999) for both reading and printing. It is the formatting directive specific to IBM Informix for %Y.
…
%y — Requires that the year is a two-digit number (00 through 99) for both reading and printing.
%Y — Requires that the year is a four-digit number (0000 through 9999) for both reading and printing.
There clearly isn't much difference between the two — I'm not even sure I understand what the difference is supposed to be. I think it may be the difference between accepting but not requiring leading zeros on 1, 2 or 3 digit year numbers. But for the most part, it seems you can treat them as equivalent.
I need to convert this server date (Given from kinvey request) into local timezone.
I'm using the following code:
let dateFormatter = NSDateFormatter()
dateFormatter.dateFormat = "yyyy-MM-ddTHH:mm:ss.sTZD"
print(dateFormatter.dateFromString(newValue))
The date format is this:
ect = "2016-08-28T16:30:06.553Z" or
lmt = "2016-08-28T16:30:06.553Z"
When I print the date it is nil. Do you know what I'm doing wrong? I think it could be the end of the dateFormat.
If your app can target only iOS7+, you can use format symbols described in:
Fixed Formats (in Data Formatting Guide)
Unicode Technical Standard #35 version tr35-31
second | S | 1..n | 3456 | Fractional Second - truncates (like other time fields) to the count of letters. (example shows display using pattern SSSS for seconds value 12.34567)
zone | X | 1 | -08,+0530,Z | The ISO8601 basic format with hours field and optional minutes field. The ISO8601 UTC indicator "Z" is used when local time offset is 0. (The same as x, plus "Z".)
So, to parse fractional seconds, use uppercase 'S', and use 'X' for a time zone that may be written as "Z" for UTC.
Try this:
dateFormatter.dateFormat = "yyyy-MM-dd'T'HH:mm:ss.SSSX"
(I escaped 'T' as it may be used as another time formatting symbol in the future.)
PS. Although I could not find a thread describing a date format that interprets "Z" as UTC+0000, ignoring or removing the "Z" may be an acceptable solution if your timestamps are always in UTC. Please choose whichever works best for you.
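For comparison, here is the same idea outside of NSDateFormatter, as a hedged Python sketch rather than Swift: parse the fractional seconds and the "Z" suffix explicitly, then convert to the local zone. The function name `parse_server_timestamp` is made up for illustration, and treating "Z" as UTC via `%z` assumes Python 3.7 or later.

```python
from datetime import datetime, timezone


def parse_server_timestamp(text):
    # %f reads the fractional seconds; %z accepts a literal "Z" as UTC (Python 3.7+).
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%f%z")


utc_time = parse_server_timestamp("2016-08-28T16:30:06.553Z")
# astimezone() with no argument converts to the machine's local time zone.
local_time = utc_time.astimezone()
```

The local value depends on the machine's time zone settings, but it always denotes the same instant as the UTC value.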
In the Hadoop infrastructure (Java-based) I am getting timestamps as string values in this format:
2015-10-01T04:22:38:208Z
2015-10-01T04:23:35:471Z
2015-10-01T04:24:33:422Z
I tried different patterns, following the examples for the Java SimpleDateFormat class, without any success.
Replaced 'T' with ' ' and 'Z' with '', then
"yyyy-MM-dd HH:mm:ss:ZZZ"
"yyyy-MM-dd HH:mm:ss:zzz"
"yyyy-MM-dd HH:mm:ss:Z"
"yyyy-MM-dd HH:mm:ss:z"
Without replacement,
"yyyy-MM-dd'T'HH:mm:ss:zzz'Z'"
In fact, this format is not listed among examples. What should I do with it?
Maybe those 3 digits are milliseconds, and time is in UTC, like this: "yyyy-MM-dd'T'HH:mm:ss.SSSZ"? But it still should look like "2015-11-27T10:50:44.000-08:00" as standardized format ISO-8601.
Maybe, this format is not parsed correctly in the first place?
I use Ruby, Python, Pig, Hive to work with it (but not Java directly), so any example helps. Thanks!
I very strongly suspect the final three digits are nothing to do with time zones, but are instead milliseconds, and yes, the Z means UTC. It's a little odd that they're using : instead of . as the separator between seconds and milliseconds, but that can happen sometimes.
In that case you want
"yyyy-MM-dd'T'HH:mm:ss:SSSX"
... or use
"yyyy-MM-dd'T'HH:mm:ss:SSS'Z'"
and set your SimpleDateFormat's time zone to UTC explicitly.
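Since the question mentions Python, here is a rough Python counterpart of the second variant: match the "Z" literally and attach UTC explicitly. It assumes the trailing three digits are milliseconds, as suggested above; `parse_hadoop_timestamp` is just an illustrative name.

```python
from datetime import datetime, timezone


def parse_hadoop_timestamp(text):
    # ':' separates seconds from the fraction; %f reads the digits as a fraction of a second.
    naive = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S:%fZ")
    # The trailing 'Z' was matched as a literal, so set UTC explicitly.
    return naive.replace(tzinfo=timezone.utc)


parsed = parse_hadoop_timestamp("2015-10-01T04:22:38:208Z")
```

The result is a timezone-aware datetime with 208000 microseconds, which can then be reformatted with `isoformat()` if a standard ISO 8601 string is needed.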
I never wrote any complex regular expression before, and what I need seems to be (at least) a bit complicated.
I need a Regex to find matches for the following:
"On Fri, Jan 16, 2015 at 4:39 PM"
Where On will always be there;
then 3 characters for week day;
, is always there;
space is always there;
then 3 characters for month name;
space is always there;
day of month (one or two numbers);
, is always there;
space is always there;
4 numbers for year;
space at space always there;
time (have to match 4:39 as well as 10:39);
space and 2 caps letters for AM or PM.
Here's a very simple and readable one:
/On \w{3}, \w{3} \d{1,2}, \d{4} at \d{1,2}:\d{2} [AP]M/
See it on rubular
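The same pattern can be tried out in Python as well, since this expression uses only syntax that Ruby and Python share. A small sketch, with `find_header` as a made-up helper name:

```python
import re

# Same pattern as above, written as a Python raw string.
PATTERN = re.compile(r"On \w{3}, \w{3} \d{1,2}, \d{4} at \d{1,2}:\d{2} [AP]M")


def find_header(text):
    """Return the matched date header, or None when the text contains none."""
    match = PATTERN.search(text)
    return match.group(0) if match else None
```

Note that `\w{3}` accepts any three word characters, so this checks the shape of the string only, not whether the day or month names are real.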
Try this:
On\s+(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun), (?:Jan|Feb|Mar|Apr|May|June|July|Aug|Sept|Oct|Nov|Dec) \d{1,2}, \d{4} at \d{1,2}:\d{2} (?:AM|PM)
/On \w{3}, \w{3} \d{1,2}, \d{4} at \d{1,2}:\d{1,2} [A-Z]{2}/
# \w{3} for 3 characters
# \d{1,2} for 1 or 2 digits
# \d{4} for 4 digits
# [A-Z]{2} for 2 capital letters
You could try the below regex and it won't check for the month name or day name or date.
^On\s[A-Z][a-z]{2},\s[A-Z][a-z]{2}\s\d{1,2},\s\d{4}\sat\s(?:10|4):39\s[AP]M$
You can use Rubular to construct and test Ruby Regular Expressions.
I have put together an Example: http://rubular.com/r/45RIiwheqs
Since it looks like you are trying to parse dates, you should use Date.strptime.
/On [A-Za-z]{3}, [A-Za-z]{3} \d{1,2}, \d{4} at \d{1,2}:\d{1,2}/g
The way you are describing the problem makes me think that the format will always be preserved.
I would then in your case use the Time.strptime function (from Ruby's time library), passing the format string
format = "On %a, %b %d, %Y at %I:%M %p"
Time.strptime("On Fri, Jan 16, 2015 at 4:39 PM", format)
which is more readable than a regexp (in my opinion) and has the added value that it returns a Time object, which is easier to use than a regexp match, in case you need to perform other time-based calculations.
Another good thing is that if the string contains an invalid date (like "On Mon, Jan 59, 2015 at 37:99 GX" ) the parse function will raise an exception, so that validation is done for free for you.
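A Python analogue of the format-string approach, offered as a sketch: it assumes the format "On %a, %b %d, %Y at %I:%M %p" for the sample string and an English (or C) locale, because %a, %b and %p are locale-dependent. The helper name `parse_header_time` is hypothetical.

```python
from datetime import datetime

# Assumed format for strings like "On Fri, Jan 16, 2015 at 4:39 PM".
FORMAT = "On %a, %b %d, %Y at %I:%M %p"


def parse_header_time(text):
    """Parse the header into a datetime; raises ValueError on malformed input."""
    return datetime.strptime(text, FORMAT)


parsed_time = parse_header_time("On Fri, Jan 16, 2015 at 4:39 PM")
```

As with the Ruby version, a string such as "On Mon, Jan 59, 2015 at 37:99 GX" is rejected (here with a ValueError) instead of silently matching.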