Parsing Time from text using SAS - parsing

I have thousands of records, each with a text field of up to 200 characters that contains a date and a time. I am trying to parse out the time.
Here is an example of the text:
Your account your account your account on Jan 10, 2020 at 8.30 AM ET your account
Your account your account your account on Jan 3, 2020 6.30PM ET your account
Your account your account your account on Jan 11, 2020 at 6.30PM ET your account
Desired output
8.30 AM
6.30PM
6.30PM
ET appears in every record, and I am using the index and substr functions to parse out the time.
Time=substr(Text,index(Text,' on ')+19,6);
For the second line I also get extra characters, since there is no "at" and no space between the time and PM.
Is there an efficient way to parse the time?
Thanks

SAS can locate a text excerpt using a Perl regular expression that has a capture buffer.
data want(keep=parsed_timestring);
  length parsed_timestring $8;
  input;
  /* Pattern:
   *   On a word boundary   \b
   *   Capture start        (
   *   1 or 2 digits        \d{1,2}
   *   A period             \.
   *   1 or 2 digits        \d{1,2}
   *   0 or 1 spaces        \s?
   *   letter A or P        (A|P)
   *   letter M             M
   *   Capture end          )
   */
  prx = prxparse('/\b(\d{1,2}\.\d{1,2}\s?(A|P)M)/x');
  if prxmatch(prx, _infile_) then
    parsed_timestring = prxposn(prx, 1, _infile_);
datalines;
Your account your account your account on Jan 10, 2020 at 8.30 AM ET your account
Your account your account your account on Jan 3, 2020 6.30PM ET your account
Your account your account your account on Jan 11, 2020 at 6.30PM ET your account
Your account your account your account on Jan 11, 2020 at 6666.30PM ET your account
;
proc print;
run;
In the last row parsed_timestring is blank: 6666.30PM has more than two digits before the period, so no 1- or 2-digit hour starts on a word boundary, the pattern does not match, and the line is treated as having no proper time string.
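For readers who want to try the pattern outside SAS, here is a minimal Python sketch of the same regular expression. The names TIME_PATTERN and parse_time are illustrative only and are not part of the SAS answer; [AP] is used in place of (A|P), which matches the same characters.

```python
import re

# Same idea as the SAS pattern: on a word boundary, 1-2 digits, a period,
# 1-2 digits, an optional whitespace character, then AM or PM.
TIME_PATTERN = re.compile(r"\b(\d{1,2}\.\d{1,2}\s?[AP]M)")


def parse_time(text):
    """Return the first time-like substring in text, or None if none is found."""
    match = TIME_PATTERN.search(text)
    return match.group(1) if match else None


lines = [
    "Your account your account your account on Jan 10, 2020 at 8.30 AM ET your account",
    "Your account your account your account on Jan 3, 2020 6.30PM ET your account",
    "Your account your account your account on Jan 11, 2020 at 6.30PM ET your account",
    "Your account your account your account on Jan 11, 2020 at 6666.30PM ET your account",
]

results = [parse_time(line) for line in lines]
for line, result in zip(lines, results):
    print(repr(result))
```

As in the SAS run, the last line yields no match because no 1- or 2-digit run followed by a period begins on a word boundary.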

Related

Date Extraction from a specific dataset - Google sheets

I've tried all types of date extraction from this timestamp but nothing works.
Data samples:
Mon 2021 Jul 26 2021 8:26 PM
Wed May 19 2021 22:54:00 GMT+0800 (Hong Kong Standard Time)
Does anyone have any idea?
I tried MOD, TIME, MINUTE, and TIMEVALUE. I expected these to extract the date, but they don't.
try:
=INDEX(TEXT(IFNA(1*REGEXEXTRACT(TO_TEXT(A1:A),
"(\w+ \d+ \d{4})" ), "​"), "dd/mm/e"))
Use regexextract(), like this:
=to_date( value( regexextract( to_text(A2), "^\w+ (\w+ \w+ \w+)" ) ) )
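The first formula's regular expression can be checked outside Sheets. The following is a rough Python sketch, not a Sheets formula: it applies the same pattern, (\w+ \d+ \d{4}), to the two sample timestamps and parses the captured text. The function name extract_date is hypothetical, and the %b month parsing assumes an English locale.

```python
import re
from datetime import datetime

# Same capture group as the first formula: a word, a number, a 4-digit year.
DATE_PATTERN = re.compile(r"(\w+ \d+ \d{4})")


def extract_date(stamp):
    """Return the date found in stamp, or None if the pattern does not match."""
    match = DATE_PATTERN.search(stamp)
    if match is None:
        return None
    # %b expects an abbreviated English month name such as "Jul" or "May".
    return datetime.strptime(match.group(1), "%b %d %Y").date()


samples = [
    "Mon 2021 Jul 26 2021 8:26 PM",
    "Wed May 19 2021 22:54:00 GMT+0800 (Hong Kong Standard Time)",
]

dates = [extract_date(s) for s in samples]
for d in dates:
    print(d.isoformat())
```

The pattern skips the stray "2021" in the first sample because the token after it ("Jul") is not numeric, so the first successful match starts at the month name.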

Stop ...skipping... SUMMARY OF LESS COMMANDS Commands marked with * may be preceded by a number

When I copy some code (about 50 lines) from a rake task into the Rails console, I want it to just run, but instead this happens. The first part is some results printing to the screen, but I have no idea what happens next:
#<Appointment:0x00007fb83eec5358
id: "0f0e14a6-1645-4a7b-ad61-f799e60ac570",
doctor_id: 1,
patient_id: 1,
start_time: Sun, 24 Jan 2021 13:25:45 UTC +00:00,
end_time: Sun, 24 Jan 2021 14:25:45 UTC +00:00,
created_at: Sun, 24 Jan 2021 12:50:45 UTC +00:00,
updated_at: Sun, 24 Jan 2021 13:10:45 UTC +00:00]
...skipping...
SUMMARY OF LESS COMMANDS
Commands marked with * may be preceded by a number, N.
Notes in parentheses indicate the behavior if N is given.
A key preceded by a caret indicates the Ctrl key; thus ^K is ctrl-K.
h H Display this help.
q :q Q :Q ZZ Exit.
---------------------------------------------------------------------------
MOVING
e ^E j ^N CR * Forward one line (or N lines).
y ^Y k ^K ^P * Backward one line (or N lines).
What is going on and how do I stop it?
Notes:
Here is the full text of what appears
I tried running Pry.config.pager = false as provided here, but the problem happens despite that.

Ignoring "noise" in ANTLR4

I'd like to build a natural language date parser in ANTLR4 and got stuck on ignoring "noise" input. The simplified grammar below parses any string that contains valid dates in the format DATE MONTH:
dates
: simple_date dates
| EOF
;
simple_date
: DATE MONTH
;
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ; // etc.
Text such as "1 January 22 February" will be accepted. I wanted the grammar to accept other text as well, so I added ANY : . -> skip; at the end:
dates
: simple_date dates
| EOF
;
simple_date
: DATE MONTH
;
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ; // etc.
ANY : . -> skip;
This doesn't quite do what I want, however. A string such as "On 1 January and 22 February" is accepted and the simple_date rule is matched twice, but the string "On 1XX January" also matches the rule.
Question: How do I build a grammar where rules are matched only with the exact token sequence while ignoring all other input, including tokens in an order not defined in any of the rules? Consider the following cases:
"From 1 January to 2 February" -> simple_date matches "1 January" and "2 February"
"From 1XX January to 2 February" -> simple_date matches "2 February", rest is ignored
"From January to February" -> no match, everything ignored
Do not drop the extra "noise" in the lexer, as your ANY rule does. The lexer does not know the context in which the current token appears, and what you want is to drop noise tokens only when they are not part of a DATE MONTH sequence. Instead, make ANY an ordinary token and add a parser rule that matches the noise.
It is also advisable to drop whitespace in the lexer, in which case the ANY rule should exclude the characters matched by the WS rule. Note too that your DATE rule also captures noise tokens of the form [0-9][0-9]?, so the noise rule has to accept DATE as well.
dates
: (noise* (simple_date) noise*)+
;
simple_date
: DATE MONTH
;
noise: (DATE|ANY);
DATE : [0-9][0-9]?;
MONTH : 'January' | 'February' | 'March' ;
ANY : ~(' '|'\t' | '\f')+ ;
WS : [ \t\f]+ -> skip;
Accepts:
1 January and 22 February noise 33
1 January and 22 February 3
Rejects:
1xx January
This wasn't fully tested. Note also that your MONTH lexer rule captures a standalone month literal (e.g. January), which should count as noise but is not handled in my grammar, e.g.
22 February January
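The behaviour the question asks for can be prototyped without ANTLR to pin down the expected results. The Python sketch below is only an illustration of the token-level idea (classify whole whitespace-separated tokens, then look for DATE followed by MONTH); it is not generated from the grammar above, and MONTHS, DATE_RE and find_simple_dates are hypothetical names.

```python
import re

# Assumed month list, shortened like the grammar in the question.
MONTHS = {"January", "February", "March"}
# Same shape as the DATE lexer rule: one or two digits, and nothing else.
DATE_RE = re.compile(r"[0-9][0-9]?")


def find_simple_dates(text):
    """Return (date, month) pairs for every DATE token directly followed by a MONTH token."""
    tokens = text.split()  # whitespace is dropped, like the WS rule
    matches = []
    i = 0
    while i < len(tokens) - 1:
        if DATE_RE.fullmatch(tokens[i]) and tokens[i + 1] in MONTHS:
            matches.append((tokens[i], tokens[i + 1]))
            i += 2  # consume both tokens of the match
        else:
            i += 1  # anything else is noise and is skipped
    return matches


print(find_simple_dates("From 1 January to 2 February"))
print(find_simple_dates("From 1XX January to 2 February"))
print(find_simple_dates("From January to February"))
```

Because "1XX" is classified as a single noise token rather than a DATE followed by skipped characters, it cannot pair with the month that follows it, which is the same reason the ANY token in the grammar must cover a whole run of non-whitespace characters.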

NSDateFormatter localizedStringFromDate dateStyle results

I'm creating this question so I can have a one-shot reference to all the date and time styles for each of the NSDateFormatterStyle enum values NSDateFormatterShortStyle, NSDateFormatterMediumStyle, NSDateFormatterLongStyle, NSDateFormatterFullStyle.
I often find myself in a position where I'd like to know if these default styles are sufficient for my clients, and it's hard to find all the styles in one place.
All the output below is in order 1 = NSDateFormatterShortStyle, 2 = NSDateFormatterMediumStyle, 3 = NSDateFormatterLongStyle, 4 = NSDateFormatterFullStyle. Please feel free to comment if you'd prefer a different organization of output.
English
2015-03-27, 9:42 AM
Mar 27, 2015, 9:42:45 AM
March 27, 2015 at 9:42:45 AM EDT
Friday, March 27, 2015 at 9:42:45 AM Eastern Daylight Time
Note that the Date Formatter separates Date & Time by "," in Short and Medium styles, and by "at" in long and full styles. Interesting!
French
2015-03-27 09:54
2015-03-27 09:54:07
27 mars 2015 09:54:07 HAE
vendredi 27 mars 2015 09 h 54 min 07 s heure avancée de l’Est
No commas at all here. French dates seem to be 24h.
German
27.03.15 09:58
27.03.2015 09:58:07
27. März 2015 09:58:07 GMT-4
Freitag, 27. März 2015 09:58:07 Nordamerikanische Ostküsten-Sommerzeit
Spanish
27/3/15 10:00
27/3/2015 10:00:05
27 de marzo de 2015, 10:00:05 GMT-4
viernes, 27 de marzo de 2015, 10:00:05 (Hora de verano oriental)
Simplified Chinese
15/3/27 上午10:01
2015年3月27日 上午10:01:40
2015年3月27日 GMT-4上午10:01:40
2015年3月27日 星期五 北美东部夏令时间上午10:01:40

Rails: Why, and how, does adding apparently equal values to equal dates give different results in this example?

I see that typing 100.days gives me [edit: seems to give me] a Fixnum 8640000:
> 100.days.equal?(8640000)
=> true
I would have thought those two values were interchangeable, until I tried this:
x = Time.now.to_date
=> Wed, 31 Oct 2012
> [x + 100.days, x + 8640000]
=> [Fri, 08 Feb 2013, Mon, 07 May 25668]
Why, and how, does adding apparently equal values to equal dates give different results?
The above results are from the Rails console, using Rails version 3.1.3 and Ruby version 1.9.2p320. (I know, I should upgrade to the latest version...)
100.days doesn't return a Fixnum, it returns an ActiveSupport::Duration, which tries pretty hard to look like an integer under most operations.
Date#+ and Time#+ are overridden to detect whether a Duration is being added and, if so, do the calculation properly rather than just adding the integer value. (Time#+ expects a number of seconds, so + 86400 advances by 1 day, whereas Date#+ expects a number of days, so + 86400 advances by 86400 days.)
In addition, some special cases are covered, such as adding a day on the day daylight saving time comes into effect. This also allows Time.now + 1.month to advance by 1 calendar month irrespective of the number of days in the current month.
Besides what Frederick's answer supplies, adding 8640000 to a Date isn't the same as adding 8640000 to a Time, nor is 100.days the correct designation for 100 days.
Think of 100.days meaning "give me the number of seconds in 100 days", not "This value represents days". Rails used to return the number of seconds, but got fancy/smarter and changed it to a duration so the date math could do the right thing - mostly. That fancier/smarter thing causes problems like you encountered by masking what's really going on, and makes it harder to debug until you do know.
Date math assumes day values, not seconds, whereas Time wants seconds. So, working with 100 * 24 * 60 * 60 = 8640000:
100 * 24 * 60 * 60 => 8640000
date = Date.parse('31 Oct 2012') => Wed, 31 Oct 2012
time = Time.new(2012, 10, 31) => 2012-10-31 00:00:00 -0700
date + 8640000 => Mon, 07 May 25668
time + 8640000 => 2013-02-08 00:00:00 -0700
date + 100 => Fri, 08 Feb 2013
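The same unit mismatch can be reproduced outside Rails. The sketch below is a Python analogy, not Rails code: datetime.timedelta plays the role of a duration that carries its own units, while a bare 8640000 has to be interpreted as either seconds or days. The variable names are illustrative.

```python
from datetime import date, datetime, timedelta

SECONDS_IN_100_DAYS = 100 * 24 * 60 * 60  # 8640000

start_date = date(2012, 10, 31)
start_time = datetime(2012, 10, 31)

# A duration object knows its units, so both additions land on the same day.
as_duration = timedelta(seconds=SECONDS_IN_100_DAYS)
date_plus_duration = start_date + as_duration
time_plus_duration = start_time + as_duration

# Treating the bare number as days is a very different quantity. Python's
# date type stops at year 9999, so the addition overflows here instead of
# producing a far-future date.
try:
    start_date + timedelta(days=SECONDS_IN_100_DAYS)
    overflowed = False
except OverflowError:
    overflowed = True

print(date_plus_duration, time_plus_duration, overflowed)
```

The point carried over from the answer is that the number alone says nothing about its unit; only the duration object does.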
It's a pain sometimes dealing with Times and Dates, and you're sure to encounter bugs in code you've written where you forget. That's where the ActiveSupport::Duration part helps, by handling some of the date/time offsets for you. The best tactic is to use either Date/DateTime or Time, and not mix them unless absolutely necessary. If you do have to mix them, then bottleneck the code into methods so you have a single place to look if a problem crops up.
I use Date and DateTime if I need to handle larger ranges than Time can handle, plus DateTime has some other useful features, otherwise I use Time because it's more closely coupled to the OS and C. (And I revealed some of my roots there.)

Resources