Rails Regex Match Group Overwriting itself - ruby-on-rails

I am trying to match this string:
NFPA 101 19.7.2.2
and am using this regex:
(NFPA) (\w+)(?: ?(?:([^.]+)\.?)+)?
This seems to match the string, but the captured groups are not what I'm looking for. I expect:
NFPA
101
19
7
2
2
What I get is this:
NFPA
101
2
See this rubular example:
http://rubular.com/r/43VY0yyNa7
It's as if that last recurring capture group is being overwritten by the final match. Is there a way to have all of these come back as capture groups as I need?
Added another regex that gives me a similar problem to the one described above:
(NFPA) (.+) (.+?.)+(.+)

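The non-capturing group (?:...) is not the problem in itself: the capture group inside it is repeated, and a repeated capture group only keeps its last iteration, so it can't return each part as a separate group. To get around this, match one token at a time and use positive/negative lookahead to constrain what follows it. The following regex should work in this case: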
(\w+|\d+[-]\d+)(?=\s?)(?![-])
see demo
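To make the behaviour concrete, here is a minimal Ruby sketch. It assumes the input always has the shape shown in the question, and the helper name nfpa_parts is hypothetical. It first reproduces the captures from the question, then collects every token by applying the lookahead regex with String#scan, which returns all matches rather than a single one.

```ruby
# Hypothetical helper: returns every token of a reference such as
# "NFPA 101 19.7.2.2" as a separate string.
def nfpa_parts(reference)
  # scan returns one array per match (one element per capture group),
  # so flatten turns [["NFPA"], ["101"], ...] into a flat list.
  reference.scan(/(\w+|\d+[-]\d+)(?=\s?)(?![-])/).flatten
end

input = "NFPA 101 19.7.2.2"

# The pattern from the question: the repeated group ([^.]+) keeps only
# its last iteration, so three captures come back instead of six.
original = input.match(/(NFPA) (\w+)(?: ?(?:([^.]+)\.?)+)?/)
p original.captures   # => ["NFPA", "101", "2"]

# Matching one token at a time returns all of them.
p nfpa_parts(input)   # => ["NFPA", "101", "19", "7", "2", "2"]
```

If the "NFPA" prefix must be validated as well, one option is to check it with a separate match before calling scan.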

Related

GSheets - How to query a partial string

I am currently using this formula to get all the data for everyone whose first name is "Peter", but my problem is that if someone is called "Simon Peter", their data also shows up in the formula output.
=QUERY('Data'!1:1000,"select * where B contains 'Peter'")
I know that for other formulas this issue is resolved if I add an * to the string, but the same logic does not apply to the QUERY formula.
Does anyone know the correct syntax or a workaround?
How about classic SQL syntax
=QUERY('Data'!1:1000,"select * where B like 'Peter %'")
The LIKE keyword allows use of wildcard % to represent characters relative to the known parts of the searched string.
See the query reference: developers.google.com/chart/interactive/docs/querylanguage. You could split the first name and last name into separate columns, then search only for first names exactly equal to 'Peter'. You may also want to handle uppercase/lowercase differences (where lower(B) contains 'peter') and whitespace in unexpected places (e.g., trim()). You could also match only values that start with Peter by using starts with instead of contains, or use a regular expression with matches. – Brian D
It seems that for my case using 'starts with' is a perfect fit. Thank you!

Custom order on string

I have a project model. Projects have a code attribute, which is in AAXXXX-YY format like "AA0001-18", "ZA0012-19", where AA is two characters, XXXX is a progressive number, and YY is the last two digits of the year of its creation.
I need to define a default scope that orders projects by code such that the year takes precedence over the rest of the code. Supposing I have the codes "ZZ0001-17", "AA0001-18", and "ZZ0002-17", then "ZZ0001-17" should be first, "ZZ0002-17" second, and "AA0001-18" third.
I tried:
default_scope { order(:code) }
but I get "AA0001-18" first.
Short answer
order("substring(code from '..$') ASC, code ASC")
Wait but why?
So as you said, you want to basically sort by 2 things:
the last 2 characters in the code string. YY
the rest of the code AAXXXX-
So first things first,
the order function as per Rails documentation will take the arguments you added and use them in the ORDER BY clause of the query.
Then, the substring function according to the documentation of PostgreSQL is:
substring(string from pattern)
If we want 2 characters .. from the end of the string $ we use ..$
Hence, substring(code from '..$')
For more information about pattern matching please refer to the documentation here.
Finally, the second part of the ordering, code ASC, sorts rows that share the same year by the full code, which is equivalent to sorting them by the preceding characters AAXXXX-.
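The two-level ordering can also be checked in plain Ruby. This is only an in-memory sketch of the same idea, not a replacement for the SQL: the helper name sort_codes is hypothetical, the sample codes follow the AAXXXX-YY format described in the question, and a database collation may compare strings differently from Ruby.

```ruby
# Hypothetical in-memory equivalent of
#   ORDER BY substring(code from '..$') ASC, code ASC
def sort_codes(codes)
  # Sort key: [last two characters (the year), the full code].
  # Arrays compare element by element, so the year takes precedence.
  codes.sort_by { |code| [code[-2, 2], code] }
end

codes = ["ZZ0001-17", "AA0001-18", "ZZ0002-17"]
p sort_codes(codes)   # => ["ZZ0001-17", "ZZ0002-17", "AA0001-18"]
```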

How to compose a query that matches multiple tag values?

I was wondering what the best way is to compose a WHERE clause that matches multiple values for a tag. I was under the impression that I could solve this using a regex pattern, but I seem to have hit a wall: too much data is returned by my query…
In my case I have several measurements that have a 'location_id' tag.
When I create a query using a WHERE clause like the one below, I get data back that is not correct. This is probably due to my misunderstanding of how to use the regex pattern, or maybe it is impossible…
My data is as follows
time                 cpu  location_id
----                 ---- -----------
2017-11-27T07:00:00Z 159  2
2017-11-27T15:00:00Z 154  27
2017-11-27T23:00:00Z 117  7
2017-11-28T07:00:00Z 160  7
2017-11-28T15:00:00Z 167  27
2017-11-28T23:00:00Z 170  27
When I execute a query, I only want the locations back with the value of '7'.
But when I use a query like the one below, the data from location_id 27 is also returned…
SELECT * FROM "measurement" WHERE location_id =~ /7/;
My goal is to indicate that the location_id should be in a list of values. Is this even possible with regex? Or should I use AND clauses?
SELECT * FROM "measurement" WHERE location_id =~ /7|2|104|45/;
This is possible with regex (albeit only with tags/fields that are strings). First, recall that the regex /7/ matches the character 7 anywhere in the input text. Therefore both "7" and "27" match.
To constrain the match to cover the entire input text, wrap it in start-of-text ^ and end-of-text $ markers. For example, the regex /^7$/ will match only the string "7" and nothing else.
To match against multiple entire strings, use the regex alternation (or) operator |. Remember, however, that it has lower precedence than concatenation, so the alternatives have to be wrapped in parentheses; otherwise the anchors apply only to the first and last alternatives. For example, /^(7|2|104|45)$/ will match against either "7", "2", "104", or "45".
See the golang regex syntax documentation for more details.
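The difference between the patterns can be tried out in Ruby. This is a sketch under two assumptions: the id list is made up for illustration, and the helper name matching_ids is hypothetical. Ruby's ^ and $ match at line boundaries, so \A and \z are used to mirror the whole-text meaning that ^ and $ have by default in Go's regexp syntax.

```ruby
# Hypothetical helper: returns the ids that match the given pattern.
def matching_ids(ids, pattern)
  ids.grep(pattern)
end

# Illustrative tag values; "27" and "1045" should not be selected.
ids = %w[2 27 7 104 45 1045]

# Unanchored: matches "7" anywhere in the value.
p matching_ids(ids, /7/)                  # => ["27", "7"]

# Anchored: matches only the exact value "7".
p matching_ids(ids, /\A7\z/)              # => ["7"]

# Anchored alternation: matches only the listed values.
p matching_ids(ids, /\A(7|2|104|45)\z/)   # => ["2", "7", "104", "45"]

# Without parentheses the anchors bind only to the first and last
# alternatives, so every id in this list matches.
p matching_ids(ids, /\A7|2|104|45\z/)
```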

How do I grab all the content from within [url] including square brackets and match group 1 and 2

I have this regular expression
/\[url=(?:")?(.*?)(?:")?\](.*?)\[\/url\]/mi
and these blocks of text
[url=/someurl?page=5#3467]First[/url][postquote=true]
[url=/another_url/who-is?page=4#3396] Second[/url]
Some text[url=/another_url/who-is?page=3][i]3[/i] Third [/url]
and the regex works great at extracting the urls and text between the urls
Match 1
1. /someurl?page=5#3467
2. First
Match 2
1. /another_url/who-is?page=4#3396
2. Second
Match 3
1. /another_url/who-is?page=3
2. [i]3[/i] Third
The problem happens when I use the same regex from above to try to extract the url from this text
This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]
Match 1
1. https://www.somesite.com/location/?opt[
2. =apples]Link Name
Notice the =apples] in the second group. What I need is for the first group to include it as part of the URL, like this:
https://www.somesite.com/location/?opt[]=apples
Link Name
I have tried many modifications to this regex with no luck so far. Any help would be appreciated.
Ruby regex supports duplicate named captures. With this feature you can handle both cases easily (the one with &quote; and the one without). You don't need a recursive pattern, since I doubt that [] can be nested in the query part of a URL:
/\[url=(?:&quote;(?<url>[^&]*(?:&(?!quote;)[^&]*)*)&quote;|(?<url>[^\s\]\[]*(?:\[\][^\s\]\[]*)*))\](?<text>.*?)\[\/url\]/mi
The URL is in the named group url, and the content between the tags is in the named group text.
In a more readable format:
/
  \[url=
  (?:
      &quote; (?<url> [^&]* (?:&(?!quote;)[^&]*)* ) &quote;
    |
      (?<url> [^\s\]\[]* (?:\[\][^\s\]\[]*)* )
  )
  \]
  (?<text>.*?)\[\/url\]
/mix
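As a rough usage sketch, the pattern can be applied with String#scan and the named groups read from the match data. The helper name extract_links is hypothetical, and the inputs are the ones from the question.

```ruby
# Hypothetical helper: returns [url, text] pairs for every
# [url=...]...[/url] tag found in the given string.
def extract_links(bbcode)
  pattern = /\[url=(?:&quote;(?<url>[^&]*(?:&(?!quote;)[^&]*)*)&quote;|(?<url>[^\s\]\[]*(?:\[\][^\s\]\[]*)*))\](?<text>.*?)\[\/url\]/mi
  links = []
  # Inside the block, $~ holds the MatchData of the current match.
  # With duplicate names, [:url] returns the group that took part in it.
  bbcode.scan(pattern) { links << [$~[:url], $~[:text]] }
  links
end

sample = "This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]"
p extract_links(sample)
# => [["https://www.somesite.com/location/?opt[]=apples", "Link Name"]]

p extract_links("[url=/someurl?page=5#3467]First[/url][postquote=true]")
# => [["/someurl?page=5#3467", "First"]]
```

The block form of scan is used because, with named groups, the array form would return one slot per group, including a nil for the url alternative that did not match.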

How do I repeat a capturing group?

I have an input string that looks something like this:
HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07LCU3Ch37880Ch27800Ch16480CS8CA00000000000000000000
Now I don't care about the part that follows the last letter A, it'll always be A and exactly 20 numbers that are of no use to me. I do, however, need the part before the last letter A, and ideally, I'd need it to be separated into two different captures, just like this:
1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07
2: LCU3Ch37880Ch27800Ch16480CS8C
The only way to identify these matches is that they end with characters CS followed by two hexadecimal characters. I thought that a regular expression like (.+?CS.{2})+ (or (.+?CS[[:xdigit:]]{2})+) would do the job but when tried on www.regex101.com, it only captures the last group and gives the following warning:
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
Which I thought suggests that I should use regular expression like ((.+?CS.{2})+) instead and I mean – sure, now I get two captures, but they look like this:
1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07LCU3Ch37880Ch27800Ch16480CS8C
2: LCU3Ch37880Ch27800Ch16480CS8C
Meaning the first one is… slightly longer than I'd like it to be. If it helps in any way, I should point out that the final regular expression will be part of an iOS application, so an instance of the NSRegularExpression class will be used. I'm not sure whether that's helpful information at all; it's just that I know NSRegularExpression doesn't support every regular expression feature.
(.+?CS.{2})
You can use this directly. See the demo. Grab the group or the capture.
https://regex101.com/r/vD5iH9/68
It doesn't seem like you need a capturing group at all:
(?:(?!CS[0-9A-F]{2}).)+CS[0-9A-F]{2}
will match all strings that end in CS + 2 hex digits.
Test it live on regex101.com.
Explanation:
(?:                  # Start a group.
  (?!CS[0-9A-F]{2})  # Make sure we can't match CSff here,
  .                  # if so, match any character.
)+                   # Do this at least once.
CS[0-9A-F]{2}        # Then match CSff.
Change your regex to,
(.+?CS[[:xdigit:]]{2})
DEMO
You don't need to put the regex inside another capturing group and make it repeat one or more times. Just print group index 1 of each match to get your desired output.
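The answers above share one idea: drop the repetition and apply the pattern several times instead. Here is a minimal sketch of that idea in Ruby rather than NSRegularExpression; the helper name cs_chunks is hypothetical. The same approach should carry over to any API that enumerates all matches of a pattern.

```ruby
# Hypothetical helper: splits the input into chunks that each end with
# "CS" followed by two hexadecimal characters.
def cs_chunks(input)
  input.scan(/.+?CS[[:xdigit:]]{2}/)
end

# The input from the question: two chunks, then "A" and 20 digits.
input = "HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07" +
        "LCU3Ch37880Ch27800Ch16480CS8C" +
        "A" + "0" * 20

# Repeating the group inside one match keeps only the last iteration.
p input.match(/(.+?CS.{2})+/)[1]
# => "LCU3Ch37880Ch27800Ch16480CS8C"

# Applying the unrepeated pattern repeatedly returns every chunk.
p cs_chunks(input)
# => ["HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07", "LCU3Ch37880Ch27800Ch16480CS8C"]
```

The trailing "A" and digits are left out because no "CS" follows them, so the pattern cannot match there.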
