How do I repeat a capturing group? - ios

I have an input string that looks something like this:
HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07LCU3Ch37880Ch27800Ch16480CS8CA00000000000000000000
Now I don't care about the part that follows the last letter A, it'll always be A and exactly 20 numbers that are of no use to me. I do, however, need the part before the last letter A, and ideally, I'd need it to be separated into two different captures, just like this:
1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07
2: LCU3Ch37880Ch27800Ch16480CS8C
The only way to identify these matches is that they end with characters CS followed by two hexadecimal characters. I thought that a regular expression like (.+?CS.{2})+ (or (.+?CS[[:xdigit:]]{2})+) would do the job but when tried on www.regex101.com, it only captures the last group and gives the following warning:
Note: A repeated capturing group will only capture the last iteration. Put a capturing group around the repeated group to capture all iterations or use a non-capturing group instead if you're not interested in the data
I took that to mean I should use a regular expression like ((.+?CS.{2})+) instead, and sure, now I get two captures, but they look like this:
1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07LCU3Ch37880Ch27800Ch16480CS8C
2: LCU3Ch37880Ch27800Ch16480CS8C
Meaning the first one is… slightly longer than I'd like it to be. If it helps in any way, I should point out that the final regular expression will be part of an iOS application, so an instance of the NSRegularExpression class will be used. I'm not sure whether that's helpful information, but I know that NSRegularExpression doesn't support every regular expression feature.

(.+?CS.{2})
You can use this directly. See the demo, and grab the group or capture from each match.
https://regex101.com/r/vD5iH9/68

It doesn't seem like you need a capturing group at all:
(?:(?!CS[0-9A-F]{2}).)+CS[0-9A-F]{2}
will match all strings that end in CS + 2 hex digits.
Test it live on regex101.com.
Explanation:
(?:                  # Start a group.
  (?!CS[0-9A-F]{2})  # Make sure we can't match CSff here,
  .                  # if so, match any character.
)+                   # Do this at least once.
CS[0-9A-F]{2}        # Then match CSff.
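As a rough illustration, here is how that pattern behaves when it is applied repeatedly to the sample input from the question. The sketch is in Ruby purely for convenience; with NSRegularExpression, a call that returns every match (such as matches(in:options:range:)) would play the role of scan. The constant names are made up for the example.

```ruby
# Sample input from the question, split over two literals for readability.
INPUT = "HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07" \
        "LCU3Ch37880Ch27800Ch16480CS8CA00000000000000000000"

# Pattern from the answer: consume one character at a time, but only while
# "CS" + two hex digits does not start at the current position, then
# consume that terminator itself.
CHUNK = /(?:(?!CS[0-9A-F]{2}).)+CS[0-9A-F]{2}/

# With no capturing group in the pattern, String#scan returns the full
# text of every match, in order.
CHUNKS = INPUT.scan(CHUNK)

CHUNKS.each_with_index { |chunk, i| puts "#{i + 1}: #{chunk}" }
# 1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07
# 2: LCU3Ch37880Ch27800Ch16480CS8C
```

The trailing A and the digits after it are never matched, because no CS terminator follows them.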

Change your regex to,
(.+?CS[[:xdigit:]]{2})
DEMO
You don't need to wrap the regex in another capturing group and repeat it one or more times. Just take group 1 from each match to get your desired output.
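The difference between repeating the group inside one match and applying the unrepeated pattern several times can be checked with a short sketch. It is written in Ruby as an illustration only (the question itself targets NSRegularExpression), and the constant names are hypothetical.

```ruby
SAMPLE = "HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07" \
         "LCU3Ch37880Ch27800Ch16480CS8CA00000000000000000000"

# Repeated capturing group: there is a single overall match, and group 1
# keeps only the text of its final iteration.
REPEATED = SAMPLE.match(/(.+?CS[[:xdigit:]]{2})+/)
puts REPEATED[1]
# LCU3Ch37880Ch27800Ch16480CS8C

# Unrepeated group, applied as many times as it matches: one match per chunk.
# With one capturing group, scan returns one-element arrays, hence flatten.
PIECES = SAMPLE.scan(/(.+?CS[[:xdigit:]]{2})/).flatten
PIECES.each_with_index { |piece, i| puts "#{i + 1}: #{piece}" }
# 1: HLI6Ch60000Ch500C0Ch46400Ch30000Ch21888Ch10E79CS07
# 2: LCU3Ch37880Ch27800Ch16480CS8C
```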

Related

Difficulty applying a Regex to a Rails View. Should I make it a helper method?

I am trying to apply the following regex to one of my views:
^([^\s]+)\s+
This is to remove any string of consecutive non-whitespace characters including any white space characters that follow from the start of the line (remove everything except the first word). I have input it on Rubular and it works.
I was wondering how I would be able to apply it to my rails project. Would I create a rails helper method? So far I have tested it in irb and it is not returning the right value:
I would like to know how I can fix my method and if making it a helper method is the right approach. Thank you very much for your help guys!
The =~ operator matches the regular expression against a string and returns the offset of the match if one is found, otherwise nil.
You could use String#match instead and work with the match data, like:
str.match(/^([^\s]+)\s+/)
Or, for readability, skip the regex: split the string on spaces to get an array of the words and take the first one, like:
str.split(' ').first
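A minimal sketch of the three options side by side, assuming the goal is to get the first word. The method name first_word is made up for this example; in a Rails app it could be placed in a helper module, but nothing here depends on Rails.

```ruby
# Hypothetical helper: returns the first word, or nil when the pattern
# (a word followed by at least one whitespace character) does not match.
def first_word(str)
  match = str.match(/^([^\s]+)\s+/)
  match && match[1]
end

TEXT = "hello brave world"

# =~ only reports where the match starts, not what was matched.
OFFSET = (TEXT =~ /^([^\s]+)\s+/)
puts OFFSET.inspect            # 0

# String#match exposes the captured group.
puts first_word(TEXT)          # hello

# No regex at all: split on whitespace and take the first element.
puts TEXT.split(' ').first     # hello
```

Note that first_word returns nil for a single word with no trailing whitespace, while the split version still returns that word.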

Custom order on string

I have a project model. Projects have a code attribute, which is in AAXXXX-YY format like "AA0001-18", "ZA0012-19", where AA is two characters, XXXX is a progressive number, and YY is the last two digits of the year of its creation.
I need to define a default scope that orders projects by code in a way that the year takes precedence over the other part. Supposing I have the codes "ZZ0001-17", "AA0001-18", and "ZZ0002-17", then "ZZ0001-17" should come first, "ZZ0002-17" second, and "AA0001-18" third.
I tried:
default_scope { order(:code) }
but I get "AA0001-18" first.
Short answer
order("substring(code from '..$') ASC, code ASC")
Wait but why?
As you said, you basically want to sort by two things:
1. the last 2 characters of the code string (YY)
2. the rest of the code (AAXXXX-)
First things first: the order method, as per the Rails documentation, takes the arguments you give it and uses them in the ORDER BY clause of the query.
Then, the substring function according to the documentation of PostgreSQL is:
substring(string from pattern)
If we want 2 characters .. from the end of the string $ we use ..$
Hence, substring(code from '..$')
For more information about pattern matching, please refer to the PostgreSQL pattern matching documentation.
Finally, the second part of the ordering is code itself, which sorts by all the preceding characters (AAXXXX-) within each year.
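To see which order the query is expected to produce, the same two-part sort key can be emulated in plain Ruby without a database. This is only a sketch: the sample codes are illustrative, and Ruby compares strings byte by byte, whereas the database uses its own collation, so results could differ for unusual characters.

```ruby
# Illustrative codes in AAXXXX-YY format.
CODES = ["ZZ0001-17", "AA0001-18", "ZZ0002-17"]

# Mirrors ORDER BY substring(code from '..$') ASC, code ASC:
# first key is the last two characters (the year), second key is the
# whole code, which breaks ties within a year.
SORTED = CODES.sort_by { |code| [code[-2, 2], code] }

puts SORTED.inspect
# ["ZZ0001-17", "ZZ0002-17", "AA0001-18"]
```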

How do I grab all the content from within [url] including square brackets and match group 1 and 2

I have this regular expression
/\[url=(?:")?(.*?)(?:")?\](.*?)\[\/url\]/mi
and these blocks of text
[url=/someurl?page=5#3467]First[/url][postquote=true]
[url=/another_url/who-is?page=4#3396] Second[/url]
Some text[url=/another_url/who-is?page=3][i]3[/i] Third [/url]
and the regex works great at extracting the urls and text between the urls
Match 1
1. /someurl?page=5#3467
2. First
Match 2
1. /another_url/who-is?page=4#3396
2. Second
Match 3
1. /another_url/who-is?page=3
2. [i]3[/i] Third
The problem happens when I use the same regex from above to try to extract the url from this text
This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]
Match 1
1. https://www.somesite.com/location/?opt[
2. =apples]Link Name
Notice the =apples] in the second match. What I need is the matched first match to include that in the url like
https://www.somesite.com/location/?opt[]=apples
Link Name
I have tried many modifications to this regex and no go so far, any help would be appreciated.
Ruby regex has the duplicate named capture feature. With this feature, you can handle the two cases easily (the one with &quote; and the other). You don't have to use a recursive pattern since I doubt that [] can be nested in the query part of a url:
/\[url=(?:&quote;(?<url>[^&]*(?:&(?!quote;)[^&]*)*)&quote;|(?<url>[^\s\]\[]*(?:\[\][^\s\]\[]*)*))\](?<text>.*?)\[\/url\]/mi
The URL is in the named group url, and the content between the tags is in the named group text.
In a more readable format (using the x flag):
/
  \[url=
  (?:
    &quote; (?<url> [^&]* (?:&(?!quote;)[^&]*)* ) &quote;
  |
    (?<url> [^\s\]\[]* (?:\[\][^\s\]\[]*)* )
  )
  \]
  (?<text>.*?)\[\/url\]
/mix
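A small sketch of how the pattern can be used from Ruby, reading both named groups from the match data. The constant URL_TAG and the method parse_url_tag are names invented for this example; the third input is a made-up sample for the &quote; branch.

```ruby
# The single-line pattern from the answer, unchanged.
URL_TAG = /\[url=(?:&quote;(?<url>[^&]*(?:&(?!quote;)[^&]*)*)&quote;|(?<url>[^\s\]\[]*(?:\[\][^\s\]\[]*)*))\](?<text>.*?)\[\/url\]/mi

# Hypothetical helper: returns the URL and link text of the first tag,
# or nil when the input contains no [url=...]...[/url] tag.
def parse_url_tag(input)
  m = URL_TAG.match(input)
  # With duplicate group names, m[:url] refers to the branch that matched.
  m && { url: m[:url], text: m[:text] }
end

p parse_url_tag("This is some text [url=https://www.somesite.com/location/?opt[]=apples]Link Name[/url]")
p parse_url_tag("[url=/someurl?page=5#3467]First[/url][postquote=true]")
p parse_url_tag("[url=&quote;/a?b=1&c=2&quote;]Quoted[/url]")
```

To collect every tag in a longer text, the same pattern could be passed to String#scan, which then returns the groups positionally rather than by name.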

Rails Regex Match Group Overwriting itself

I am trying to match this string:
NFPA 101 19.7.2.2
and am using this regex:
(NFPA) (\w+)(?: ?(?:([^.]+)\.?)+)?
This seems to match the string, but the captured groups are not what I'm looking for. I expect:
NFPA
101
19
7
2
2
What I get is this:
NFPA
101
2
See this rubular example:
http://rubular.com/r/43VY0yyNa7
It's as if that last recurring capture group is being overwritten by the final match. Is there a way to have all of these come back as capture groups as I need?
Added another regex that gives me the similar problem described above:
(NFPA) (.+) (.+?.)+(.+)
The issue is not the non-capturing group as such: a repeated capturing group keeps only the text of its last iteration, so a single match cannot return one group per number. To work around this, match each token separately, for example by scanning the string with a pattern that uses lookaheads, such as the following:
(\w+|\d+[-]\d+)(?=\s?)(?![-])
see demo
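Both the overwriting behaviour and two possible workarounds can be tried out in plain Ruby. This is a sketch under a few assumptions: the constant names and parse_reference are invented for the example, and the second workaround (capture the dotted section once, then split it) is an alternative that does not come from the answer above.

```ruby
REFERENCE = "NFPA 101 19.7.2.2"

# The question's pattern: group 3 sits inside a repeated group, so only
# its last iteration is kept.
REPEATED_CAPTURES = REFERENCE.match(/(NFPA) (\w+)(?: ?(?:([^.]+)\.?)+)?/).captures
p REPEATED_CAPTURES   # ["NFPA", "101", "2"]

# Workaround A: scan with the answer's pattern, one match per token.
TOKENS = REFERENCE.scan(/(\w+|\d+[-]\d+)(?=\s?)(?![-])/).flatten
p TOKENS              # ["NFPA", "101", "19", "7", "2", "2"]

# Workaround B (hypothetical helper): capture the dotted section as one
# group and split it afterwards, which also validates the overall shape.
def parse_reference(str)
  m = str.match(/\A(NFPA) (\w+)(?: ([\d.]+))?\z/)
  return nil unless m
  [m[1], m[2], *m[3].to_s.split(".")]
end

p parse_reference(REFERENCE)   # ["NFPA", "101", "19", "7", "2", "2"]
```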

Rails query by number of digits in field

I have a Rails app with a table, "clients". The clients table has a field, phone, whose data type is string. I'm using PostgreSQL. I would like to write a query that selects all clients whose phone value contains more than 10 digits. phone does not have a specific format:
+1 781-658-2687
+1 (207) 846-3332
2067891111
(345)222-777
123.234.3443
etc.
I've been trying variations of the following:
Client.where("LENGTH(REGEXP_REPLACE(phone,'[^\d]', '')) > 10")
Any help would be great.
You almost have it but you're missing the 'g' option to regexp_replace, from the fine manual:
The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. [...] The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Flag i specifies case-insensitive matching, while flag g specifies replacement of each matching substring rather than only the first one.
So regexp_replace(string, pattern, replacement) behaves like Ruby's String#sub whereas regexp_replace(string, pattern, replacement, 'g') behaves like Ruby's String#gsub.
You'll also need to get a \d through your double-quoted Ruby string all the way down to PostgreSQL so you'll need to say \\d in your Ruby. Things tend to get messy when everyone wants to use the same escape character.
This should do what you want:
Client.where("LENGTH(REGEXP_REPLACE(phone, '[^\\d]', '', 'g')) > 10")
# --------------------------------------------^^---------^^^
Try this:
phone_number.gsub(/[^\d]/, '').length
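The digit count and the escaping issue can both be checked in plain Ruby without a database. In this sketch the sample numbers are the ones from the question, while digit_count and the constant names are made up for illustration.

```ruby
PHONES = [
  "+1 781-658-2687",
  "+1 (207) 846-3332",
  "2067891111",
  "(345)222-777",
  "123.234.3443"
]

# Hypothetical helper: strip every non-digit and count what is left,
# which is what LENGTH(REGEXP_REPLACE(..., 'g')) does on the SQL side.
def digit_count(phone)
  phone.gsub(/[^\d]/, "").length
end

LONG_NUMBERS = PHONES.select { |phone| digit_count(phone) > 10 }
p LONG_NUMBERS   # ["+1 781-658-2687", "+1 (207) 846-3332"]

# Why \\d is needed: in a double-quoted Ruby string, "\d" is not a known
# escape, so the backslash is dropped before the SQL ever sees it.
NAIVE   = "'[^\d]'"
ESCAPED = "'[^\\d]'"
puts NAIVE     # '[^d]'
puts ESCAPED   # '[^\d]'
```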
