matching an address with regex doesn't match the target part - ruby-on-rails

I'm not very good at regex.
My input string is LT 1 BLK 4 LAKES OF PARKWAY 5 R/P & AMEND
I'd like to match only the part between the numbers 4 and 5 in the string,
meaning that my expected result is LAKES OF PARKWAY.
I've tried to come up with a pattern to get that result:
\d+\s+([A-z ]+)(\d+.*?)*$
But my pattern captures only BLK and 5 R/P & AMEND, as group #1 and group #2 respectively. My thinking was to anchor the pattern at the end of the string with $.
So, once 5 R/P & AMEND is matched, the engine should work backwards to the preceding part, and ([A-z ]+) should then match LAKES OF PARKWAY.
What's wrong with my pattern, and how can I get it to work?
Any advice would be very much appreciated.

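Try \d+\s+(\D+)\d+\D*$
\D means "anything that is not \d", so (\D+) cannot run past a digit. That stops the match from starting at the first 1 and capturing BLK: after the 4, \D*$ would have to reach the end of the string without crossing a digit, and it is rejected at the later 5. Your original pattern fails because (\d+.*?)* can repeat, so a match starting at 1 succeeds by swallowing 4 LAKES OF PARKWAY and then 5 R/P & AMEND, and the engine never tries starting at 4.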
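A quick way to check the suggested pattern, sketched in Ruby since the question is tagged ruby-on-rails. The helper name is hypothetical, and the strip call is an addition: the (\D+) group also captures the space just before the 5.

```ruby
# Hypothetical helper name, for illustration only.
# Returns the text captured by (\D+), without surrounding whitespace,
# or nil when the pattern does not match.
def text_between_last_two_numbers(address)
  # String#[] with a regex and a group index returns that capture group.
  captured = address[/\d+\s+(\D+)\d+\D*$/, 1]
  captured&.strip
end

puts text_between_last_two_numbers("LT 1 BLK 4 LAKES OF PARKWAY 5 R/P & AMEND")
# prints: LAKES OF PARKWAY
```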

Related

Extract string values that are enclosed in slashes

An example url that I'm trying to collect the values from has this pattern:
https://int.soccerway.com/matches/2021/08/18/canada/canadian-championship/hfx-wanderers/blainville/3576866/
The searched value always starts at the seventh / and ends at the ninth /:
/canada/canadian-championship/
The only method I know uses LEFT + FIND and RIGHT + FIND, but it is very archaic. I believe there is a better way to do this.
Another alternative:
="/"&textjoin("/", 1, query(split(A1, "/"), "Select Col7, Col8"))&"/"
Here's another way you can do it:
="/"&regexextract(A1,"(?:.*?/){7}(.*?/.*?/)")
You can use =REGEXEXTRACT() to match part of the string with a regular expression.
For example, if A1 = https://int.soccerway.com/matches/2021/08/18/canada/canadian-championship/hfx-wanderers/blainville/3576866/ ,
then
=REGEXEXTRACT(A1, "\/[^\/]*\/[^\/]*\/[^\/]*\/[^\/]*\/[^\/]*\/[^\/]*(\/[^\/]*\/[^\/]*\/)")
returns
/canada/canadian-championship/
Explanation: \/ is '/' escaped. [^\/]* matches any non '/' character 0 or more times. \/[^\/]* is repeated 6 times. () captures a specific part of the string as a group to be returned. Finally (\/[^\/]*\/[^\/]*\/) matches the essential part we want.
A slightly different approach, which returns the value without the enclosing slashes:
=REGEXEXTRACT(SUBSTITUTE(SUBSTITUTE(A1,"/","|",9),"/","|",7),"\|(.*?)\|")
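For readers working in Ruby rather than Sheets, the "skip seven slashes, then capture the next two segments" idea from the regex answers above can be sketched as follows. The helper name is hypothetical, and the pattern uses [^/]* in place of the lazy .*? purely as a stylistic choice.

```ruby
# Hypothetical helper: returns the slice between the seventh and ninth
# slash of a URL (slashes included), or nil if the URL is too short.
def segment_between_slashes(url)
  # (?:[^/]*/){7} consumes everything up to and including the seventh slash;
  # the capture group then takes the next two slash-terminated segments.
  match = url.match(%r{\A(?:[^/]*/){7}([^/]*/[^/]*/)})
  match && "/#{match[1]}"
end

url = "https://int.soccerway.com/matches/2021/08/18/canada/canadian-championship/hfx-wanderers/blainville/3576866/"
puts segment_between_slashes(url)
# prints: /canada/canadian-championship/
```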

How to remove from string before __

I am building a Rails 5.2 app.
In this app I get output from different suppliers (I am building a webshop).
The name of the shipping provider is in this format:
dhl_freight__233433
It could also be in this format:
postal__US-320202
How can I remove everything before (and including) the __ so that only the part after the __ remains, for example 233433?
Perhaps with some sort of regex.
A very simple approach would be to use String#split and then pick the second part that is the last part in this example:
"dhl_freight__233433".split('__').last
#=> "233433"
"postal__US-320202".split('__').last
#=> "US-320202"
You can use a very simple Regexp and ask the resulting MatchData for the post_match part:
p "dhl_freight__233433".match(/__/).post_match
# another (magic) way to access the post_match part:
p $'
Postscript: I learnt something from this question myself: you don't even have to use a Regexp for this to work. Just "asddfg__qwer".match("__").post_match does the trick (it converts the string to a regexp for you).
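One thing worth noting about the approaches above is what happens when the separator is missing: match returns nil, so calling post_match on it raises, while split('__').last quietly returns the whole string. A minimal sketch with a hypothetical helper that returns nil in that case:

```ruby
# Hypothetical helper: text after the first "__", or nil if there is none.
def after_double_underscore(name)
  # &. skips post_match when match returns nil (no "__" in the string).
  name.match(/__/)&.post_match
end

p after_double_underscore("dhl_freight__233433") #=> "233433"
p after_double_underscore("postal__US-320202")   #=> "US-320202"
p after_double_underscore("no_separator")        #=> nil

# For comparison, split hands back the unchanged input here:
p "no_separator".split('__').last                #=> "no_separator"
```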
r = /[^_]+\z/
"dhl_freight__233433"[r] #=> "233433"
"postal__US-320202"[r] #=> "US-320202"
The regular expression matches one or more characters other than an underscore, followed by the end of the string (\z). The ^ at the beginning of the character class reads, "other than any of the characters that follow".
See String#[].
This assumes that the last underscore is preceded by an underscore. If the last underscore is not preceded by an underscore, in which case there should be no match, add a positive lookbehind:
r = /(?<=__)[^_]+\z/
This requires the match to be preceded by two underscores.
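A short sketch of the difference the lookbehind makes, written with the group closed as (?<=__). The constant and helper names are hypothetical.

```ruby
# Trailing run of non-underscore characters, only if preceded by "__".
STRICT_SUFFIX = /(?<=__)[^_]+\z/

# Hypothetical helper: returns the suffix or nil when there is no "__" before it.
def suffix_after_double_underscore(name)
  name[STRICT_SUFFIX]
end

p suffix_after_double_underscore("dhl_freight__233433") #=> "233433"
p suffix_after_double_underscore("postal__US-320202")   #=> "US-320202"
p suffix_after_double_underscore("postal_US-320202")    #=> nil

# Without the lookbehind, a single underscore is enough to produce a match:
p "postal_US-320202"[/[^_]+\z/]                         #=> "US-320202"
```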
There are many ways in Ruby to extract numbers from a string, assuming that is what you're trying to do. Here are some of them.
Ref- http://www.ruby-forum.com/topic/125709
line.delete("^0-9")
line.scan(/\d/).join('')
line.tr("^0-9", '')
Of these, delete is the fastest way to pull the digits out of a string.
All of the above extract the digits from the string and join them. For a string like "String-with-67829___numbers-09764" the output would be "6782909764".
If you want the numbers split up instead, like ["67829", "09764"]:
line.split(/[^\d]/).reject { |c| c.empty? }
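The two behaviours described above, side by side on the sample string. The helper names are hypothetical, and scan(/\d+/) is used here as a shorter alternative to the split-and-reject line; it is a swapped-in technique, not the one from the answer.

```ruby
# All digits joined into one string.
def digits_only(line)
  # "^0-9" is a negated character set: delete everything that is not a digit.
  line.delete("^0-9")
end

# Each run of consecutive digits as its own element.
def digit_groups(line)
  line.scan(/\d+/)
end

line = "String-with-67829___numbers-09764"
p digits_only(line)  #=> "6782909764"
p digit_groups(line) #=> ["67829", "09764"]
```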
Hope these answers help you! Happy coding :-)

How can I catch a US phone number with +1 area code regex

I have the following requirement.
I need to validate with Rails that a phone number begins with +1 and is followed by exactly 10 digits. So far, I have this regular expression:
^+1\d{10}
This is not working, and I'm having trouble tweaking it to match exactly what I need. Does anyone have any ideas? The validation has to accept exactly this format:
+19564321234
etc. Help would be appreciated. Thank you.
You have two problems with your regular expression:
You have not escaped the plus sign, so it is read as the "one or more" quantifier (+) applied to the ^ anchor rather than as a literal + character.
You have omitted an end-of-string anchor, so \d{10} will also match the first 10 digits of a longer run of digits.
You need to write:
r = /\A\+1\d{10}\z/
"+19564321234".match?(r) #=> true
"19564321234".match?(r) #=> false
"+19564321234123".match?(r) #=> false
I've used the beginning-of-string anchor, \A, rather than the beginning-of-line anchor, ^. The latter would be a problem only in edge cases, such as:
"dog\n+19564321234".match?(/^\+1\d{10}\z/) #=> true
To see it tested against several strings at once in an online regex tester, one must change the anchors to ^ and $ and add the multiline (/m) modifier.
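A minimal plain-Ruby sketch of the check, assuming the requirement is a literal +1 followed by exactly ten digits, as in the question's sample +19564321234. The constant and method names are hypothetical.

```ruby
# \A and \z anchor to the whole string; \+ is a literal plus sign.
US_PHONE = /\A\+1\d{10}\z/

# Hypothetical helper wrapping the check.
def valid_us_phone?(number)
  number.match?(US_PHONE)
end

p valid_us_phone?("+19564321234")      #=> true
p valid_us_phone?("19564321234")       #=> false (no leading +)
p valid_us_phone?("+195643212345")     #=> false (eleven digits after +1)
p valid_us_phone?("dog\n+19564321234") #=> false (\A rejects the extra line)
```

In a Rails model the same regex could be passed to the format validator, for example validates :phone, format: { with: US_PHONE }, where :phone is a hypothetical attribute name.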

lua pattern matching: delimited captures

I am trying to parse a string such as: &1 first &2 second &4 fourth \\, and from it to build a table
t = {1=first, 2=second, 4=fourth}
I'm not very experienced with regex in general so my naive try (disregarding the \\ and table parts for the moment) was
local s = [[&1 first &2 second &4 fourth \\]]
for k,v in string.gmatch(s, "&(%d+)(.-)&") do
print("k = "..k..", v = "..v)
end
which gives only the first captured pair when I was expecting to see two captured pairs. I've done some reading and found the lpeg library, but it's massively unfamiliar to me. Is lpeg needed here? Could anyone explain my error?
&(%d+)(.-)& matches &1 first &, consuming the trailing & that the next pair needs.
That leaves 2 second &4 fourth \\ to be matched against.
Your pattern does not match anything further in that remainder.
If you know that the values are one word, this should work:
string.gmatch(s, "&(%d+)%s+([^%s&]+)")
Take "&", followed by 1 or more digits (captured), followed by one or more space and then one or more non-space, non-& characters (captured).
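For comparison, here is the same capture idea sketched in Ruby rather than Lua, building the key/value table the question asks for. This is an illustration of the technique, not Lua code; the method name is hypothetical, and it keeps the answer's assumption that every value is a single word.

```ruby
# Hypothetical helper: turns "&1 first &2 second" into {1=>"first", 2=>"second"}.
def parse_pairs(input)
  # "&", digits (captured), whitespace, then a run of characters that are
  # neither whitespace nor "&" (captured). The next "&" is never consumed.
  input.scan(/&(\d+)\s+([^\s&]+)/)
       .map { |key, value| [key.to_i, value] }
       .to_h
end

p parse_pairs('&1 first &2 second &4 fourth \\\\')
```

Because the value pattern stops before the next &, each pair is matched independently and nothing needed by the following pair is swallowed.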

How to find all instances of #[XX:XXXX] in a string and then find the surrounding text?

Given a string like:
"#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
I want to find a way to dynamically return the tagged user and the content that follows the tag. The results should be:
user_id: 19
copy: what's the latest with the TPS report?
user_id: 30
copy: can you help out here?
Any ideas on how this can be done with ruby/rails? Thanks
How is this regex for finding matches?
#\[\d+:\w+\s\w+\]
Split the string, then handle the content iteratively. I don't think it'd take more than:
tmp = string.split('#').reject(&:empty?).map { |str| [str[/\[(\d+)/, 1], str[/\](.*)/, 1].strip] }
tmp.first #=> ["19", "what's the latest with the TPS report?"]
Does that help?
result = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m)
This grabs the id and the content in backreferences 1 and 2 respectively. The capture stops when either # or the end of the string is reached.
\[ # Match the character “[” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character (including line breaks, because of the /m modifier)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\] # Match the character “]” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character (including line breaks, because of the /m modifier)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\# # Match the character “#” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\Z # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
This will match everything from a # up to the next punctuation mark. Sorry if I didn't understand correctly.
result = subject.scan(/#.*?[.?!]/)
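Pulling the scan-based idea together, a sketch that returns the user_id/copy pairs in the shape the question asks for. The method name and the hash keys are assumptions taken from the question's expected output, and the pattern is a slightly stricter variant of the one above (it requires the leading # and the colon).

```ruby
# Hypothetical helper: extracts each #[id:name] tag and the text that follows it.
def parse_mentions(text)
  # Group 1: the numeric id. Group 2: everything up to the next "#[" or
  # the end of the string (lazy match plus lookahead).
  text.scan(/#\[(\d+):[^\]]*\](.*?)(?=#\[|\z)/m).map do |id, copy|
    { user_id: id.to_i, copy: copy.strip }
  end
end

text = "#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
parse_mentions(text).each do |mention|
  puts "user_id: #{mention[:user_id]}"
  puts "copy: #{mention[:copy]}"
end
```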
