Rails string split every other "." - ruby-on-rails

I have a bunch of sentences that I want to break into an array. Right now, I'm splitting every time \n appears in the string.
@chapters = @script.split('\n')
What I'd like to do is .split every OTHER "." in the string. Is that possible in Ruby?

You could do it with a regex, but I'd start with a simple approach: just split on periods, then join pairs of substrings:
s = "foo. bar foo. foo bar. boo far baz. bizzle"
s.split(".").each_slice(2).map {|p| p.join "." }
# => ["foo. bar foo", " foo bar. boo far baz", " bizzle"]
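The same split-then-rejoin idea extends to every n-th period. Here is a minimal sketch; the helper name split_every_nth_period and the decision to strip surrounding whitespace are my own assumptions, not part of the answer above:

```ruby
# Hypothetical helper: group the period-separated pieces of a string n at a time.
def split_every_nth_period(text, n = 2)
  text.split(".")                               # break on every period
      .each_slice(n)                            # group the pieces n at a time
      .map { |pieces| pieces.join(".").strip }  # rejoin each group, trim whitespace
end

s = "foo. bar foo. foo bar. boo far baz. bizzle"
pairs = split_every_nth_period(s)
triples = split_every_nth_period(s, 3)
```

Note that split drops the periods it splits on, so the period that ended each chunk is not restored.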

This is a case where it's easier to use String#scan than String#split.
We can use the following regular expression:
r = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/
str=<<~_
Now is the time. This is it. It is now. The time to have fun.
The time to make new friends. The time to party.
_
str.scan(r)
#=> [
# "Now is the time. This is it",
# " It is now. The time to have fun",
# "\nThe time to make new friends. The time to party"
# ]
We can write the regular expression in free-spacing mode to make it self-documenting.
r = /
(?<= # begin a positive lookbehind
\A # match the beginning of the string
| # or
\. # match a period
) # end positive lookbehind
[^.]* # match zero or more characters other than periods
\. # match a period
[^.]* # match zero or more characters other than periods
(?= # begin a positive lookahead
\. # match a period
| # or
\z # match the end of the string
) # end positive lookahead
/x # invoke free-spacing regex definition mode
Note that (?<=\.|\A) can be replaced with (?<![^\.]). (?<![^\.]) is a negative lookbehind that asserts the match is not preceded by a character other than a period.
Similarly, (?=\.|\z) can be replaced with (?![^.]). (?![^.]) is a negative lookahead that asserts the match is not followed by a character other than a period.
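A quick way to convince yourself that the two forms are interchangeable is to run both against the same input. This is only a sketch; the variable names r1, r2, a and b are placeholders of mine:

```ruby
r1 = /(?<=\.|\A)[^.]*\.[^.]*(?=\.|\z)/   # positive lookarounds
r2 = /(?<![^.])[^.]*\.[^.]*(?![^.])/     # negative-lookaround equivalents

str = "Now is the time. This is it. It is now. The time to have fun."
a = str.scan(r1)
b = str.scan(r2)
```

Both scans should return the same two-sentence chunks for this input.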

Related

How do I replace all the apostrophes that come right before or right after a comma?

I have a string aString = "old_tag1,old_tag2,'new_tag1','new_tag2'"
I want to replace the apostrophes that come right before or right after a comma. For example, in my case the apostrophes enclosing new_tag1 and new_tag2 should be removed.
This is what I have right now
aString = aString.gsub("'", "")
This is problematic, however, because it also removes any apostrophe inside a tag, for example if I had 'my_tag's' instead of 'new_tag1'. How do I get rid of only the apostrophes that come right before or after the commas?
My desired output is
aString = "old_tag1,old_tag2,new_tag1,new_tag2"
I would also use a regex, but in a slightly different way:
aString = "old_tag1,old_tag2,'new_tag1','new_tag2','new_tag3','new_tag4's'"
aString.gsub /(?<=^|,)'(.*?)'(?=,|$)/, '\1'
#=> "old_tag1,old_tag2,new_tag1,new_tag2,new_tag3,new_tag4's"
The idea is to find a substring with bounding apostrophes and paste it back without them.
regex = /
(?<=^|,) # watch for start of the line or comma before
' # find an apostrophe
(.*?) # get everything between apostrophes in a non-greedy way
' # find a closing apostrophe
(?=,|$) # watch after for the comma or the end of the string
/x
The replacement just pastes back the content of the first capture group (everything between the parentheses). The pattern has only one group, so any \2 or \3 in the replacement would expand to an empty string.
Thanks to @Cary for the /x modifier for regexes; I didn't know about it! Extremely useful for explanations.
This answers the question, "I want to replace the apostrophes that come right before or right after a comma".
r = /
(?<=,) # match a comma in a positive lookbehind
\' # match an apostrophe
| # or
\' # match an apostrophe
(?=,) # match a comma in a positive lookahead
/x # free-spacing regex definition mode
aString = "old_tag1,x'old_tag2'x,x'old_tag3','new_tag1','new_tag2'"
aString.gsub(r, '')
#=> "old_tag1,x'old_tag2'x,x'old_tag3,new_tag1,new_tag2'"
If the objective is instead to remove single quotes enclosing a substring when the left quote is at the beginning of the string or is immediately preceded by a comma, and the right quote is at the end of the string or is immediately followed by a comma, several approaches are possible. One is to use a single, modified regex, as @Dimitry has done. Another is to split the string on commas, process each string in the resulting array, and then join the modified substrings, separated by commas.
r = /
\A # match beginning of string
\' # match single quote
.* # match zero or more characters
\' # match single quote
\z # match end of string
/x # free-spacing regex definition mode
aString.split(',').map { |s| (s =~ r) ? s[1..-2] : s }.join(',')
#=> "old_tag1,x'old_tag2'x,x'old_tag3',new_tag1,new_tag2"
Note:
arr = aString.split(',')
#=> ["old_tag1", "x'old_tag2'x", "x'old_tag3'", "'new_tag1'", "'new_tag2'"]
"old_tag1" =~ r #=> nil
"x'old_tag2'x" =~ r #=> nil
"x'old_tag3'" =~ r #=> nil
"'new_tag1'" =~ r #=> 0
"'new_tag2'" =~ r #=> 0
Non regex replacement
Regular expressions can get really ugly. There is a simple way to do it with just string replacement: search for the pattern ,' and ', and replace with ,
aString.gsub(",'", ",").gsub("',", ",")
=> "old_tag1,old_tag2,new_tag1,new_tag2'"
This leaves the trailing ', but that is easy to remove with .chomp("'"). A leading ' can be removed with a simple regex .gsub(/^'/, "")
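Putting those steps together without any regex might look like the sketch below. The helper name strip_boundary_apostrophes is hypothetical, and String#delete_prefix and String#delete_suffix assume Ruby 2.5 or later:

```ruby
# Hypothetical helper: drop apostrophes that sit next to a comma or at either end.
def strip_boundary_apostrophes(text)
  text.gsub(",'", ",")        # apostrophe right after a comma
      .gsub("',", ",")        # apostrophe right before a comma
      .delete_prefix("'")     # leading apostrophe, if any
      .delete_suffix("'")     # trailing apostrophe, if any
end

cleaned = strip_boundary_apostrophes("old_tag1,old_tag2,'new_tag1','new_tag2'")
```

Apostrophes inside a tag, as in 'my_tag's', are left alone because they are not adjacent to a comma or to either end of the string.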

Ruby Regex match a newline followed by anything but (3 uppercase characters followed by a pipe)

(Hopefully) simple regex question here. I'm looking to match 1 or more newlines that aren't followed by a certain pattern of three uppercase characters and a pipe (|), and remove them.
For example, I'm looking to turn this:
foo bar foo bar.
Normal
0
false
false
false
EN-US
JA
X-NONE
foo bar foo bar
|||||HH
OBX|156|TX|foo bar|||N
OBX|157|TX|foo bar
Into this:
foo bar foo bar. Normal 0 false false false EN-US JA X-NONE|||||HH
OBX|156|TX|foo bar|||N
OBX|157|TX|foo bar
I have the regex that works great in Sublime here:
(\n+)(?!MSH|PID|NTE|PV1|RXO|ORC|DG1|OBR|OBX).*
But in Ruby, it's not getting rid of the newlines. Is there anything I'm missing when converting the Sublime regex into a regex for Rails?
@r.force_encoding("UTF-8").gsub("\r\n","\r").gsub("(\r+)(?!MSH|PID|NTE|PV1|RXO|ORC|DG1|OBR|OBX)(.*)"," $2")
str = <<-MULTI
foo bar foo bar.
Normal
0
false
false
false
EN-US
JA
X-NONE
foo bar foo bar
|||||HH
OBX|156|TX|foo bar|||N
OBX|157|TX|foo bar
MULTI
str.gsub(/(\n+)(?!MSH|PID|NTE|PV1|RXO|ORC|DG1|OBR|OBX).*/,'')
# Note: because of the trailing .*, this removes each such run of newlines together with the rest of the line that follows it.
My solution would be to handle the lines individually; multi-line regexes can be quite confusing for many people.
.each_line or .lines both return the individual lines.
.grep will match an array against a regular expression or string based pattern.
.join will take the individual lines and return a single multiline string from the results.
str.each_line
.grep( /^[A-Z]{3,3}\|.+/ )
.join( '' )
As for the regex, let's break that down too, now that we are only dealing with things line by line:
^ - Starting at the beginning of the line.
[A-Z] - Only match the range of chars from 'A' to 'Z' ( all cap chars ).
{3,3} - Match exactly 3 chars, no more, no less (written without a space, as in the code above).
\| - Followed by a '|' char.
.+ - Followed by 1+ chars of anything.
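If the goal is to keep the continuation lines rather than filter them out, a line-by-line variant could fold each one into the line above it. This is only a sketch: the helper name fold_continuation_lines is made up, and the pattern is widened to allow digits after the first letter on the assumption that IDs such as PV1 and DG1 should count as segment starts (the [A-Z]{3,3} pattern above would not match them):

```ruby
# Hypothetical helper: fold continuation lines into the segment line above them.
def fold_continuation_lines(text)
  segment_start = /\A[A-Z][A-Z0-9]{2}\|/   # e.g. "OBX|" or "PV1|"

  text.lines
      .map(&:chomp)                                        # drop line endings
      .reject(&:empty?)                                    # skip blank lines
      .slice_before { |line| line.match?(segment_start) }  # new group at each segment
      .map { |group| group.join(" ") }                     # one output line per group
      .join("\n")
end

sample = "foo bar.\nNormal\n0\nOBX|156|TX|foo bar|||N\ncontinued\nPV1|1\n"
folded = fold_continuation_lines(sample)
```

Enumerable#slice_before starts a new group at every line the block accepts, so any text before the first segment ends up in a group of its own.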
If str is your string,
keepers = %w[ MSH PID NTE PV1 RXO ORC DG1 OBR OBX ]
#=> ["MSH", "PID", "NTE", "PV1", "RXO", "ORC", "DG1", "OBR", "OBX"]
r = /
\n+ # match one or more newlines
(?! # start a negative lookahead
#{Regexp.union(keepers)} # match one of keepers
\| # match pipe--escape required
) # close negative lookahead
/x # extended/free-spacing regex definition mode
#=> /
# \n+
# (?!
# (?-mix:MSH|PID|NTE|PV1|RXO|ORC|DG1|OBR|OBX)
# \|
# )
# /x
puts str.gsub(r, "")
# foo bar foo bar.Normal0falsefalsefalseEN-USJAX-NONEfoo bar foo bar|||||HH
# OBX|156|TX|foo bar|||N
# OBX|157|TX|foo bar
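The output above runs the joined lines together because the newlines are replaced with an empty string. If a space between the joined pieces is wanted, as in the question's desired output, something like the following sketch may be closer. The helper name join_continuation_lines and the extra \z alternative (which leaves the final newline in place) are my own assumptions:

```ruby
# Hypothetical helper; segment_ids plays the role of keepers.
def join_continuation_lines(text, segment_ids)
  # One or more newlines NOT followed by "<segment id>|" and not at the very end
  pattern = /\n+(?!#{Regexp.union(segment_ids)}\||\z)/
  text.gsub(pattern, " ")
end

keepers = %w[MSH PID NTE PV1 RXO ORC DG1 OBR OBX]
sample = "a.\nb\n\nc\nOBX|1\nmore\nPID|2\n"
joined = join_continuation_lines(sample, keepers)
```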

Ruby regular expression for version numbers

I want to write a program which takes build number in the format of 23.0.23.345 (first two-digits then dot, then zero, then dot, then two-digits, dot, three-digits):
number=23.0.23.345
pattern = /(^[0-9]+\.{0}\.[0-9]+\.[0-9]$)/
numbers.each do |number|
if number.match pattern
puts "#{number} matches"
else
puts "#{number} does not match"
end
end
Output:
I am getting error:
no .<digit> floating literal anymore; put 0 before dot
I'd use something like this to find patterns that match:
number = 'foo 1.2.3.4 23.0.23.345 bar'
build_number = number[/
\d{2} # two digits
\.
0
\.
\d{2} # two more digits
\.
\d{3}
/x]
build_number # => "23.0.23.345"
This example is using String's [/regex/] method, which is a nice shorthand way to apply and return the result of a regex. It returns the first match only in the form I'm using. Read the documentation for more information and examples.
Your pattern won't work because it doesn't do what you think it does. Here's how I'd read it:
/( # group
^ # start of line
[0-9]+ # one or more digits
\.{0} # *NO* dots
\. # one dot
[0-9]+ # one or more digits
\. # one dot
[0-9] # one digit
$ # end of line
)/x
The problem is \.{0} which means you don't want any dots.
The x flag tells Ruby to use extended (free-spacing) mode, which ignores blanks/whitespace and comments in the pattern, making it easy to build a pattern that is documented. It is not the same as the m (multiline) flag.
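If the aim is to validate that a whole string is a build number, rather than to find one inside a longer string, the pattern can be anchored with \A and \z. A minimal sketch, with build_number? as a made-up helper name:

```ruby
# Hypothetical helper: true only when the entire string is a build number.
def build_number?(text)
  pattern = /
    \A        # start of string
    \d{2}     # two digits
    \.0\.     # a dot, a zero, a dot
    \d{2}     # two digits
    \.
    \d{3}     # three digits
    \z        # end of string
  /x
  pattern.match?(text)
end

ok = build_number?("23.0.23.345")
embedded = build_number?("foo 23.0.23.345 bar")
```

Here ok is true and embedded is false, since the anchors reject anything before or after the number.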
Why reinvent the wheel? Use a gem like versionomy. You can parse the versions, compare them, check for equality, increment a particular part, etc. It even handles alpha, beta, patchlevels, etc.
require 'versionomy'
number='23.0.23.345'
v = Versionomy.parse number
v.major #=> 23
v.minor #=> 0
v.tiny #=> 23
v.tiny2 #=> 345
numbers = "23.0.23.345", "23.0.33.173", "0.0.0.0"
pattern = /\d{2}\.0\.\d{2}\.\d{3}/x
numbers.each do |number|
if number.match pattern
puts "#{number} matches"
else
puts "#{number} does not match"
end
end
The "number" variable in line one needs to hold strings, not numeric literals. I also renamed "number" to "numbers" and made it an array with multiple items, so that the ".each" call in your loop has something to iterate over.
There seems to be agreement on what regular expression you should use. If your ultimate goal is to extract the elements of the strings as integers, you could do this:
str = "I'm looking for 23.0.345.26, or was that 23.0.26.345?"
str.scan(/(\d{2})\.(0)\.(\d{2})\.(\d{3})/).flatten.map(&:to_i)
#=> [23, 0, 26, 345]
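One caveat with flatten: if the string contains several build numbers, their parts all end up in one flat list. A sketch that keeps one array per match follows; the helper name build_numbers and the added \b word boundaries are assumptions of mine:

```ruby
# Hypothetical helper: one array of integers per build number found in the text.
def build_numbers(text)
  text.scan(/\b(\d{2})\.(0)\.(\d{2})\.(\d{3})\b/)
      .map { |parts| parts.map(&:to_i) }   # convert each captured part to an Integer
end

found = build_numbers("Builds 23.0.23.345 and 23.0.33.173 are available.")
```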

Why can't regular expressions match for # sign?

For the string Be there # six.
Why does this work:
str.gsub! /\bsix\b/i, "seven"
But trying to replace the # sign doesn't match:
str.gsub! /\b#\b/i, "at"
Escaping it doesn't seem to work either:
str.gsub! /\b\#\b/i, "at"
This is down to how \b is interpreted. \b is a "word boundary": a zero-length match at a position where a word character sits next to a non-word character (or next to the start or end of the string). The word characters are limited to [A-Za-z0-9_] and maybe a few other things, but # is not a word character, so \b won't match between a space and the #, where neither side is a word character. The space itself is not the boundary.
More about word boundaries...
If you need to replace the # with surrounding whitespace, you can capture it after the \b and use backreferences. This captures preceding whitespace with \s* for zero or more space characters.
str.gsub! /\b(\s*)#(\s*)\b/i, "\\1at\\2"
=> "Be there at six."
Or to insist upon whitespace, use \s+ instead of \s*.
str = "Be there # six."
str.gsub! /\b(\s+)#(\s+)\b/i, "\\1at\\2"
=> "Be there at six."
# No match without whitespace...
str = "Be there#six."
str.gsub! /\b(\s+)#(\s+)\b/i, "\\1at\\2"
=> nil
At this point, we're starting to introduce redundancies by forcing the use of \b. It could just as easily be done with /(\w+\s+)#(\s+\w+)/, forgoing the \b match in favor of \w word characters followed by \s whitespace.
Update after comments:
If you want to treat # like a "word" which may appear at the beginning or end, or inside bounded by whitespace, you may use \W to match "non-word" characters, combined with ^$ anchors with an "or" pipe |:
# Replace # at the start, middle, before punctuation
str = "# Be there # six #."
str.gsub! /(^|\W+)#(\W+|$)/, '\\1at\\2'
=> "at Be there at six at."
(^|\W+) matches either ^ the start of the string, or a sequence of non-word characters (like whitespace or punctuation). (\W+|$) is similar but can match the end of the string $.
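Another way to express "a # standing on its own" is with whitespace lookarounds, which avoids capturing and re-inserting the surrounding characters. This is a sketch under the assumption that a standalone # means one with whitespace or a string edge on both sides; the helper name replace_standalone_hash is hypothetical:

```ruby
# Hypothetical helper: replace "#" only when it is not glued to other non-space text.
def replace_standalone_hash(text)
  # (?<!\S) : not preceded by a non-whitespace character
  # (?!\S)  : not followed by a non-whitespace character
  text.gsub(/(?<!\S)#(?!\S)/, "at")
end

spaced = replace_standalone_hash("Be there # six.")
glued = replace_standalone_hash("Be there#six.")
```

Unlike the \W-based pattern, this variant leaves a # that is directly followed by punctuation (as in "six #.") untouched.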
\b matches a word boundary, which is where a word character is next to a non-word character. In your string the # has a space on each side, and neither # nor a space is a word character, so there is no match.
Compare:
'be there # six'.gsub /\b#\b/, 'at'
produces
'be there # six'
(i.e. no changes)
but
'be there#six'.gsub /\b#\b/, 'at' # no spaces around #
produces
"be thereatsix"
Also
'be there # six'.gsub /#/, 'at' # no word boundaries in regex
produces
"be there at six"

How to find all instances of #[XX:XXXX] in a string and then find the surrounding text?

Given a string like:
"#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
I want to find a way to dynamically return the tagged user and the surrounding content. Results should be:
user_id: 19
copy: what's the latest with the TPS report?
user_id: 30
copy: can you help out here?
Any ideas on how this can be done with ruby/rails? Thanks
How is this regex for finding matches?
#\[\d+:\w+\s\w+\]
Split the string, then handle the content iteratively. I don't think it'd take more than:
tmp = string.split('#').reject(&:empty?).map {|str| [str[/\[(\d+)/,1], str[/\]\s*(.*\S)/,1]] }
tmp.first #=> ["19", "what's the latest with the TPS report?"]
Does that help?
result = subject.scan(/\[(\d+).*?\](.*?)(?=#|\Z)/m)
This grabs the id and the content in backreferences 1 and 2 respectively. The capture stops when either a # or the end of the string is reached.
\[ # Match the character “[” literally
( # Match the regular expression below and capture its match into backreference number 1
\d # Match a single digit 0..9
+ # Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
. # Match any single character (including line breaks, because of the m flag)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
\] # Match the character “]” literally
( # Match the regular expression below and capture its match into backreference number 2
. # Match any single character (including line breaks, because of the m flag)
*? # Between zero and unlimited times, as few times as possible, expanding as needed (lazy)
)
(?= # Assert that the regex below can be matched, starting at this position (positive lookahead)
# Match either the regular expression below (attempting the next alternative only if this one fails)
\# # Match the character “#” literally
| # Or match regular expression number 2 below (the entire group fails if this one fails to match)
\Z # Assert position at the end of the string (or before the line break at the end of the string, if any)
)
This will match everything from a # up to the next punctuation mark. Sorry if I misunderstood the question.
result = subject.scan(/#.*?[.?!]/)
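To get from a scan like the ones above to the user_id/copy pairs the question asks for, the captures can be mapped into hashes. A minimal sketch, assuming every tag has the form #[id:name]; the helper name tagged_messages and the hash keys are my own choices:

```ruby
# Hypothetical helper: one { user_id:, copy: } hash per "#[id:name] text" chunk.
def tagged_messages(text)
  pattern = /#\[(\d+):[^\]]*\]\s*(.*?)\s*(?=#\[|\z)/m
  text.scan(pattern).map do |user_id, copy|
    { user_id: user_id.to_i, copy: copy }
  end
end

input = "#[19:Sara Mas] what's the latest with the TPS report? #[30:Larry Peters] can you help out here?"
messages = tagged_messages(input)
```

The lookahead stops each chunk at the next "#[" rather than at any "#", so a stray # inside the message text does not cut it short.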
