Replace each pattern in regexp - ruby-on-rails

I'm having some trouble finding the right pattern to get the string I want.
My starting string is:
,,,,C3:,D3,E3,F3,,
I would like to get:
        C3:  [D3,E3,F3]
I would like to:
Replace each leading comma with a double space
Replace the comma after the colon with a double space and a left square bracket
Replace the trailing commas with a right square bracket
So far, I have tried this:
> a = ",,,,C3:,D3,E3,F3,,"
=> ",,,,C3:,D3,E3,F3,,"
> b = a.gsub(/^,*/, "  ").gsub(/(?<=:),/, "  [").gsub(/[,]*$/,"" ).gsub(/[ ]*$/, "]")
=> "  C3:  [D3,E3,F3]"
> b == "        C3:  [D3,E3,F3]"
=> false
I can't manage to replace each leading comma with a double space to obtain 8 spaces in this case.
Could you help me find the right regexp and, if possible, improve my code, please?

To replace each leading comma with a double space, you need the \G anchor, i.e. .gsub(/\G,/, '  '). That anchor tells the regex engine to match at the start of the string and then right after each successful match. So .gsub(/\G,/, '  ') only replaces the consecutive commas at the beginning of the string.
Then, you can add the other replacements:
s.gsub(/\G,/, '  ').sub(/,+\z/, ']').sub(/:,+/, ':  [')
See the IDEONE demo:
s = ",,,,C3:,D3,E3,F3,,"
puts s.gsub(/\G,/, '  ').sub(/,+\z/, ']').sub(/:,+/, ':  [')
Output:
        C3:  [D3,E3,F3]
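A minimal sketch wrapping the chain above in a hypothetical helper, `format_row`, assuming the target is two spaces per leading comma and two spaces plus "[" after the colon, as described in the question. It also shows why the question's `^,*` falls short: it matches the whole run of commas once, so the run becomes a single two-space string.

```ruby
# Hypothetical helper wrapping the \G chain:
# ",,,,C3:,D3,E3,F3,," -> "        C3:  [D3,E3,F3]"
def format_row(row)
  out = row.gsub(/\G,/, '  ')  # \G: each leading comma becomes two spaces
  out = out.sub(/,+\z/, ']')   # the trailing commas become "]"
  out.sub(/:,+/, ':  [')       # the comma(s) after the colon become two spaces and "["
end

s = ",,,,C3:,D3,E3,F3,,"
p format_row(s)         # => "        C3:  [D3,E3,F3]"

# For comparison: ^,* matches the whole run of leading commas at once,
# so the run is replaced by a single two-space string.
p s.sub(/^,*/, '  ')    # => "  C3:,D3,E3,F3,,"
```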

To construct the desired string, one needs to know:
the number of leading commas (the size of the string comprised of the leading commas)
the string following the leading commas up to and including the colon
the string between the comma following the colon and two or more commas
It is a simple matter to construct a regex that saves each of these three strings to a capture group:
r = /
(,*) # match leading commas in capture group 1
(.+:) # match up to and including the colon in capture group 2
, # match comma
(.+) # match one or more characters in capture group 3
,, # match two commas
/x # extended/free-spacing regex definition mode
",,,,C3:,D3,E3,F3,," =~ r
We can now form the desired string from the contents of the three capture groups:
"#{'  '*$1.size}#{$2}  [#{$3}]"
#=> "        C3:  [D3,E3,F3]"
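The same three-capture idea can be sketched with named captures and a `MatchData` object instead of the `$1`..`$3` globals. The group names and the `build` helper are illustrative, and returning non-matching input unchanged is an assumption, not something the question specifies.

```ruby
# Same regex as above, with named groups (names are illustrative).
ROW = /
  (?<commas>,*)   # leading commas
  (?<label>.+:)   # up to and including the colon
  ,               # the comma after the colon
  (?<items>.+)    # one or more characters: the items
  ,,              # two trailing commas
/x

def build(row)
  m = ROW.match(row)
  return row unless m  # assumption: leave non-matching input as-is
  "#{'  ' * m[:commas].size}#{m[:label]}  [#{m[:items]}]"
end

p build(",,,,C3:,D3,E3,F3,,")  # => "        C3:  [D3,E3,F3]"
```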

Related

How to find last occurrence of a substring in a given string?

I have a string that describes a word, and I must change its ending to "sd" if the ending is "jk".
For example, for the word "lazerjk" I need to get "lazersd".
I tried the .gsub! method, but it doesn't work correctly when there is more than one occurrence of the substring "jk" in a word.
String#rindex returns the index of the last occurrence of the given substring
String#[]= can take two integer arguments: the first is the index at which to start replacing, the second is the length of the substring to replace
You can use them this way:
replaced = "foo"
replacing = "booo"
string = "foo bar foo baz"
string[string.rindex(replaced), replaced.size] = replacing
string
# => "foo bar booo baz"
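One caveat with the snippet above: String#rindex returns nil when the substring is absent, and `string[nil, n] = ...` then raises a TypeError. A sketch of a hypothetical `replace_last` helper that guards against that and works on a copy:

```ruby
# Hypothetical helper: replace the last occurrence of `old` with `replacement`,
# returning a new string and leaving the input untouched.
def replace_last(str, old, replacement)
  i = str.rindex(old)
  return str.dup if i.nil?          # rindex returns nil when there is no match
  result = str.dup
  result[i, old.size] = replacement # String#[]=(start, length, new_text)
  result
end

p replace_last("foo bar foo baz", "foo", "booo")  # => "foo bar booo baz"
p replace_last("lazerjk", "jk", "sd")             # => "lazersd"
p replace_last("lazer", "jk", "sd")               # => "lazer"
```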
"jughjkjkjk\njk".sub(/jk$\z/, 'sd')
=> "jughjkjkjk\nsd"
Using /jk\z/, without the $, is probably sufficient.
It sounds like you're looking to replace a specific suffix only. If so, I would probably suggest using sub along with an anchored regex (to check for the desired characters only at the end of the string):
string_1 = "lazerjk"
string_2 = "lazerjk\njk"
string_3 = "lazerjkr"
string_1.sub(/jk\z/, "sd")
#=> "lazersd"
string_2.sub(/jk\z/, "sd")
#=> "lazerjk\nsd"
string_3.sub(/jk\z/, "sd")
#=> "lazerjkr"
Or, you could do without a regex at all by using the reverse! method along with a simple conditional statement to sub! only when the suffix is present:
string = "lazerjk"
old_suffix = "jk"
new_suffix = "sd"
string.reverse!.sub!(old_suffix.reverse, new_suffix.reverse).reverse! if string.end_with?(old_suffix)
string
#=> "lazersd"
OR, you could even use a completely different approach. Here's an example using chomp to remove the unwanted suffix and then ljust to pad the desired suffix to the modified string.
string = "lazerjk"
string.chomp("jk").ljust(string.length, "sd")
#=> "lazersd"
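Note that the new suffix only gets added if the length of the string was modified by the initial chomp; otherwise, the string remains unchanged. This trick also relies on the old and new suffixes having the same length, because ljust only pads back up to the original length.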
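If the suffixes can differ in length, a sketch using String#end_with? and String#delete_suffix (available since Ruby 2.5) avoids the ljust length dependency. `swap_suffix` is a hypothetical name; the last line shows how the chomp/ljust version truncates a longer replacement.

```ruby
# Hypothetical helper: swap a suffix only when it is present.
def swap_suffix(str, old_suffix, new_suffix)
  str.end_with?(old_suffix) ? str.delete_suffix(old_suffix) + new_suffix : str
end

p swap_suffix("lazerjk", "jk", "sd")   # => "lazersd"
p swap_suffix("lazerjkr", "jk", "sd")  # => "lazerjkr"
p swap_suffix("lazerjk", "jk", "xyz")  # => "lazerxyz"

# chomp/ljust pads back to the original length (7), cutting "xyz" short:
p "lazerjk".chomp("jk").ljust("lazerjk".length, "xyz")  # => "lazerxy"
```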
If the goal is to substitute the LAST OCCURRENCE (as opposed to suffix only), then this could be accomplished by using sub along with reverse:
string = "jklazerjkm"
old_substring = "jk"
new_substring = "sd"
string.reverse.sub(old_substring.reverse, new_substring.reverse).reverse
#=> "jklazersdm"
Replacing "jk" at the end of a string with something else is straightforward and can be addressed without concern for other instances of "jk" that may be in the string, so I assume that is not what is being asked. Rather, I assume the problem is to replace the last instance of "jk" in a string with "sd".
Here are two solutions that make use of String#sub with a regular expression.
Use a negative lookahead
The idea here is to match "jk" provided it is not followed later in the string by another instance of "jk".
"lajkz\nejkrjklm".sub(/jk(?!.*jk)/m, "sd")
#=> "lajkz\nejkrsdlm"
Capture the part of the string that precedes the last "jk"
The match, if there is one, consists of the front of the string followed by the last "jk", which is replaced by the captured string followed by "sd".
"lajkz\nejkrjklm".sub(/\A(.*)jk/m) { $1 + "sd" }
#=> "lajkz\nejkrsdlm"
The two regular expressions can be written in free-spacing mode to make them self-documenting. The first is the following.
/
jk # match literal
(?! # begin a negative lookahead
.* # match zero or more characters
jk # match literal
) # end negative lookahead
/mx # invoke multiline and free-spacing regex definition modes.
Multiline mode causes . to match any character, including a line terminator.
The second regular expression can be written as follows.
/
\A # match the beginning of the string
(.*) # match zero or more characters
# and save the match to capture group 1
jk # match literal
/mx # invoke multiline and free-spacing regex definition modes.
Note that in both expressions .* is greedy, meaning that it will match as many characters as possible, including "jk" so long as other requirements of the expression are met, here that the last instance of "jk" in the string is matched.
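To compare the techniques in this thread, here is a sketch that runs the negative-lookahead regex, the greedy-capture regex, and the reverse/sub/reverse approach side by side on a few inputs. The method names are hypothetical; the inputs are taken from the answers above plus one string without "jk".

```ruby
# Three ways to replace the last "jk" with "sd" (method names are hypothetical).
def via_lookahead(s)
  s.sub(/jk(?!.*jk)/m, "sd")         # a "jk" with no later "jk" after it
end

def via_capture(s)
  s.sub(/\A(.*)jk/m) { "#{$1}sd" }   # greedy prefix stops before the last "jk"
end

def via_reverse(s)
  s.reverse.sub("kj", "ds").reverse  # first "kj" in the reversed string
end

["lajkz\nejkrjklm", "jughjkjkjk\njk", "jklazerjkm", "no match"].each do |s|
  results = [via_lookahead(s), via_capture(s), via_reverse(s)]
  # Print the shared result if all three agree, otherwise all three.
  p results.uniq.size == 1 ? results.first : results
end
```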
Here is a different solution:
str = "jughjkjkjk\njk"
pattern = "jk"
replace_with = "sd"
str = str.reverse.sub(pattern.reverse, replace_with.reverse).reverse

Google Sheets SUBSTITUTE formula for creating an image path

I'm using the following ARRAYFORMULA to create an image path:
=ARRAYFORMULA(
if(row(A:A)=1,"#Icon",IF(
B:B="",,SUBSTITUTE(
"../../../../../../_Assets/Icons/"& LOWER(B:B&".png"), " ", "_")
)
)
)
What it does
It adds a path before the text and replaces all spaces with an underscore '_'. Here is an example:
Name | #icon
A Tit(l)e | ../../../../../../_Assets/Icons/a_tit(l)e.png
Title - Subtitle | ../../../../../../_Assets/Icons/title_-_subtitle.png
Title text/string - Subtitle | ../../../../../../_Assets/Icons/title_text/string_-_subtitle.png
What I want it to do
If possible, I would like to achieve the following:
Replacing the characters in the list below, like the forward slash /, with an underscore _ (see the last row in my example above)
It already replaces all whitespace with an underscore _, which is good. But when it sees a space followed by a - and another space, it outputs _-_, whereas I want only a -
So the table above would output the following instead:
Name | #icon
A Tit(l)e | ../../../../../../_Assets/Icons/a_tit(l)e.png
Title - Subtitle | ../../../../../../_Assets/Icons/title-subtitle.png
Title text/string - Subtitle | ../../../../../../_Assets/Icons/title_text_string-subtitle.png
List of characters to be avoided/replaced with an underscore _:
# pound
% percent
& ampersand
{ left curly bracket
} right curly bracket
\ back slash
< left angle bracket
> right angle bracket
* asterisk
? question mark
/ forward slash
blank spaces
$ dollar sign
! exclamation point
' single quotes
" double quotes
: colon
@ at sign
+ plus sign
` backtick
| pipe
= equal sign
Any help/suggestion would be much appreciated!
Put the list of characters to avoid into a column (D2:D23 in this formula) and use REGEXREPLACE:
=ARRAYFORMULA(if(row(A:A)=1,"#Icon",IF(A:A="",,"../../../../../../_Assets/Icons/"&LOWER(REGEXREPLACE(REGEXREPLACE(A:A," - ","-"),TEXTJOIN("|\",0,D2:D23),"_")) & ".png")))
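The REGEXREPLACE logic above (first " - " to "-", then every listed character to "_", then lowercase and prefix) can be mirrored in Ruby to check the expected paths from the question's tables outside of Sheets. This is a sketch, not a Sheets formula; `icon_path`, `PREFIX`, `AVOID` and `AVOID_RE` are hypothetical names, and the character list is copied from the question with a literal space standing in for "blank spaces".

```ruby
PREFIX = "../../../../../../_Assets/Icons/"
# Characters from the question's list; ' ' stands for "blank spaces".
AVOID = ['#', '%', '&', '{', '}', '\\', '<', '>', '*', '?', '/', ' ',
         '$', '!', "'", '"', ':', '@', '+', '`', '|', '=']
AVOID_RE = Regexp.union(AVOID)  # builds an alternation, escaping each character

def icon_path(name)
  slug = name.gsub(" - ", "-")     # collapse " - " first, like the inner REGEXREPLACE
  slug = slug.gsub(AVOID_RE, "_")  # every listed character becomes "_"
  PREFIX + slug.downcase + ".png"
end

["A Tit(l)e", "Title - Subtitle", "Title text/string - Subtitle"].each do |n|
  puts icon_path(n)
end
```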
try:
=ARRAYFORMULA({"#Icon",
IF(B2:B="",,SUBSTITUTE(SUBSTITUTE(
"../../../../../../_Assets/Icons/"&LOWER(B2:B&".png"), " ", "_"), "_-_", "-", 1))})

How do I replace all the apostrophes that come right before or right after a comma?

I have a string aString = "old_tag1,old_tag2,'new_tag1','new_tag2'"
I want to remove the apostrophes that come right before or right after a comma. For example, in my case the apostrophes enclosing new_tag1 and new_tag2 should be removed.
This is what I have right now
aString = aString.gsub("'", "")
However, this is problematic because it removes every apostrophe, including ones inside a tag, for example if I had 'my_tag's' instead of 'new_tag1'. How do I get rid of only the apostrophes that come before or after the commas?
My desired output is
aString = "old_tag1,old_tag2,new_tag1,new_tag2"
My guess is to use a regex as well, but in a slightly different way:
aString = "old_tag1,old_tag2,'new_tag1','new_tag2','new_tag3','new_tag4's'"
aString.gsub(/(?<=^|,)'(.*?)'(?=,|$)/, '\1')
#=> "old_tag1,old_tag2,new_tag1,new_tag2,new_tag3,new_tag4's"
The idea is to find a substring with bounding apostrophes and paste it back without them.
regex = /
(?<=^|,) # watch for start of the line or comma before
' # find an apostrophe
(.*?) # get everything between apostrophes in a non-greedy way
' # find a closing apostrophe
(?=,|$) # watch after for the comma or the end of the string
/x
The replacement part just pastes back the content of the first (and only) capture group, i.e. everything between the apostrophes.
Thanks to @Cary for the /x modifier for regexes; I didn't know about it! It's extremely useful for explanations.
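A self-contained sketch of the regex above as a reusable constant with a single capture group, so the replacement is just '\1'. `QUOTED` and `unquote_tags` are hypothetical names.

```ruby
QUOTED = /
  (?<=^|,)  # start of the line or a comma before
  '         # opening apostrophe
  (.*?)     # everything between the apostrophes, non-greedy
  '         # closing apostrophe
  (?=,|$)   # a comma or the end of the line after
/x

def unquote_tags(str)
  str.gsub(QUOTED, '\1')  # keep only the text between the apostrophes
end

puts unquote_tags("old_tag1,old_tag2,'new_tag1','new_tag2','new_tag3','new_tag4's'")
```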
This answers the question, "I want to replace the apostrophes that come right before or right after a comma".
r = /
(?<=,) # match a comma in a positive lookbehind
\' # match an apostrophe
| # or
\' # match an apostrophe
(?=,) # match a comma in a positive lookahead
/x # free-spacing regex definition mode
aString = "old_tag1,x'old_tag2'x,x'old_tag3','new_tag1','new_tag2'"
aString.gsub(r, '')
#=> "old_tag1,x'old_tag2'x,x'old_tag3,new_tag1,new_tag2'"
If the objective is instead to remove single quotes enclosing a substring when the left quote is at the beginning of the string or is immediately preceded by a comma, and the right quote is at the end of the string or is immediately followed by a comma, several approaches are possible. One is to use a single, modified regex, as @Dimitry has done. Another is to split the string on commas, process each string in the resulting array, and then join the modified substrings, separated by commas.
r = /
\A # match beginning of string
\' # match single quote
.* # match zero or more characters
\' # match single quote
\z # match end of string
/x # free-spacing regex definition mode
aString.split(',').map { |s| (s =~ r) ? s[1..-2] : s }.join(',')
#=> "old_tag1,x'old_tag2'x,x'old_tag3',new_tag1,new_tag2"
Note:
arr = aString.split(',')
#=> ["old_tag1", "x'old_tag2'x", "x'old_tag3'", "'new_tag1'", "'new_tag2'"]
"old_tag1" =~ r #=> nil
"x'old_tag2'x" =~ r #=> nil
"x'old_tag3'" =~ r #=> nil
"'new_tag1'" =~ r #=> 0
"'new_tag2'" =~ r #=> 0
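The per-field rule can also be sketched without a regex, using String#start_with? and String#end_with?. `strip_enclosing_quotes` is a hypothetical name. It uses split(',', -1) so empty trailing fields survive the round trip, whereas split(',') drops them; that is a design choice, not something the question asks for.

```ruby
# Hypothetical helper: strip a quote pair only when it encloses the whole field.
def strip_enclosing_quotes(str)
  str.split(',', -1).map do |field|
    if field.size >= 2 && field.start_with?("'") && field.end_with?("'")
      field[1..-2]  # drop the first and last characters (the quotes)
    else
      field
    end
  end.join(',')
end

puts strip_enclosing_quotes("old_tag1,x'old_tag2'x,x'old_tag3','new_tag1','new_tag2'")
```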
Non regex replacement
Regular expressions can get really ugly. There is a simple way to do it with just string replacement: search for the pattern ,' and ', and replace with ,
aString.gsub(",'", ",").gsub("',", ",")
=> "old_tag1,old_tag2,new_tag1,new_tag2'"
This leaves the trailing ', but that is easy to remove with .chomp("'"). A leading ' can be removed with a simple regex .gsub(/^'/, "")
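Putting the string-replacement steps together, a sketch with a hypothetical `unquote_simple` helper. It uses sub(/\A'/, '') for the leading quote because \A anchors to the start of the string, while ^ would match at the start of any line.

```ruby
# Hypothetical helper chaining the non-regex replacements plus cleanup.
def unquote_simple(str)
  out = str.gsub(",'", ",").gsub("',", ",")  # quotes touching a comma
  out = out.chomp("'")                        # trailing quote
  out.sub(/\A'/, "")                          # leading quote
end

puts unquote_simple("old_tag1,old_tag2,'new_tag1','new_tag2'")
```

Like the lookaround regex earlier in this thread, this removes any apostrophe that touches a comma, so a field such as x'old_tag3' followed by a comma also loses its closing quote.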

Capture group in Lua pattern matches literal digit character instead of capture group

I want to extract the VALUE of lines containing key="VALUE", and I am trying to use a simple Lua pattern to solve this.
It works for all lines except those whose VALUE contains a literal 1. It seems the pattern parser is confusing my back-reference to the capture group with an escape sequence.
> return string.find('... key = "PHONE2" ...', 'key%s*=%s*(["\'])([^%1]-)%1')
5 18 " PHONE2
> return string.find('... key = "PHONE1" ...', 'key%s*=%s*(["\'])([^%1]-)%1')
nil
>
You do not need [^%1] at all. Just use .-, because by definition it matches the shortest possible string.
Also, you can use Lua's long-bracket string syntax ([[...]]) so you don't have to escape the quotes in your pattern:
> s=[[... key = "PHONE1" ...]]
> return s:find [[key%s*=%s*(["'])(.-)%1]]
5 18 " PHONE1
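Inside a set, %1 is not a back-reference (back-references only work outside sets), so [^%1] does not mean "anything but the captured quote". In practice it just excludes the character 1, which is why values containing a 1 fail to match.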
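For comparison only, here is the same idea in Ruby rather than Lua. Ruby, like Lua, only treats \1 as a back-reference outside a character class, and a lazy `.*?` plays the role of Lua's `.-`. `KEY_VALUE` and `extract_value` are hypothetical names.

```ruby
# \1 re-matches whichever quote opened the value; .*? stops at the first one.
KEY_VALUE = /key\s*=\s*(["'])(.*?)\1/

def extract_value(line)
  m = KEY_VALUE.match(line)
  m && m[2]  # nil when there is no key="..." on the line
end

['... key = "PHONE1" ...', "... key = 'PHONE2' ...", %q(key="it's")].each do |line|
  p extract_value(line)
end
```

The third input shows that the other quote character may appear inside the value.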

Pattern match dropping newline characters

How can I extract the values from a CSV-like string with a pattern while dropping the newline characters (\r\n or \n)?
A line looks like:
1.1;2.2;Example, 3
Notice there are only 3 values and the separator is ;. The problem I'm having is coming up with a pattern that reads the values while dropping the newline characters. The file comes from a Windows machine, so it has \r\n; I'm reading it on Linux and would like to be independent of the newline convention used.
My simple example right now is:
s = "1.1;2.2;Example, 3\r\n";
p = "(.-);(.-);(.-)";
a, b, c = string.match(s, p);
print(c:byte(1, -1));
The last two characters printed by the code above are \r\n.
The problem is that both \r and \n are matched by the %c and %s classes (control characters and space characters), as shown by this code:
s = "a\r";
print(s:match("%c"));
print(s:match("%s"));
print(s:match("%d"));
So, is it possible to leave the newline characters out of the match? (It should not be assumed that the last two characters will be newline characters.)
The third value may contain spaces, punctuation and alphanumeric characters, and since \r\n are detected as space characters, a pattern like "(.-);(.-);([%w%s%c]-).*" does not work.
Your pattern
p = "(.-);(.-);(.-)";
does not work: the third field is always empty because .- matches as little as possible. You need to anchor it at the end of the string, but then the third field will contain the trailing newline characters:
p = "(.-);(.-);(.-)$";
So, just stop at the first trailing newline char. This also anchors the last match. Try this pattern instead:
p = "(.-);(.-);(.-)[\r\n]";
If trailing newline chars are optional, try this pattern:
p = "(.-);(.-);(.-)[\r\n]*$";
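The same anchoring idea carries over to Ruby regexes, shown here for comparison rather than as Lua code: lazy groups followed by optional trailing CR/LF characters and an end-of-string anchor. `LINE` and `fields` are hypothetical names.

```ruby
# Lazy fields, then any trailing \r or \n, then the end of the string.
LINE = /\A(.*?);(.*?);(.*?)[\r\n]*\z/

def fields(line)
  m = LINE.match(line)
  m && m.captures  # nil if the line does not have three ;-separated parts
end

p fields("1.1;2.2;Example, 3\r\n")
p fields("1.1;2.2;Example, 3")
```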
Without any Lua experience, I found a naive solution:
clean_CR = s:gsub("\r","");
clean_NL = clean_CR:gsub("\n","");
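The gsub calls above strip every CR and LF anywhere in the string. In Ruby, for comparison, String#chomp with no argument removes only a trailing "\r\n", "\n" or "\r", and split with a limit of 3 keeps any further semicolons inside the third field. `fields_by_split` is a hypothetical name.

```ruby
def fields_by_split(line)
  line.chomp.split(";", 3)  # chomp drops one trailing "\r\n", "\n" or "\r"
end

p fields_by_split("1.1;2.2;Example, 3\r\n")
p fields_by_split("a;b;c;d\n")
```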
With POSIX regex syntax I'd use
^([^;]*);([^;]*);([^\n\r]*).*$
.. with "\n" and "\r" possibly included as "^M", "^#" (control/unicode characters) .. depending on your editor.
