Regex - Parse string by removing characters - ruby-on-rails

I am trying to parse the path part of a url.
The input, is a string such as site/whatever% ^&*/page/to-days_date// which I would like to convert into site/whatever/page/to-days_date
Things to remove would be anything that is not one of the following:
lower or upper case letter
digit / number
dash
underscore

Just add /+$ with a pipe(|) with your existing regex. It means match any number(starting from 1) of / from the end of input. So it will work for / // or ///// at the end of the input.
myString = '''blog/whatever% ^&*/page/to-days_date//'''
print re.sub(r'/+$|[^a-zA-Z0-9_\-\/]+', '', myString)
^^^ here

Related

How to find last occurrence of a substring in a given string?

I have a string, which describe some word, I must change ending of it to "sd", if ending == "jk".
For an example, I have word: "lazerjk", I need to get from it "lazersd".
I tried to use method .gsub!, but it doesn't work correctly if we have more than one occurrence of substring "jk" in a word.
String#rindex returns the index of the last occurrence of the given substring
String#[]= can take two integers arguments, first is index where start to replace and second - length of replaced string
You can use them this way:
replaced = "foo"
replacing = "booo"
string = "foo bar foo baz"
string[string.rindex(replaced), replaced.size] = replacing
string
# => "foo bar booo baz"
"jughjkjkjk\njk".sub(/jk$\z/, 'sd')
=> "jughjkjkjk\nsd"
without $ is probably sufficient.
It sounds like you're looking to replace a specific suffix only. If so, I would probably suggest using sub along with an anchored regex (to check for the desired characters only at the end of the string):
string_1 = "lazerjk"
string_2 = "lazerjk\njk"
string_3 = "lazerjkr"
string_1.sub(/jk\z/, "sd")
#=> "lazersd"
string_2.sub(/jk\z/, "sd")
#=> "lazerjk\nsd"
string_3.sub(/jk\z/, "sd")
#=> "lazerjkr"
Or, you could do without a regex at all by using the reverse! method along with a simple conditional statement to sub! only when the suffix is present:
string = "lazerjk"
old_suffix = "jk"
new_suffix = "sd"
string.reverse!.sub!(old_suffix.reverse, new_suffix.reverse).reverse! if string.end_with? (old_suffix)
string
#=> "lazersd"
OR, you could even use a completely different approach. Here's an example using chomp to remove the unwanted suffix and then ljust to pad the desired suffix to the modified string.
string = "lazerjk"
string.chomp("jk").ljust(string.length, "sd")
#=> "lazersd"
Note that the new suffix only gets added if the length of the string was modified with the initial chomp. Otherwise, the string remains unchanged.
If the goal is to substitute the LAST OCCURRENCE (as opposed to suffix only), then this could be accomplished by using sub along with reverse:
string = "jklazerjkm"
old_substring = "jk"
new_substring = "sd"
string.reverse.sub(old_substring.reverse, new_substring.reverse).reverse
#=> "jklazersdm"
Replacing "jk" at the end of a string with something else is straightforward and can be addressed without concern for other instances of "jk" that may be in the string, so I assume that is not what is being asked. Rather, I assume the problem is to replace the last instance of "jk" in a string with "sd".
Here are two solutions that make use of String#sub with a regular expression.
Use a negative lookahead
The idea here is to match "jk" provided it is not followed later in the string by another instance of "jk".
"lajkz\nejkrjklm".sub(/jk(?!.*jk)/m, "sd")
#=> "lajkz\nejkrsdlm"
Capture the part of the string that precedes the last "jk"
The match, if there is one, consists of the front of the string followed by the last "jk", which is replaced by the captured string followed by "sd".
"lajkz\nejkrjklm".sub(/\A(.*)jk/m) { $1 + "sd" }
#=> "lajkz\nejkrsdlm"
The two regular expressions can be written in free-spacing mode to make them self-documenting. The first is the following.
/
jk # match literal
(?! # begin a negative lookahead
.* # match zero or more characters other than line terminators
jk # match literal
) # end negative lookahead
/mx # invoke multiline and free-spacing regex definition modes.
Multiline mode causes . to match any character, including a line terminator.
The second regular expression can be written as follows.
\A # match the beginning of the string
(.*) # match zero or more characters other than line terminators
# and save the match to capture group 1
jk # match literal
/mx # invoke multiline and free-spacing regex definition modes.
Note that in both expressions .* is greedy, meaning that it will match as many characters as possible, including "jk" so long as other requirements of the expression are met, here that the last instance of "jk" in the string is matched.
Here is a different solution:
str = "jughjkjkjk\njk"
pattern = "jk"
replace_with = "sd"
str = str.reverse.sub(pattern.reverse, replace_with.reverse).reverse

Swift, phone number regex

Hopefully a simple one,
I need a a limit of 8 numbers, the user need to write 8 number no more or less.
For now this is my code:
telefonRegex = "^(?=.*[0-9])$"
But it is not working, I just heard about regex fyi.
Your current regex never matches a string because it requires to start matching at the start of the string (^), then makes a forward check to require a digit ([0-9]) to appear after any 0+ chars other than line break chars (.*) and then tries to match the end of the string right after the beginning - tha is, it matches an empty string but also requires at least 1 digit in it.
You may just use
let telefonRegex = "^[0-9]{8}$"
or
let telefonRegex = "\\A[0-9]{8}\\z"
to match a string that only consists of 8 digits.
Details
^ - start of string (may be replaced by \\A in the string literal)
[0-9]{8} - exactly 8 occurrences of any digit
$ - end of string (to make sure the very end of string is matched, use \\z in the string literal).

Lua Pattern Matching, get character before match

Currently I have code that looks like this:
somestring = "param=valueZ&456"
local stringToPrint = (somestring):gsub("(param=)[^&]+", "%1hello", 1)
StringToPrint will look like this:
param=hello&456
I have replaced all of the characters before the & with the string "hello". This is where my question becomes a little strange and specific.
I want my string to appear as: param=helloZ&456. In other words, I want to preserve the character right before the & when replacing the string valueZ with hello to make it helloZ instead. How can this be done?
I suggest:
somestring:gsub("param=[^&]*([^&])", "param=hello%1", 1)
See the Lua demo
Here, the pattern matches:
param= - literal substring param=
[^&]* - 0 or more chars other than & as many as possible
([^&]) - Group 1 capturing a symbol other than & (here, backtracking will occur, as the previous pattern grabs all such chars other than & and then the engine will take a step back and place the last char from that chunk into Group 1).
There are probably other ways to do this, but here is one:
somestring = "param=valueZ&456"
local stringToPrint = (somestring):gsub("(param=).-([^&]&)", "%1hello%2", 1)
print(stringToPrint)
The thing here is that I match the shortest string that ends with a character that is not & and a character that is &. Then I add the two ending characters to the replaced part.

string format checking (with partly random string)

I would like to use regular expression to check if my string have the format like following:
mc_834faisd88979asdfas8897asff8790ds_oa_ids
mc_834fappsd58979asdfas8897asdf879ds_oa_ids
mc_834faispd8fs9asaas4897asdsaf879ds_oa_ids
mc_834faisd8dfa979asdfaspo97asf879ds_dv_ids
mc_834faisd111979asdfas88mp7asf879ds_dv_ids
mc_834fais00979asdfas8897asf87ggg9ds_dv_ids
The format is like mc_<random string>_oa_ids or mc_<random string>_dv_ids . How can I check if my string is in either of these two formats? And please explain the regular expression. thank you.
That's a string start with mc_, while end with _oa_ids or dv_ids, and have some random string in the middle.
P.S. the random string consists of alpha-beta letters and numbers.
What I tried(I have no clue how to check the random string):
/^mc_834faisd88979asdfas8897asff8790ds$_os_ids/
Try this.
^mc_[0-9a-z]+_(dv|oa)_ids$
^ matches at the start of the line the regex pattern is applied to.
[0-9a-z] matces alphabetic and numeric chars.
+ means that there should be one or more chars in this set
(dv|oa) matches dv or oa
$ matches at the end of the string the regex pattern is applied to.
also matches before the very last line break if the string ends with a line break.
Give /\Amc_\w*_(oa|dv)_ids\z/ a try. \A is the beginning of the string, \z the end. \w* are one or more of letters, numbers and underscores and (oa|dv) is either oa or dv.
A nice and simple way to test Ruby Regexps is Rubular, might have a look at it.
This should work
/mc_834([a-z,0-9]*)_(oa|dv)_ids/g
Example: http://regexr.com?2v9q7

Removing lines that begin with > in a rails string

I'm trying to remove any lines that begin with the character '>' in a long string (i.e. replies to an email).
In PHP I'd iterate over each line with an if statement, in linux I'd try and use sed or awk.
What's the most elegant rails approach?
You can try this:
your_string.gsub(/^\>.+\n/,'')
Your question is implying that the input is one string, containing multiple lines.
Do you want the output to be just one string with multiple lines as well? I'm assuming yes.
either using String and Array operations:
str.lines.reject{|x| x =~ /^>/}.join # this will return a new string, without those ">" lines
or using Regular Expressions:
str.gsub(/^>.+\n*/. '')
Better Solution:
You will need to use non-greedy multi-line matching mode for your Regular Expression:
str.gsub(/^>.*?$\n*/m, '') # by using gsub!() you can modify the string in place
^> matches your ">" character at the start of a line
.*?$ matches any characters after the start character until the end of the line (non-greedy)
\n* matches the newline character itself if any (you want to remove that as well)
the "m" at the end of the regular expressions indicates multi-line matching , which will apply the RegExp for each line in the string.
It should work as you expect:
your_string.lines.to_a.reject{|line| line[0] == '>'}.join

Resources