Specify Unicode Character in Regular Expression

How can I create a ruby regular expression that includes a unicode character?
For example, I would like to use the character "\u0002" in my regular expression.

You can write /\x02/:
"\u0002" =~ /\x02/
#=> 0
If you're not sure how to write the escape, you can build the regex from a string:
Regexp.new("\u0002")
#=> /\x02/
Here's another example :
"☀☁☂" =~ /\u2602/
#=> 2
As mentioned by @TomLord in the comments, you can also specify a range. To check whether a string includes a character from the Unicode Arrows block (U+2190 to U+21FF):
"↹" =~ /[\u2190-\u21FF]/
#=> 0
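As a small sketch of the escapes discussed above (the variable names are illustrative and not from the original answer), the following compares the hex and code point forms. It assumes the brace form \u{...} for code points above U+FFFF, which do not fit in four hex digits:

```ruby
# Illustrative names; each pattern below targets the same character, U+0002.
stx = "\u0002"
patterns = [/\x02/, /\u0002/, /\u{0002}/]
match_indexes = patterns.map { |pattern| stx =~ pattern }

# Code points above U+FFFF need the brace form of the escape.
emoji_index = "ab\u{1F600}" =~ /\u{1F600}/

# A range inside a character class works the same way.
arrow_index = "x\u21B9" =~ /[\u2190-\u21FF]/

p match_indexes
p emoji_index
p arrow_index
```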

Related

How to find last occurrence of a substring in a given string?

I have a string, which describe some word, I must change ending of it to "sd", if ending == "jk".
For an example, I have word: "lazerjk", I need to get from it "lazersd".
I tried to use method .gsub!, but it doesn't work correctly if we have more than one occurrence of substring "jk" in a word.

String#rindex returns the index of the last occurrence of the given substring.
String#[]= can take two integer arguments: the index at which to start replacing and the length of the text to replace.
You can use them this way:
replaced = "foo"
replacing = "booo"
string = "foo bar foo baz"
string[string.rindex(replaced), replaced.size] = replacing
string
# => "foo bar booo baz"
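One caveat with the snippet above: rindex returns nil when the substring is absent, and passing nil to String#[]= raises a TypeError. A minimal sketch of a guarded version follows; replace_last is a hypothetical helper name, not a built-in method, and it returns a new string rather than mutating the argument:

```ruby
# Hypothetical helper: replaces only the last occurrence of `old` in `string`.
def replace_last(string, old, new)
  index = string.rindex(old)
  # Nothing to replace: return an unchanged copy instead of raising.
  return string.dup if index.nil?

  result = string.dup
  result[index, old.size] = new
  result
end

p replace_last("foo bar foo baz", "foo", "booo")
p replace_last("lazerjk", "jk", "sd")
p replace_last("lazer", "jk", "sd")
```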

"jughjkjkjk\njk".sub(/jk$\z/, 'sd')
#=> "jughjkjkjk\nsd"
The $ is redundant here; /jk\z/ alone is sufficient.

It sounds like you're looking to replace a specific suffix only. If so, I would probably suggest using sub along with an anchored regex (to check for the desired characters only at the end of the string):
string_1 = "lazerjk"
string_2 = "lazerjk\njk"
string_3 = "lazerjkr"
string_1.sub(/jk\z/, "sd")
#=> "lazersd"
string_2.sub(/jk\z/, "sd")
#=> "lazerjk\nsd"
string_3.sub(/jk\z/, "sd")
#=> "lazerjkr"
Or, you could do without a regex at all by using the reverse! method along with a simple conditional statement to sub! only when the suffix is present:
string = "lazerjk"
old_suffix = "jk"
new_suffix = "sd"
string.reverse!.sub!(old_suffix.reverse, new_suffix.reverse).reverse! if string.end_with?(old_suffix)
string
#=> "lazersd"
Or, you could even use a completely different approach. Here's an example using chomp to remove the unwanted suffix and then ljust to pad the desired suffix onto the modified string.
string = "lazerjk"
string.chomp("jk").ljust(string.length, "sd")
#=> "lazersd"
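Note that the new suffix only gets added if the initial chomp shortened the string; otherwise the string remains unchanged. This trick relies on the old and new suffixes having the same length, because ljust pads the string back to its original length.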
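For suffixes of different lengths, a regex-free alternative can be sketched with end_with? and delete_suffix. This assumes Ruby 2.5 or later (where String#delete_suffix is available); replace_suffix is a hypothetical helper name:

```ruby
# Hypothetical helper: swaps `old_suffix` for `new_suffix` only when the
# string actually ends with `old_suffix`. Requires Ruby 2.5+ (delete_suffix).
def replace_suffix(string, old_suffix, new_suffix)
  return string unless string.end_with?(old_suffix)

  string.delete_suffix(old_suffix) + new_suffix
end

p replace_suffix("lazerjk", "jk", "sd")
p replace_suffix("lazerjkr", "jk", "sd")
p replace_suffix("lazerjk", "jk", "xyz")
```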

If the goal is to substitute the LAST OCCURRENCE (as opposed to suffix only), then this could be accomplished by using sub along with reverse:
string = "jklazerjkm"
old_substring = "jk"
new_substring = "sd"
string.reverse.sub(old_substring.reverse, new_substring.reverse).reverse
#=> "jklazersdm"

Replacing "jk" at the end of a string with something else is straightforward and can be addressed without concern for other instances of "jk" that may be in the string, so I assume that is not what is being asked. Rather, I assume the problem is to replace the last instance of "jk" in a string with "sd".
Here are two solutions that make use of String#sub with a regular expression.
Use a negative lookahead
The idea here is to match "jk" provided it is not followed later in the string by another instance of "jk".
"lajkz\nejkrjklm".sub(/jk(?!.*jk)/m, "sd")
#=> "lajkz\nejkrsdlm"
Capture the part of the string that precedes the last "jk"
The match, if there is one, consists of the front of the string followed by the last "jk", which is replaced by the captured string followed by "sd".
"lajkz\nejkrjklm".sub(/\A(.*)jk/m) { $1 + "sd" }
#=> "lajkz\nejkrsdlm"
The two regular expressions can be written in free-spacing mode to make them self-documenting. The first is the following.
/
jk      # match literal
(?!     # begin a negative lookahead
  .*    # match zero or more characters, line terminators included due to the m flag
  jk    # match literal
)       # end negative lookahead
/mx     # invoke multiline and free-spacing regex definition modes.
Multiline mode causes . to match any character, including a line terminator.
The second regular expression can be written as follows.
/
\A      # match the beginning of the string
(.*)    # match zero or more characters, line terminators included due to the m flag,
        # and save the match to capture group 1
jk      # match literal
/mx     # invoke multiline and free-spacing regex definition modes.
Note that in both expressions .* is greedy, meaning that it will match as many characters as possible, including "jk" so long as other requirements of the expression are met, here that the last instance of "jk" in the string is matched.
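To compare the two expressions side by side, here is a short sketch that runs both over a few inputs, including one with no "jk" at all. The constant names and sample strings are illustrative only:

```ruby
# Illustrative constant names for the two patterns described above.
LAST_JK_LOOKAHEAD = /jk(?!.*jk)/m
LAST_JK_GREEDY    = /\A(.*)jk/m

samples = ["lajkz\nejkrjklm", "jk", "none", "jkjk"]

# Negative lookahead: replace the "jk" that has no later "jk" after it.
via_lookahead = samples.map { |s| s.sub(LAST_JK_LOOKAHEAD, "sd") }

# Greedy capture: keep everything before the last "jk", then append "sd".
via_greedy = samples.map { |s| s.sub(LAST_JK_GREEDY) { $1 + "sd" } }

p via_lookahead
p via_greedy
```

Both approaches leave a string untouched when it contains no "jk", because sub only substitutes on a match.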

Here is a different solution:
str = "jughjkjkjk\njk"
pattern = "jk"
replace_with = "sd"
str = str.reverse.sub(pattern.reverse, replace_with.reverse).reverse

Regex not working as expected

I'm trying to return true if a string is 16 characters or more using a regex. Here is what I'm currently working with.
CODE:
"<p>#MichiganHouseWarehouseEvent</p>" == /\S{16,}/
I'm trying to say if string is 16 chars or more without a space return true but this returns false... Any idea what I'm doing wrong?

A String cannot be equal (==) to a Regexp. It can match one, though.
Complete string
If you want to check that the complete string has at least 16 characters and none of them is whitespace (a space, a newline, a tab...):
"<p>#MichiganHouseWarehouseEvent</p>" =~ /\A\S{16,}\z/
#=> 0
Note that in Ruby, 0 is truthy. It is the index at which the match begins.
Since Ruby 2.4, you can use match? to get a boolean directly.
Substring
If you want to check that there's at least one run of 16 consecutive non-whitespace characters:
"0123 01234567890abcdef" =~ /\S{16}/
#=> 5
This condition is less restrictive than the previous one :
"0123 01234567890abcdef" =~ /\A\S{16,}\z/
#=> nil
You could also use :
"0123 01234567890abcdef".split.any?{ |no_space| no_space.size >= 16 }
#=> true
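The two checks above can be wrapped in predicate methods that return true or false directly. This is only a sketch, assuming Ruby 2.4 or later for match?; the method names and the min_length parameter are hypothetical:

```ruby
# Hypothetical helper: the whole string is one whitespace-free token
# of at least `min_length` characters.
def long_token?(string, min_length = 16)
  string.match?(/\A\S{#{min_length},}\z/)
end

# Hypothetical helper: the string contains at least one run of
# `min_length` consecutive non-whitespace characters.
def contains_long_token?(string, min_length = 16)
  string.match?(/\S{#{min_length}}/)
end

p long_token?('<p>#MichiganHouseWarehouseEvent</p>')
p long_token?("0123 01234567890abcdef")
p contains_long_token?("0123 01234567890abcdef")
```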

How do I check to see if a string contains specific characters, and if it does replace just that bit?

I want to replace all strings that contain the characters c# which can include c#-4.0, c#-3.0, ms-c#, etc.
How do I check to see if c# exists within a string, and if it does, just replace the c# portion of that string?
i.e. for c#-4.0 the modified string would be c%23-4.0. It would be preferable if a native method of the Ruby core library is used (like one of String's methods).
I tried tagname.replace('c%23') but that replaces the entire string, and not just the substring that matches the pattern.
Thoughts?

Use String's gsub! method:
"c#-4.0".gsub!(/c#/, "c%23")

You can use the gsub method of String. (Use sub if you want to replace only once in the string).
"hello".gsub(/[aeiou]/, '*') #=> "h*ll*"
"hello".gsub(/([aeiou])/, '<\1>') #=> "h<e>ll<o>"
"hello".gsub(/./) {|s| s.ord.to_s + ' '} #=> "104 101 108 108 111 "
"hello".gsub(/(?<foo>[aeiou])/, '{\k<foo>}') #=> "h{e}ll{o}"
'hello'.gsub(/[eo]/, 'e' => 3, 'o' => '*') #=> "h3ll*"
# in your case :
"some string c#".gsub!('c#', 'c%23') #=> "some string c%23"

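One difference between the two variants used in the answers above is worth a quick sketch: gsub returns a new string every time, while gsub! modifies the receiver and returns nil when nothing was replaced, so its return value should not be chained blindly. The tag list is illustrative:

```ruby
# Illustrative tag names based on the question.
tags = ['c#-4.0', 'c#-3.0', 'ms-c#', 'ruby']

# gsub always returns a string, changed or not.
encoded = tags.map { |tag| tag.gsub('c#', 'c%23') }

# gsub! returns nil when there was nothing to replace.
bang_result = 'ruby'.dup.gsub!('c#', 'c%23')

p encoded
p bang_result
```
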
string format check

Suppose I have string variables like following:
s1="10$"
s2="10$ I am a student"
s3="10$Good"
s4="10$ Nice weekend!"
As you can see above, s2 and s4 have whitespace after 10$.
Generally, I would like a way to check whether a string starts with 10$ and has whitespace after it. For example, the rule should find s2 and s4 in the case above. How can I define such a rule?
What I mean is something like s2.RULE? should return true or false to tell if it is the matched string.
---------- update -------------------
Please also give the solution if 10# is used instead of 10$.

You can do this using Regular Expressions (Ruby has Perl-style regular expressions, to be exact).
# For ease of demonstration, I've moved your strings into an array
strings = [
  "10$",
  "10$ I am a student",
  "10$Good",
  "10$ Nice weekend!"
]
p strings.find_all { |s| s =~ /\A10\$[ \t]+/ }
The regular expression breaks down like this:
The / at the beginning and the end tell Ruby that everything in between is part of the regular expression
\A matches the beginning of a string
The 10 is matched verbatim
\$ means to match a $ verbatim. We need to escape it since $ has a special meaning in regular expressions.
[ \t]+ means "match at least one blank and/or tab"
So this regular expression says "Match every string that starts with 10$ followed by at least one blank or tab character". Using =~ you can test strings in Ruby against this expression. On a match, =~ returns the index of the match, which is non-nil and therefore truthy in a conditional like if; otherwise it returns nil.
Edit: Updated white space matching as per Asmageddon's suggestion.

This works:
"10$ " =~ /^10\$ +/
It returns nil when there is no match and 0 when there is one. Because 0 is truthy and nil is falsy in Ruby, you can use the result directly in a conditional.

Use a regular expression like this one, anchored to the start of the string with \A:
/\A10\$\s+/
EDIT
If you use =~ for matching, note that
"The =~ operator returns the character position in the string of the start of the match"
So it might return 0 to denote a match. Only a return of nil means no match.
See for example http://www.regular-expressions.info/ruby.html for a regular expression tutorial for Ruby.

If you want to handle both 10$ and 10#, use a character class:
/^10[\$#] +/
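The question asks for something like s2.RULE? that returns true or false. A minimal sketch of such a predicate follows; amount_prefix? and its marker parameter are hypothetical names, and Regexp.escape is used so that $ or # can be passed in without manual escaping:

```ruby
# Hypothetical helper: true when `string` starts with "10" plus `marker`
# and is followed by at least one space or tab.
def amount_prefix?(string, marker = '$')
  string.match?(/\A10#{Regexp.escape(marker)}[ \t]+/)
end

strings = ['10$', '10$ I am a student', '10$Good', '10$ Nice weekend!']
p strings.select { |s| amount_prefix?(s) }
p amount_prefix?('10# ok', '#')
```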

How to check string contains special character in ruby

How can I check whether a string contains a special character in Ruby? A regular expression would also be fine.
Please let me know

Use str.include?.
Returns true if str contains the given string or character.
"hello".include? "lo" #=> true
"hello".include? "ol" #=> false
"hello".include? ?h #=> true

special = "?<>',?[]}{=-)(*&^%$#`~{}"
regex = /[#{special.gsub(/./){|char| "\\#{char}"}}]/
You can then use the regex to test if a string contains the special character:
if some_string =~ regex
This looks a bit complicated: what's going on in this bit
special.gsub(/./){|char| "\\#{char}"}
is to turn this
"?<>',?[]}{=-)(*&^%$#`~{}"
into this:
"\\?\\<\\>\\'\\,\\?\\[\\]\\}\\{\\=\\-\\)\\(\\*\\&\\^\\%\\$\\#\\`\\~\\{\\}"
That is every character in special, escaped with a \ (which is itself escaped in the string, i.e. \\ rather than \). It is then used to build a regex like this:
/[<every character in special, escaped>]/
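As an alternative sketch, Regexp.union can do the escaping: given an array of strings, it escapes each one and joins them into a single alternation. The constant and method names here are hypothetical, and the character set is the one from the answer above with duplicates removed:

```ruby
# The special characters from the answer, written with %q so that
# nothing in the list needs escaping by hand.
SPECIAL_CHARS = %q|?<>',[]}{=-)(*&^%$#`~|

# Regexp.union escapes every character and builds one alternation.
SPECIAL_REGEX = Regexp.union(SPECIAL_CHARS.chars.uniq)

# Hypothetical helper: true when `string` contains any special character.
def contains_special?(string)
  string.match?(SPECIAL_REGEX)
end

p contains_special?('hello')
p contains_special?('he#llo')
p contains_special?('a-b')
```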

"foobar".include?('a')
# => true

Why not use the inverse of the POSIX bracket expression [:alnum:]?
Here [:alnum:] covers 0-9, a-z, and A-Z (in Ruby it also matches non-ASCII letters and digits).
"Hel#lo".index( /[^[:alnum:]]/ )
This returns nil when the string has no special character, so I think it is the easiest way.

How about this approach, for Ruby 2.0.0 and above?
def check_for_a_special_character(string)
  /\W/ === string
end
The same check can be written with String#[]:
!"He#llo"[/\W/].nil? #=> true
!"Hello"[/\W/].nil? #=> false
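The \W and [^[:alnum:]] approaches do not agree on every input, so it helps to decide what counts as "special". The sketch below, with illustrative sample strings, shows the differences: \W treats the underscore as a word character and only knows ASCII letters, while the POSIX bracket is Unicode-aware and treats the underscore as special:

```ruby
# Illustrative inputs: plain word, underscore, space, accented letter, hash.
samples = ['Hello', 'a_b', 'a b', "h\u00E9llo", 'He#llo']

# \W: anything outside [a-zA-Z0-9_].
non_word = samples.map { |s| s.match?(/\W/) }

# [^[:alnum:]]: anything that is not a letter or digit, Unicode-aware.
non_alnum = samples.map { |s| s.match?(/[^[:alnum:]]/) }

p non_word
p non_alnum
```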

If you are looking for a particular character, you can build the set of characters you want to allow and check whether the character in question is part of that set:
puts [*"a".."z"].join.include?("a") # prints true
puts [*"a".."z"].join.include?("$") # prints false
I think this is flexible because you are not limited in what can be excluded:
puts [*"a".."z", *0..9, " "].join.include?(" ") # prints true
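To apply the same allow-list idea to a whole string rather than a single character, one possible sketch checks every character against the set. ALLOWED and only_allowed? are hypothetical names, and the set shown (ASCII letters, digits, and the space) is just an example:

```ruby
# Example allow-list: ASCII letters, digits, and the space character.
ALLOWED = [*("a".."z"), *("A".."Z"), *("0".."9"), " "].freeze

# Hypothetical helper: true when every character of `string` is allowed,
# i.e. the string contains no "special" character.
def only_allowed?(string)
  string.each_char.all? { |char| ALLOWED.include?(char) }
end

p only_allowed?("Hello World 42")
p only_allowed?('cost: $5')
```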
