unless (place =~ /^\./) == 0
I know that unless is like if not, but what about the conditional?
=~ means matches regex
/^\./ is a regular expression:
/.../ are the delimiters for the regex
^ matches the start of the string or of a line (\A matches the start of the string only)
\. matches a literal .
It checks whether the string place starts with a period (.).
Consider this:
p ('.foo' =~ /^\./) == 0 # => true
p ('foo' =~ /^\./) == 0 # => false
In this case, it wouldn't be necessary to use == 0. place =~ /^\./ would suffice as a condition:
p '.foo' =~ /^\./ # => 0 # 0 evaluates to true in Ruby conditions
p 'foo' =~ /^\./ # => nil
EDIT: /^\./ is a regular expression. The start and end slashes denote that it is a regular expression, leaving the important bit, ^\.. The first character, ^, marks "start of string/line" and \. is the literal character ., as the dot character is normally a special character in regular expressions.
To read more about regular expressions, see Wikipedia or the excellent regular-expressions.info website.
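To make the difference between ^ and \A concrete, here is a small sketch. The two helper method names are made up for illustration; only =~, the anchors and String#start_with? come from Ruby itself.

```ruby
# Hypothetical helpers, named here only for illustration.
def starts_line_with_dot?(text)
  !(text =~ /^\./).nil?    # ^ matches at the start of any line
end

def starts_string_with_dot?(text)
  !(text =~ /\A\./).nil?   # \A matches only at the very start of the string
end

multi_line = "foo\n.bar"
p multi_line =~ /^\./                   #=> 4 (the dot after the newline)
p starts_line_with_dot?(multi_line)     #=> true
p starts_string_with_dot?(multi_line)   #=> false
p '.foo'.start_with?('.')               #=> true, no regex needed
```

If place can never contain a newline, the two anchors behave the same, and start_with? is probably the most readable option.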
I have a piece of code where I can switch words from @post.swap_content to hyperlinks by keyword. For example, I have the word 'michigan' in @post.swap_content and the keyword 'Michigan' in keywords, so it would switch it to the hyperlink attached to that keyword. Here is part of the function:
def execute
all_keys = Keyword.all.pluck(:key, :link).to_h.transform_keys(&:downcase)
@post.swap_content = @post.swap_content.to_s.gsub!(/\w+/) do |word|
url = all_keys[word.downcase]
url ? "<a href='#{url}'>#{word}</a>" : word
end
@post.save!
end
And my question is: how can I make it gsub only the first two occurrences of each keyword in @post.swap_content? For example, if @post.swap_content is 'michigan, michigan and michigan, utah and utah', how can I switch only the first two occurrences of each keyword to hyperlinks (the first two 'michigan' and the first two 'utah')? I think I need to work with gsub somehow, but I don't know how to limit the number of words that get replaced.
You can provide a block to gsub that will be invoked with each match. You could use this to count occurrences and conditionally replace content.
str = "Dog dog dog cat cat cat"
occurences = {}
str.gsub(/\w+/) do |match|
# downcase so Dog and dog are counted together
key = match.downcase
# build a hash which counts the number of times we've matched a word.
count = occurences.store(key, occurences.fetch(key, 0).next)
# return the word unchanged or wrap in a hyperlink depending on count
count > 2 ? match : "<a>#{match}</a>"
end
# output => "<a>Dog</a> <a>dog</a> dog <a>cat</a> <a>cat</a> cat"
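Adapting that idea to the keyword case from the question might look like the sketch below. The method name link_first_n, the links hash and the limit: keyword are made up for illustration; the hash stands in for whatever the Keyword.all.pluck(...) call in the question returns.

```ruby
# Hypothetical helper: wrap only the first `limit` occurrences of each keyword.
# `links` maps downcased keywords to URLs (an assumption for this sketch).
def link_first_n(text, links, limit: 2)
  seen = Hash.new(0)                 # counting hash: keyword => times seen
  text.gsub(/\w+/) do |word|
    key = word.downcase
    url = links[key]
    next word unless url             # not a keyword: leave it unchanged
    seen[key] += 1
    seen[key] <= limit ? "<a href='#{url}'>#{word}</a>" : word
  end
end

links = { 'michigan' => '/mi', 'utah' => '/ut' }
puts link_first_n('michigan, michigan and michigan, utah and utah', links)
```

Using gsub rather than gsub! here avoids the nil that gsub! returns when the string contains no match at all.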
Suppose:
str = "Dog dog cat dog cat Dog cat cat"
If Ruby's regex engine supported variable-length negative lookbehinds we could write:
R = /\b(\w+)\b(?<!(?:\b\1\b.*){2})/i
str.gsub(R, '<a>\1</a>')
#=> "<a>Dog</a> <a>dog</a> <a>cat</a> dog <a>cat</a> Dog cat cat"
We can write this regular expression in free-spacing mode to make it self-documenting:
R = /
\b # assert a word break
(\w+) # match 1+ word characters and save to capture group 1
\b # assert a word break
(?<! # begin a negative lookbehind
(?: # begin a non-capture group
\b # assert a word break
\1 # match the content of capture group 1
\b # assert a word break
.* # match 0+ characters
) # end non-capture group
{2} # execute non-capture group twice
) # end negative lookbehind
/ix # assert case-independent and free-spacing regex def modes
Unfortunately, Ruby's regex engine does not support variable-length (positive or negative) lookbehinds (though one day it might). It does, however, support variable-length (positive and negative) lookaheads. We therefore could reverse the string, perform the desired replacements using gsub then reverse the resulting string, as follows:
R = /\b(\w+)\b(?!(?:.*\b\1\b){2})/i
str.reverse.gsub(R, '>a/<\1>a<').reverse
#=> "<a>Dog</a> <a>dog</a> <a>cat</a> dog <a>cat</a> Dog cat cat"
The steps are as follows.
s = str.reverse
#=> "tac tac goD tac god tac god goD"
t = s.gsub(R, '>a/<\1>a<')
#=> "tac tac goD >a/<tac>a< god >a/<tac>a< >a/<god>a< >a/<goD>a<"
t.reverse
#=> "<a>Dog</a> <a>dog</a> <a>cat</a> dog <a>cat</a> Dog cat cat"
Let's have a closer look at the regular expression.
R = /
\b # assert a word break
(\w+) # match 1+ word characters and save to capture group 1
\b # assert a word break
(?! # begin a negative lookahead
(?: # begin a non-capture group
.* # match 0+ characters
\b # assert a word break
\1 # match the content of capture group 1
\b # assert a word break
) # end non-capture group
{2} # execute non-capture group twice
) # end negative lookahead
/ix # assert case-independent and free-spacing regex def modes
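As a sanity check, the reversed-string approach can be compared against a plain counting block. Both method names below are hypothetical, and the comparison is only a sketch for the sample string used above.

```ruby
# Hypothetical helper: the reverse/lookahead technique described above.
def wrap_first_two_via_reverse(str)
  pattern = /\b(\w+)\b(?!(?:.*\b\1\b){2})/i
  str.reverse.gsub(pattern, '>a/<\1>a<').reverse
end

# Hypothetical helper: the same result with a counting hash and a block.
def wrap_first_two_via_block(str)
  seen = Hash.new(0)
  str.gsub(/\w+/) do |word|
    (seen[word.downcase] += 1) <= 2 ? "<a>#{word}</a>" : word
  end
end

str = "Dog dog cat dog cat Dog cat cat"
puts wrap_first_two_via_reverse(str)
puts wrap_first_two_via_block(str)
```

For this input both calls should print the same line; the block version is probably easier to extend, for example to a different limit per word.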
I need to check username format using a regular expression.
My username criteria are:
Must contain 1 or more letters, anywhere.
May contain any amount of numbers, anywhere.
Can contain up to 2 - or _, anywhere.
^[0-9\-_]*[a-z|A-Z]+[0-9\-_]*$ is what I was using but this will reject usernames such as 123hi123hi, or hi123hi. I need something less string location dependent.
I'm using Ruby on Rails to match strings against this.
A very inefficient Ruby function version for Rails is:
validate :check_username
def check_username
if self.username.count("-") > 2
errors.add(:username, "cannot contain more than 2 dashes")
elsif self.username.count("_") > 2
errors.add(:username, "cannot contain more than 2 underscores")
elsif self.username.count("a-zA-Z") < 1
errors.add(:username, "must contain a letter")
elsif (self.username =~ /^[0-9a-zA-Z\-_]+$/) !=0
errors.add(:username, "cannot contain special characters")
end
end
Here are two approaches you could use.
Construct a single regex
Because regular expressions are concerned with the ordering of characters in a string, one would have to construct a regular expression for each of the following combinations and then "or" those regexes into a single regex.
one letter, zero hyphens, zero underscores
one letter, zero hyphens, one underscore
one letter, zero hyphens, two underscores
one letter, one hyphen, zero underscores
one letter, one hyphen, one underscore
one letter, one hyphen, two underscores
one letter, two hyphens, zero underscores
one letter, two hyphens, one underscore
one letter, two hyphens, two underscores
Digits and additional letters could appear anywhere in the username.
Let's call those regular expressions t0, t1,..., t8. The desired single, overall regular expression would be:
/#{t0}|#{t1}|...|#{t8}/
Let's consider the construction of t4 (one letter, one hyphen, one underscore).
Six orders are possible for this combination.
a letter, a hyphen, an underscore
a letter, an underscore, a hyphen
a hyphen, a letter, an underscore
a hyphen, an underscore, a letter
an underscore, a letter, a hyphen
an underscore, a hyphen, a letter
We would need to construct a regular expression for each of these six orders (r0, r1,..., r5) and then "or" them to obtain t4:
t4 = /#{r0}|#{r1}|#{r2}|#{r3}|#{r4}|#{r5}/
Now let's consider the construction of a regex r0 that would implement the first of these orderings (a letter, a hyphen, an underscore):
r0 = /\A[a-z0-9]*[a-z][a-z0-9]*-[a-z0-9]*_[a-z0-9]*\z/i
"3ab4-3cd_e5".match?(r0) #=> true
"3ab4-3cde5".match?(r0) #=> false (no underscore)
"34-3cd_e5".match?(r0) #=> false (no letter before the hyphen)
"3ab4_3cd-e5".match?(r0) #=> false (underscore precedes hyphen)
Construction of each of the other five ri's would be similar.
We would then need to compute ti for each of the eight combinations other than the fifth one. t0 (one letter, zero hyphens, zero underscores) is easy:
t0 = /\A[a-z0-9]*[a-z][a-z0-9]*\z/i
By contrast, t8 (one letter, two hyphens, two underscores) would be a much longer regex than t4 (considered above), as a regular expression would have to be hand-crafted for each of 5!/(2!*2!) #=> 30 orderings (r0, r1,..., r29).
It should now be obvious that the use of a single regular expression is simply not the right tool for validating usernames.
Do not construct a single regex
def username_valid?(username)
cnt = username.each_char.with_object(Hash.new(0)) do |c,cnt|
case c
when /\d/
when /[[:alpha:]]/
cnt[:letter] += 1
when '-'
cnt[:hyphen] += 1
when '_'
cnt[:underscore] += 1
else
return false
end
end
cnt.fetch(:letter, 0) > 0 && cnt.fetch(:hyphen, 0) <= 2 &&
cnt.fetch(:underscore, 0) <= 2
end
username_valid? "Bob" #=> true
username_valid? "Bob1_23_-" #=> true
username_valid? "z" #=> true
username_valid? "123--_" #=> false (no letters)
username_valid? "Melba1-23--_" #=> false (3 hyphens)
username_valid? "Bob1_23_-$" #=> false ($ not permitted)
A hash created by Hash::new with an argument (the default value) of zero is often called a counting hash. If h is such a hash and it has no key k, h[k] returns the default value. The expression h[k] += 1 is therefore evaluated as follows:
h[k] += 1
#=> h[k] = h[k] + 1
#=> h[k] = 0 + 1
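Here is a tiny standalone illustration of a counting hash. The method name letter_counts is made up for this sketch.

```ruby
# Hypothetical helper: count how often each character occurs.
def letter_counts(text)
  text.each_char.with_object(Hash.new(0)) { |char, counts| counts[char] += 1 }
end

counts = letter_counts("hello")
p counts["l"]        #=> 2
p counts["z"]        #=> 0 (the default value; no key is stored)
p counts.key?("z")   #=> false
```

Reading a missing key returns the default without adding the key, which is why the explicit defaults passed to fetch in the method above are harmless but not strictly needed when reading with [].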
The method could instead be written to return false as soon as it determines that the username is invalid.
def username_valid?(username)
cnt = username.each_char.with_object(Hash.new(0)) do |c,cnt|
case c
when /\d/
when /[[:alpha:]]/
cnt[:letter] += 1
when '-'
return false if cnt[:hyphen] == 2
cnt[:hyphen] += 1
when '_'
return false if cnt[:underscore] == 2
cnt[:underscore] += 1
else
return false
end
end
cnt.fetch(:letter, 0) > 0
end
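A shorter variant can be sketched with String#count, which the question's own code already uses. The method name username_ok? is hypothetical, and unlike [[:alpha:]] above this sketch only treats ASCII letters as letters.

```ruby
# Hypothetical helper: same rules as above, checked with String#count.
# Assumes only ASCII letters, digits, '-' and '_' are permitted.
def username_ok?(username)
  username.match?(/\A[A-Za-z0-9_-]+\z/) &&  # only permitted characters
    username.count('A-Za-z') >= 1 &&        # at least one letter
    username.count('-') <= 2 &&             # at most two hyphens
    username.count('_') <= 2                # at most two underscores
end

p username_ok?("Bob1_23_-")     #=> true
p username_ok?("123--_")        #=> false (no letters)
p username_ok?("Melba1-23--_")  #=> false (3 hyphens)
```

It scans the string several times, so it trades a little speed for brevity.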
This is a bad use for a regular expression because your data isn't structured enough. Instead, a small series of simple tests will tell you what you need to know:
def valid?(str)
str[/[a-z]/i] && str.tr('^-_', '').size <= 2
end
%w(123hi123hi hi123hi).each do |username|
username # => "123hi123hi", "hi123hi"
valid?(username) # => true, true
end
There is a loss of speed due to the use of the regular expression
/[a-z]/i
so instead
def valid?(str)
str.downcase.tr('^a-z', '').size > 0 && str.tr('^-_', '').size <= 2
end
could be used. The use of the regular expression is about 45% slower based on testing.
Breaking it down:
str[/[a-z]/i] tests for at least one letter. Since there can be more than one, this will suffice.
str.downcase.tr('^a-z', '').size converts the characters to lowercase, then strips all non-letter characters, resulting in only letters remaining, then counts how many there are:
'123hi123hi'.downcase # => "123hi123hi"
.tr('^a-z', '') # => "hihi"
.size # => 4
'hi123hi'.downcase # => "hi123hi"
.tr('^a-z', '') # => "hihi"
.size # => 4
'hi-123_hi'.downcase # => "hi-123_hi"
.tr('^a-z', '') # => "hihi"
.size # => 4
The rule
May contain any amount of numbers, anywhere
isn't worth testing so I ignored it.
This is an improved version of your regex:
^[\w-]*[A-Za-z]+[\w-]*$
But this cannot count how many - or _ characters there are, so you will need another regex to check that, or count them in code.
This regex allows at most two [-_] characters, regardless of their position:
^[A-Za-z\d]*[-_]{0,1}[A-Za-z\d]*[-_]{0,1}[A-Za-z\d]*$
If you're allowing only letters, numbers, dashes and underscores,
and everything else is considered a special character,
I think it's only that the pattern you have needs negation.
Instead of (self.username =~ /^[0-9a-zA-Z\-_]+$/) !=0
try (self.username =~ /^[^0-9a-zA-Z\-_]+$/) !=0
or (self.username =~ /^[\W-]+$/) > 0.
Also, why not do a count for special characters, like in the conditions above this one?
I'm trying to write a regular expression in Ruby where I want to see if the string contains a certain word (e.g. "string"), followed by a url and link name in parenthesis.
Right now I'm doing:
string.include?("string") && string.scan(/\(([^\)]+)\)/).present?
My input in both conditionals is a string. In the first one, I'm checking if it contains the word "link" and then I will have the link and link_name in parenthesis, like this:
"Please go to link( url link_name)"
After validating that, I extract the HTML link.
Is there a way I can combine them using regular expressions?
The most important improvement you can make is to also test that the word and the parentheses have the correct relationship. If I understand correctly, "link(url link_name)" should be a match but "(url link_name)link" or "link stuff (url link_name)" should not. So match "link", the parentheses, and their contents, and capture the contents, all at once:
"stuff link(url link_name) more stuff".match(/link\((\S+?) (\S+?)\)/)&.captures
=> ["url", "link_name"]
(&. requires Ruby 2.3 or later; use Rails' .try :captures in older versions.)
Side note: string.scan(regex).present? is more concisely written as string =~ regex.
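The same match can be wrapped in a small method with named captures, which might make the calling code easier to read. The method name extract_link and the group names url and name are made up for this sketch.

```ruby
# Hypothetical helper: returns a hash for "link(url link_name)", or nil.
def extract_link(text)
  m = text.match(/link\((?<url>\S+?) (?<name>\S+?)\)/)
  m && { url: m[:url], name: m[:name] }
end

p extract_link("stuff link(url link_name) more stuff")
p extract_link("link stuff (url link_name)")   #=> nil
```

Returning nil when nothing matches lets the caller use a plain if, much like the =~ test mentioned above.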
Checking If a Word Is Contained
If you want to find matches that contain a specific word somewhere in the string, you can accomplish this through a lookahead:
# This will match any string that contains your string "{your-string-here}"
(?=.*({your-string-here}).*).*
You could consider building a string version of your expression and passing the word you are looking for using a variable:
wordToFind = "link"
if stringToTest =~ /(?=.*(#{wordToFind}).*).*/
# stringToTest contains "link"
else
# stringToTest does not contain "link"
end
Checking for a Word AND Parentheses
If you also wanted to ensure that somewhere in your string you had a set of parentheses with some content in them and your previous lookahead for a word, you could use:
# This will match any strings that contain your word and contain a set of parentheses
(?=.*({your-string-here}).*).*\([^\)]+\).*
which might be used as:
wordToFind = "link"
if stringToTest =~ /(?=.*(#{wordToFind}).*).*\([^\)]+\).*/
# stringToTest contains "link" and some non-empty parentheses
else
# stringToTest does not contain "link" or non-empty parentheses
end
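One thing to watch when interpolating a variable into a regex is that characters such as . or ( in the word are treated as regex syntax. A hedged sketch using Regexp.escape follows; the method name contains_word_and_parens? is made up for illustration.

```ruby
# Hypothetical helper: true when `text` contains `word` literally
# and also a non-empty pair of parentheses.
def contains_word_and_parens?(text, word)
  text.match?(/(?=.*#{Regexp.escape(word)}).*\([^)]+\)/)
end

p contains_word_and_parens?("Please go to link( url link_name)", "link") #=> true
p contains_word_and_parens?("no parens, just a link", "link")            #=> false
p contains_word_and_parens?("aXb (x)", "a.b")                            #=> false
```

Without Regexp.escape the last call would return true, because the unescaped dot in "a.b" would match the X.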
def has_both?(str, word)
str.scan(/\b#{word}\b|(?<=\()[^\(\)]+(?=\))/).size == 2
end
has_both?("Wait for me, Wild Bill.", "Bill")
#=> false
has_both?("Wait (for me), Wild William.", "Bill")
#=> false
has_both?("Wait (for me), Wild Billy.", "Bill")
#=> false
has_both?("Wait (for me), Wild bill.", "Bill")
#=> false
has_both?("Wait (for me, Wild Bill.", "Bill")
#=> false
has_both?("Wait (for me), Wild Bill.", "Bill")
#=> true
has_both?("Wait ((for me), Wild Bill.", "Bill")
#=> true
has_both?("Wait ((for me)), Wild Bill.", "Bill")
#=> true
These are the calculations for
word = "Bill"
str = "Wait (for me), Wild Bill."
r = /
\b#{word}\b # match the value of the variable 'word' with word breaks for and aft
| # or
(?<=\() # match a left paren in a positive lookbehind
[^\(\)]+ # match one or more characters other than parens
(?=\)) # match a right paren in a positive lookahead
/x # free-spacing regex definition mode
#=> /
\bBill\b # match the value of the variable 'word' with word breaks for and aft
| # or
(?<=\() # match a left paren in a positive lookbehind
[^\(\)]+ # match one or more characters other than parens
(?=\)) # match a right paren in a positive lookahead
/x
arr = str.scan(r)
#=> ["for me", "Bill"]
arr.size == 2
#=> true
I would go with something like this regex:
/link\s*\(([^\)\s]+)\s*([^\)]+)?\)/i
This will find any match starting with the word link, followed by any number of spaces, then a url followed by a link name, both in parentheses. In this regex, the link name is optional, but the url is not. The matching is case-insensitive, so it will match link and LINK exactly the same.
You can use the Regexp#match method to compare the regex to a string, and check the result for matches and captures, like so:
m = /link\s*\(([^\)\s]+)\s*([^\)]+)?\)/i.match("link (stackoverflow.com StackOverflow)")
if m # the match array is not nil
puts "Matched: #{m[0]}"
puts " -- url: #{m[1]}"
puts " -- link-name: #{m[2] || 'none'}"
else # the match array is nil, so no match was found
puts "No match found"
end
If you'd like to use different strings to identify the match, you can use a non-capturing group, where you change link to something like:
(?:link|site|website|url)
In this case, the (?: syntax says not to capture this part of the match. If you want to capture which term matched, simply change that from (?: to (, and adjust the capture indexes by 1 to account for the new capture value.
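For instance, capturing the term shifts the url and link name to groups 2 and 3. The method name parse_link and the hash keys are made up for this sketch.

```ruby
# Hypothetical helper: also captures which term introduced the link.
def parse_link(text)
  m = /(link|site|website|url)\s*\(([^\)\s]+)\s*([^\)]+)?\)/i.match(text)
  # Group 1 is now the term, so url and link name move to groups 2 and 3.
  m && { term: m[1], url: m[2], name: m[3] }
end

p parse_link("Site (example.com Example)")
p parse_link("url(ftp://website.org)")
p parse_link("lunk (example.com)")   #=> nil
```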
Here's a short Ruby test program:
data = [
[ true, "link (http://google.com Google)", "http://google.com", "Google" ],
[ true, "LiNk(ftp://website.org)", "ftp://website.org", nil ],
[ true, "link (https://facebook.com/realstanlee/ Stan Lee) linkety link", "https://facebook.com/realstanlee/", "Stan Lee" ],
[ true, "x link (https://mail.yahoo.com Yahoo! Mail)", "https://mail.yahoo.com", "Yahoo! Mail" ],
[ false, "link lunk (http://www.com)", nil, nil ]
]
data.each do |test_case|
link = /link\s*\(([^\)\s]+)\s*([^\)]+)?\)/i.match(test_case[1])
url = link ? link[1] : nil
link_name = link ? link[2] : nil
success = test_case[0] == !link.nil? && test_case[2] == url && test_case[3] == link_name
puts "#{success ? 'Pass' : 'Fail'}: '#{test_case[1]}' #{link ? 'found' : 'not found'}"
if success && link
puts " -- url: '#{url}' link_name: '#{link_name || '(no link name)'}'"
end
end
This produces the following output:
Pass: 'link (http://google.com Google)' found
-- url: 'http://google.com' link_name: 'Google'
Pass: 'LiNk(ftp://website.org)' found
-- url: 'ftp://website.org' link_name: '(no link name)'
Pass: 'link (https://facebook.com/realstanlee/ Stan Lee) linkety link' found
-- url: 'https://facebook.com/realstanlee/' link_name: 'Stan Lee'
Pass: 'x link (https://mail.yahoo.com Yahoo! Mail)' found
-- url: 'https://mail.yahoo.com' link_name: 'Yahoo! Mail'
Pass: 'link lunk (http://www.com)' not found
If you want to allow any characters, not just whitespace, between the word 'link' and the first paren, simply change the \s* to [^\(]* and you should be good to go.
I have a string and I need to check whether the last character of that string is *, and if it is, I need to remove it.
if stringvariable.include? "*"
newstring = stringvariable.gsub(/[*]/, '')
end
The above does not search if the '*' symbol is the LAST character of the string.
How do I check if the last character is '*'?
Thanks for any suggestion
Use the $ anchor to only match the end of line:
"sample*".gsub(/\*$/, '')
If there's the possibility of there being more than one * on the end of the string (and you want to replace them all) use:
"sample**".gsub(/\*+$/, '')
You can also use chomp (see it on API Dock), which removes the trailing record separator character(s) by default, but can also take an argument, and then it will remove the end of the string only if it matches the specified character(s).
"hello".chomp #=> "hello"
"hello\n".chomp #=> "hello"
"hello\r\n".chomp #=> "hello"
"hello\n\r".chomp #=> "hello\n"
"hello\r".chomp #=> "hello"
"hello \n there".chomp #=> "hello \n there"
"hello".chomp("llo") #=> "he"
"hello*".chomp("*") #=> "hello"
String has an end_with? method
stringvariable.chop! if stringvariable.end_with? '*'
You can do the following which will remove the offending character, if present. Otherwise it will do nothing:
your_string.sub(/\*$/, '')
If you want to remove more than one occurrence of the character, you can do:
your_string.sub(/\*+$/, '')
Of course, if you want to modify the string in-place, use sub! instead of sub
Cheers,
Aaron
You can either use a regex or just index into the string:
if string_variable[-1] == '*'
new_string = string_variable.chop # drop only the final character
end
That only works in Ruby 1.9 and later (in 1.8, string_variable[-1] returns a character code rather than a string).
Otherwise you'll need to use a regex:
if string_variable =~ /\*$/
new_string = string_variable.sub(/\*$/, '') # note the escaped *
end
But you don't even need the if:
new_string = string_variable.gsub(/\*$/, '')
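One detail worth knowing: in Ruby, $ matches at the end of every line, while \z matches only at the end of the string. The sketch below shows the difference; the method name strip_trailing_stars is made up for illustration.

```ruby
# Hypothetical helper: remove one or more asterisks at the very end of the string.
def strip_trailing_stars(text)
  text.sub(/\*+\z/, '')
end

p strip_trailing_stars("sample*")    #=> "sample"
p strip_trailing_stars("sample**")   #=> "sample"
p strip_trailing_stars("sam*ple")    #=> "sam*ple"

# With a multi-line string, $ also matches before the newline:
p "one*\ntwo".sub(/\*$/, '')         #=> "one\ntwo"
p "one*\ntwo".sub(/\*\z/, '')        #=> "one*\ntwo"
```

For single-line strings the two anchors give the same result, so this only matters if the input may contain newlines.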
I'm looking to check if a string is all capitals in Rails.
How would I go about doing that?
I'm writing my own custom pluralize helper method and I would sometimes be passing words like "WORD" and sometimes "Word". I want to test if my word is all caps so I can return "WORDS", with a capital "S" at the end if the word is plural (vs. "WORDs").
Thanks!
Do this:
str == str.upcase
E.g:
str = "DOG"
str == str.upcase # true
str = "cat"
str == str.upcase # false
Hence the code for your scenario will be:
# In the code below `upcase` is required after `str.pluralize` to transform
# DOGs to DOGS
str = str.pluralize.upcase if str == str.upcase
Thanks to regular expressions, it is very simple. Just use the [[:upper:]] character class, which matches all upper case letters, including but not limited to those present in ASCII.
Depending on what exactly you need:
# allows upper case characters only
/\A[[:upper:]]*\Z/ =~ "YOURSTRING"
# additionally disallows empty strings
/\A[[:upper:]]+\Z/ =~ "YOURSTRING"
# also allows white spaces (multi-word strings)
/\A[[:upper:]\s]*\Z/ =~ "YOUR STRING"
# allows everything but lower case letters
/\A[^[:lower:]]*\Z/ =~ "YOUR 123_STRING!"
Ruby doc: http://www.ruby-doc.org/core-2.1.4/Regexp.html
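A small sketch of how this could be wrapped up, assuming UTF-8 strings. The method name all_caps? is made up for illustration, and it uses \z rather than \Z so that a trailing newline is not accepted.

```ruby
# Hypothetical helper: true only for non-empty strings made of upper case letters.
def all_caps?(word)
  word.match?(/\A[[:upper:]]+\z/)
end

p all_caps?("WORD")           #=> true
p all_caps?("Word")           #=> false
p all_caps?("")               #=> false
p all_caps?("\u{C9}COLE")     #=> true  (E with acute accent, then COLE)
p "\u{C9}COLE".match?(/\A[A-Z]+\z/)  #=> false, an ASCII-only class misses the accented letter
p "123" == "123".upcase       #=> true, so the upcase comparison also accepts strings with no letters
```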
Or this:
str =~ /^[A-Z]+$/
e.g.:
"DOG" =~ /^[A-Z]+$/ # 0
"cat" =~ /^[A-Z]+$/ # nil