I want to turn the string " - " in Ruby into something that can be used as a regexp. I need something like this:
my_regexp => "\s?-\s?"
However, I have a problem with special characters: the "\s" sequence isn't shown correctly. I tried a few ways, without success.
INPUT => OUTPUT
"\s?" => " ?"
"\\s?" => "\\s?"
Do you have any idea how to solve this?
\\ is just an escaped \.
If you print the string with puts, you will see its actual contents.
>> '\s' # == "\\s"
=> "\\s"
>> puts '\s'
\s
=> nil
BTW, "\s" (not '\s') is another representation of whitespace " ":
>> "\s" == " "
=> true
Most likely, what you're seeing is the result of how IRB displays values. Your second example is correct: the actual result contains only a single backslash, which you can confirm by creating a new Regexp object from it:
>> "\\s?"
"\\s?"
>> puts "\\s?"
\s?
>> Regexp.new "\\s?"
/\s?/
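Putting the two answers together, here is a minimal sketch of the conversion the question asks for. The helper name spaces_to_optional_whitespace is hypothetical, and the sketch assumes the input contains no other regexp metacharacters (it does not escape them):

```ruby
# Hypothetical helper: replace each literal space with the regexp fragment \s?
# Assumption: the rest of the input is already safe to use inside a regexp.
def spaces_to_optional_whitespace(source)
  # The block form of gsub inserts its result verbatim, so the backslash
  # in '\s?' is not reinterpreted as a replacement escape.
  pattern = source.gsub(" ") { '\s?' }
  Regexp.new(pattern)
end

regexp = spaces_to_optional_whitespace(" - ")
puts regexp.source   # prints \s?-\s?   (one backslash per \s)
p regexp             # prints /\s?-\s?/
```

The single-quoted '\s?' keeps the backslash as a real character; "\s?" in double quotes would have produced a space followed by a question mark.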
I have a string that I need to save escaped and then interact with programmatically without any backslashes:
string = 'first=#{first_name}&last=#{last_name}'
p string.to_s
=> "first=\#{first_name}&last=\#{last_name}"
puts string.to_s
=> first=#{first_name}&last=#{last_name}
How do I get first=#{first_name}&last=#{last_name} to assign to a variable that I can scan, that does not have the "\" character?
These two are equivalent:
# double quotes
"first=\#{first_name}&last=\#{last_name}"
# single quotes
'first=#{first_name}&last=#{last_name}'
In neither case is the backslash actually part of the string. If you call string.include? '\\' it will return false.
However, if you were to say '\#{}' the backslash would be part of the string. That's because in single quotes, #{} does not interpolate but is interpreted as literal characters.
Some examples:
foo = 1
'#{foo}' # => "\#{foo}"
"#{foo}" # => "1"
'#{foo}' == "\#{foo}" # => true
"\#{foo}".include? '\\' # => false
'\#{foo}'.include? '\\' # => true
Note that neither "\" nor '\' is a valid string literal in Ruby, because the backslash escapes the closing quote. A single backslash is written "\\" or '\\'.
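Since the backslash only appears in the inspect output, the string can be scanned directly. A small sketch, assuming the placeholders always look like #{name} with word characters inside (placeholder_names is a hypothetical helper, not part of the question):

```ruby
# Hypothetical helper: list the names inside literal #{...} placeholders.
def placeholder_names(template)
  # The backslash before { keeps Ruby from treating #{ as interpolation
  # inside the regexp literal.
  template.scan(/#\{(\w+)\}/).flatten
end

template = 'first=#{first_name}&last=#{last_name}'
p placeholder_names(template)   # => ["first_name", "last_name"]
p template.include?("\\")       # => false, no backslash is stored
```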
I'm trying to write a regular expression in Ruby where I want to see if the string contains a certain word (e.g. "string"), followed by a url and link name in parenthesis.
Right now I'm doing:
string.include?("string") && string.scan(/\(([^\)]+)\)/).present?
My input in both conditionals is a string. In the first one, I'm checking if it contains the word "link" and then I will have the link and link_name in parenthesis, like this:
"Please go to link( url link_name)"
After validating that, I extract the HTML link.
Is there a way I can combine them using regular expressions?
The most important improvement you can make is to also test that the word and the parentheses have the correct relationship. If I understand correctly, "link(url link_name)" should be a match but "(url link_name)link" or "link stuff (url link_name)" should not. So match "link", the parentheses, and their contents, and capture the contents, all at once:
"stuff link(url link_name) more stuff".match(/link\((\S+?) (\S+?)\)/)&.captures
=> ["url", "link_name"]
(&. requires Ruby 2.3 or later; use Rails' .try :captures in older versions.)
Side note: string.scan(regex).present? is more concisely written as string =~ regex.
Checking If a Word Is Contained
If you want to find matches that contain a specific word somewhere in the string, you can accomplish this through a lookahead :
# This will match any string that contains your string "{your-string-here}"
(?=.*({your-string-here}).*).*
You could consider building a string version of your expression and passing the word you are looking for using a variable :
wordToFind = "link"
if stringToTest =~ /(?=.*(#{wordToFind}).*).*/
# stringToTest contains "link"
else
# stringToTest does not contain "link"
end
Checking for a Word AND Parentheses
If you also wanted to ensure that somewhere in your string you had a set of parentheses with some content in them and your previous lookahead for a word, you could use :
# This will match any strings that contain your word and contain a set of parentheses
(?=.*({your-string-here}).*).*\([^\)]+\).*
which might be used as :
wordToFind = "link"
if stringToTest =~ /(?=.*(#{wordToFind}).*).*\([^\)]+\).*/
# stringToTest contains "link" and some non-empty parentheses
else
# stringToTest does not contain "link" or non-empty parentheses
end
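One caveat with interpolating a variable into a regexp: any metacharacters in the word are interpreted as regexp syntax. A hedged sketch of the same lookahead check with the word escaped first (contains_word_and_parens? is a hypothetical name):

```ruby
# Hypothetical helper: true when text contains the literal word and a
# non-empty pair of parentheses somewhere after the start of the string.
def contains_word_and_parens?(text, word)
  # Regexp.escape makes characters such as . or + in the word match literally.
  pattern = /(?=.*#{Regexp.escape(word)}).*\([^\)]+\)/
  !!(text =~ pattern)
end

p contains_word_and_parens?("Please go to link( url link_name)", "link")  # => true
p contains_word_and_parens?("Please go to link", "link")                  # => false
```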
def has_both?(str, word)
str.scan(/\b#{word}\b|(?<=\()[^\(\)]+(?=\))/).size == 2
end
has_both?("Wait for me, Wild Bill.", "Bill")
#=> false
has_both?("Wait (for me), Wild William.", "Bill")
#=> false
has_both?("Wait (for me), Wild Billy.", "Bill")
#=> false
has_both?("Wait (for me), Wild bill.", "Bill")
#=> false
has_both?("Wait (for me, Wild Bill.", "Bill")
#=> false
has_both?("Wait (for me), Wild Bill.", "Bill")
#=> true
has_both?("Wait ((for me), Wild Bill.", "Bill")
#=> true
has_both?("Wait ((for me)), Wild Bill.", "Bill")
#=> true
These are the calculations for
word = "Bill"
str = "Wait (for me), Wild Bill."
r = /
\b#{word}\b # match the value of the variable 'word' with word breaks fore and aft
| # or
(?<=\() # match a left paren in a positive lookbehind
[^\(\)]+ # match one or more characters other than parens
(?=\)) # match a right paren in a positive lookahead
/x # free-spacing regex definition mode
#=> /
\bBill\b # match the value of the variable 'word' with word breaks fore and aft
| # or
(?<=\() # match a left paren in a positive lookbehind
[^\(\)]+ # match one or more characters other than parens
(?=\)) # match a right paren in a positive lookahead
/x
arr = str.scan(r)
#=> ["for me", "Bill"]
arr.size == 2
#=> true
I would go with something like this regex:
/link\s*\(([^\)\s]+)\s*([^\)]+)?\)/i
This will find any match starting with the word link, followed by any number of spaces, then a url followed by a link name, both in parentheses. In this regex, the link name is optional, but the url is not. The matching is case-insensitive, so it will match link and LINK exactly the same.
You can use the Regexp#match method to compare the regex to a string, and check the result for matches and captures, like so:
m = /link\s*\(([^\)\s]+)\s*([^\)]+)?\)/i.match("link (stackoverflow.com StackOverflow)")
if m # a MatchData object was returned, so there is a match
puts "Matched: #{m[0]}"
puts " -- url: #{m[1]}"
puts " -- link-name: #{m[2] || 'none'}"
else # match returned nil, so no match was found
puts "No match found"
end
If you'd like to use different strings to identify the match, you can use a non-capturing group, where you change link to something like:
(?:link|site|website|url)
In this case, the (?: syntax says not to capture this part of the match. If you want to capture which term matched, simply change that from (?: to (, and adjust the capture indexes by 1 to account for the new capture value.
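As an alternative to renumbering the captures, named groups keep the lookups stable when a new group is added. This is only a sketch of that idea; parse_link and the group names keyword, url and name are assumptions, not part of the answer above:

```ruby
# Hypothetical helper: return the matched keyword, url and optional link name,
# or nil when the text contains no keyword followed by parentheses.
def parse_link(text)
  pattern = /(?<keyword>link|site|website|url)\s*\((?<url>[^\)\s]+)\s*(?<name>[^\)]+)?\)/i
  m = pattern.match(text)
  return nil unless m
  # Named captures are read by name, so adding a group does not shift indexes.
  { keyword: m[:keyword], url: m[:url], name: m[:name] }
end

p parse_link("site (http://example.com Example)")
p parse_link("URL(ftp://website.org)")
p parse_link("nothing to see here")
```

Note that once a pattern uses named groups, Ruby no longer numbers plain (...) groups in it, so the two styles should not be mixed.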
Here's a short Ruby test program:
data = [
[ true, "link (http://google.com Google)", "http://google.com", "Google" ],
[ true, "LiNk(ftp://website.org)", "ftp://website.org", nil ],
[ true, "link (https://facebook.com/realstanlee/ Stan Lee) linkety link", "https://facebook.com/realstanlee/", "Stan Lee" ],
[ true, "x link (https://mail.yahoo.com Yahoo! Mail)", "https://mail.yahoo.com", "Yahoo! Mail" ],
[ false, "link lunk (http://www.com)", nil, nil ]
]
data.each do |test_case|
link = /link\s*\(([^\)\s]+)\s*([^\)]+)?\)/i.match(test_case[1])
url = link ? link[1] : nil
link_name = link ? link[2] : nil
success = test_case[0] == !link.nil? && test_case[2] == url && test_case[3] == link_name
puts "#{success ? 'Pass' : 'Fail'}: '#{test_case[1]}' #{link ? 'found' : 'not found'}"
if success && link
puts " -- url: '#{url}' link_name: '#{link_name || '(no link name)'}'"
end
end
This produces the following output:
Pass: 'link (http://google.com Google)' found
-- url: 'http://google.com' link_name: 'Google'
Pass: 'LiNk(ftp://website.org)' found
-- url: 'ftp://website.org' link_name: '(no link name)'
Pass: 'link (https://facebook.com/realstanlee/ Stan Lee) linkety link' found
-- url: 'https://facebook.com/realstanlee/' link_name: 'Stan Lee'
Pass: 'x link (https://mail.yahoo.com Yahoo! Mail)' found
-- url: 'https://mail.yahoo.com' link_name: 'Yahoo! Mail'
Pass: 'link lunk (http://www.com)' not found
If you want to allow any characters, not just spaces, between the word 'link' and the first paren, simply change the \s* to [^\(]* and you should be good to go.
In my Rails application I have a field address which is a varchar(255) in my SQLite database.
Yet whenever I save an address consisting of more than one line through a textarea form field, one mysterious whitespace character gets added to the right.
This becomes visible only when the address is right aligned (like e.g. on a letterhead).
Can anybody tell me why this is happening and how it can be prevented?
I am not doing anything special with those addresses in my model.
I already added this attribute writer to my model but it won't remove the whitespace unfortunately:
def address=(a)
write_attribute(:address, a.strip)
end
This is a screenshot:
As you can see only the last line is right aligned. All others contain one character of whitespace at the end.
Edit:
This would be the HTML output from my (Safari) console:
<p>
"John Doe "<br>
"123 Main Street "<br>
"Eggham "<br>
"United Kingdom"<br>
</p>
I don't even know why it's putting the quotes around each line... Maybe that's part of the solution?
I believe textarea is returning CR/LF for line separators and you're seeing one of these characters displayed between each line. See PHP displays \r\n characters when echoed in Textarea for some discussion of this. There are probably better questions out there as well.
You can strip out the whitespace at the start and end of each line. Here are two simple techniques to do that:
# Using simple ruby
def address=(a)
a = a.lines.map(&:strip).join("\n")
write_attribute(:address, a)
end
# Using a regular expression
def address=(a)
a = a.gsub(/^[ \t\r]+|[ \t\r]+$/, "") # include \r so the CR of a CR/LF pair is stripped too
write_attribute(:address, a)
end
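The same clean-up can be tried outside the model first. A minimal sketch in plain Ruby, assuming the textarea submits CR/LF line endings as suggested above (normalize_address is a hypothetical helper name):

```ruby
# Hypothetical helper: convert CR/LF (or lone CR) line endings to LF and
# trim whitespace at both ends of every line.
def normalize_address(raw)
  raw.to_s.gsub(/\r\n?/, "\n").lines.map(&:strip).join("\n")
end

raw = "John Doe\r\n123 Main Street \r\nEggham\r\nUnited Kingdom"
p normalize_address(raw)
# => "John Doe\n123 Main Street\nEggham\nUnited Kingdom"
```

In the model, the attribute writer could pass the result of such a helper to write_attribute instead of a.strip.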
I solved a very similar kind of problem when I ran into something like this,
(I used squish)
think@think:~/CrawlFish$ irb
1.9.3-p385 :001 > "Im calling squish on a string, in irb".squish
NoMethodError: undefined method `squish' for "Im calling squish on a string, in irb":String
from (irb):1
from /home/think/.rvm/rubies/ruby-1.9.3-p385/bin/irb:16:in `<main>'
That shows there is no squish in plain Ruby (irb).
But Rails has squish and squish! (the bang version modifies the string in place).
think@think:~/CrawlFish$ rails console
Loading development environment (Rails 3.2.12)
1.9.3-p385 :001 > str = "Here i am\n \t \n \n, its a new world \t \t \n, its a \n \t new plan\n \r \r,do you like \r \t it?\r"
=> "Here i am\n \t \n \n, its a new world \t \t \n, its a \n \t new plan\n \r \r,do you like \r \t it?\r"
1.9.3-p385 :002 > out = str.squish
=> "Here i am , its a new world , its a new plan ,do you like it?"
1.9.3-p385 :003 > puts out
Here i am , its a new world , its a new plan ,do you like it?
=> nil
1.9.3-p385 :004 >
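Outside Rails, roughly the same effect can be had with gsub and strip. This is an approximation of squish written for illustration, not the ActiveSupport implementation itself, and plain_squish is a hypothetical name:

```ruby
# Hypothetical helper: collapse every run of whitespace (spaces, tabs, CR, LF)
# into a single space and trim both ends. Returns a new string.
def plain_squish(text)
  text.gsub(/[[:space:]]+/, " ").strip
end

p plain_squish("Here i am\n \t \n \n, its a new  world \t \r")
# => "Here i am , its a new world"
```

Keep in mind that this also flattens the line breaks, so a multi-line address would end up on one line.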
Take a look at the strip! method
>> @title = "abc"
=> "abc"
>> @title.strip!
=> nil
>> @title
=> "abc"
>> @title = " abc "
=> " abc "
>> @title.strip!
=> "abc"
>> @title
=> "abc"
What does the screenshot look like when you do:
def address=(a)
write_attribute(:address, a.strip.unpack("C*").join('-') )
end
Update based on comment answers. Another way to get rid of the \r's at the end of each line:
def address=(a)
a = a.strip.split(/\r\n/).join("\n")
write_attribute(:address, a)
end
Say I have a string like this
"some3random5string8"
I want to insert spaces after each integer so it looks like this
"some3 random5 string8"
I specifically want to do this using gsub but I can't figure out how to access the characters that match my regexp.
For example:
temp = "some3random5string8"
temp.gsub(/\d/, ' ') # instead of replacing with a ' ' I want to replace with
# matching number and space
I was hoping there was a way to reference the regexp match. Something like $1 so I could do something like temp.gsub(/\d/, "#{$1 }") (note, this does not work)
Is this possible?
From the gsub docs:
If replacement is a String it will be substituted for the matched
text. It may contain back-references to the pattern’s capture groups
of the form \d, where d is a group number, or \k<n>, where n is a
group name. If it is a double-quoted string, both back-references must
be preceded by an additional backslash.
This means the following 3 versions will work
>> "some3random5string8".gsub(/(\d)/, '\1 ')
=> "some3 random5 string8 "
>> "some3random5string8".gsub(/(\d)/, "\\1 ")
=> "some3 random5 string8 "
>> "some3random5string8".gsub(/(?<digit>\d)/, '\k<digit> ')
=> "some3 random5 string8 "
Edit: also if you don't want to add an extra space at the end, use a negative lookahead for the end of line, e.g.:
>> "some3random5string8".gsub(/(\d(?!$))/, '\1 ')
=> "some3 random5 string8"
A positive lookahead checking for a "word character" would also work of course:
>> "some3random5string8".gsub(/(\d(?=\w))/, '\1 ')
=> "some3 random5 string8"
Last but not least, the simplest version without a space at the end:
>> "some3random5string8".gsub(/(\d)(\w)/, '\1 \2')
=> "some3 random5 string8"
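The $1 idea from the question does work when the replacement is computed in a block, because the block runs after the match has been made. A small sketch (space_after_digits is a hypothetical helper name):

```ruby
# Hypothetical helper: insert a space after every digit that is followed by
# another character, so no trailing space is added at the end of the string.
def space_after_digits(text)
  # Inside the block, $1 refers to the first capture group of the current match.
  text.gsub(/(\d)(?=.)/) { "#{$1} " }
end

p space_after_digits("some3random5string8")   # => "some3 random5 string8"
```

With a plain string argument such as "#{$1} ", the interpolation is evaluated before gsub runs, which is why that form does not work.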
gsub takes a block, which for me is easier to remember than the block-less way of getting the match.
"some3random5string8".gsub(/\d/){|digit| digit << " "}
In Ruby, a back-reference in the replacement string is written \1 (a '$1 ' replacement would be inserted literally):
temp.gsub(/(\d)/, '\1 ')
To be sure you insert a space only between a number and a non-number (i.e. a letter or special character):
temp.gsub(/(\d)(\D)/, '\1 \2')
I am not very familiar with Ruby, but you can capture the digit and then refer to it in the replacement like this (Ruby uses \1 rather than $1):
temp.gsub(/(\d)/, '\1 ')
I need to split a string at a period that comes before an equal sign to assign to a hash. E.g.,
"Project.risksPotentialAfterSum=Pot. aft."
should be split like this:
{"Project" =>{"risksPotentialAfterSum" => "Pot. aft."}}
For now, I use str.split(/[\.=]/,2) which has a problem for the value that comes after the equal sign. Any ideas?
str = "Project.risksPotentialAfterSum=Pot. aft."
m = str.match(/\A(?<obj>.+?)\.(?<prop>[^.]+?)=(?<val>.+)/)
#=> #<MatchData "Project.risksPotentialAfterSum=Pot. aft." obj:"Project" prop:"risksPotentialAfterSum" val:"Pot. aft.">
h = { m[:obj]=>{ m[:prop]=>m[:val] } }
#=> {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}
That regex says, roughly:
Starting at the start of the string,
find just about anything on the same line (name it 'obj') up until you see a period,
that is followed by one or more characters that aren't a period (name it 'prop') up until you see an equals sign,
and name whatever comes after the equals sign 'val'.
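The description above can be written next to the pattern itself using free-spacing mode. A sketch of the same regex with one comment per step (parse_assignment is a hypothetical wrapper added for illustration):

```ruby
# Hypothetical helper: build the nested hash, or return nil if there is no match.
def parse_assignment(str)
  pattern = /
    \A               # start at the beginning of the string
    (?<obj>.+?)      # obj: as few characters as possible ...
    \.               # ... up to a period
    (?<prop>[^.]+?)  # prop: one or more characters that are not a period ...
    =                # ... up to the equals sign
    (?<val>.+)       # val: everything after the equals sign
  /x
  m = pattern.match(str)
  m && { m[:obj] => { m[:prop] => m[:val] } }
end

p parse_assignment("Project.risksPotentialAfterSum=Pot. aft.")
# => {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}
```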
ruby-1.9.2-p136 :028 > str
=> "Project.risksPotentialAfterSum=Pot. aft."
ruby-1.9.2-p136 :029 > split = str.split(/\.|=/,3)
=> ["Project", "risksPotentialAfterSum", "Pot. aft."]
ruby-1.9.2-p136 :030 > Hash[*[split[0],Hash[*split[1,2]]]]
=> {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}
Concepts used here:
Utilizing | in the regex, which means: match what is to the left of | or what is to the right of it.
Using the splat operator
Create hash based on list.
Instead of using string splitting you could consider using regular expression matching and capturing the values that you're interested in.
m = "Project.risksPotentialAfterSum=Pot. aft.".match(/(\w+)\.(\w+)=(.*)/)
h = {m[1] => {m[2] => m[3]}}
#=> {"Project"=>{"risksPotentialAfterSum"=>"Pot. aft."}}