I have a problem that I can't find a solution for. I suspect it needs regular expressions.
Say I have a string entered by a user, for example 'asdasd asdaeew asioij'.
How would I accomplish this:
for every word
execute me
end
Regular expressions seem overkill for this one - just use
s = "aaaaaa bbbb cccc"
s.split.each do |w|
puts w
end
Another possibility, splitting by word boundaries instead of spaces:
kiss = "Keep it simple, stupid!"
kiss.scan(/\b\w+\b/) do |w|
puts w #=> "Keep" ... "it" ... "simple" ... "stupid"
end
# Instead of:
kiss.split.each do |w|
puts w #=> "Keep" ... "it" ... "simple," ... "stupid!"
end
Is it possible to set a conditional statement (IF statement) comparing a variable against a variable that iterates through the values inside an array? I was looking for something like:
array_of_small_words = ["and","or","be","the","of","to","in"]
if word == array_of_small_words.each
# do thing
else
# do another thing
end
Basically, I want to capitalize each word but don't want to do it for "small words". I know I could do the opposite and iterate through the array first and then compare each iteration with the word, but I was hoping there would be a more efficient way.
sentence = ["this","is","a","sample","of","a","title"]
array_of_small_words = ["and","or","be","the","of","to","in"]
sentence.each do |word|
array_of_small_words.each do |small_words|
if word == small_words
# don't capitalize word
else
# capitalize word
end
end
end
I'm not really sure if this is possible or if there is a better way of doing this?
Thank you!
sentence = ["this","is","a","sample","of","a","title"]
array_of_small_words = ["and","or","be","the","of","to","in"]
sentence.map do |word|
array_of_small_words.include?(word) ? word : word.upcase
end
#⇒ ["THIS", "IS", "A", "SAMPLE", "of", "A", "TITLE"]
What you're looking for is if array_of_small_words.include?(word).
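As a rough sketch of that idea applied to the original goal (capitalize every word except the small ones), the lookup can go through a Set so each membership test is a hash lookup rather than an array scan. The names SMALL_WORD_SET and titlecase_words are made up for this illustration.

```ruby
require 'set'

# Hypothetical constant: the small words from the question, stored in a Set
# so that include? is a hash lookup rather than a linear scan.
SMALL_WORD_SET = Set.new(%w[and or be the of to in])

# Hypothetical helper: capitalizes every word that is not a small word.
def titlecase_words(words)
  words.map { |word| SMALL_WORD_SET.include?(word) ? word : word.capitalize }
end

p titlecase_words(%w[this is a sample of a title])
# => ["This", "Is", "A", "Sample", "of", "A", "Title"]
```

For a list of seven words the difference from Array#include? is negligible; the Set only starts to matter when the list of small words grows.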
This should be faster than @mudasobwa's repeated use of include? if packaged in a method and used frequently. It would not be faster, however, if mudsie used a set lookup (a minor change, of which he is well aware), as I mentioned in a comment. If efficiency is important, I'd prefer mudsie's way with the set modification over my answer. In a way I was just playing around below.
I've assumed the small words are and, or, be, the, of, to, in and notwithstanding.
SMALL_WORDS = %w| and or be the of to in notwithstanding |
#=> ["and", "or", "be", "the", "of", "to", "in", "notwithstanding"]
(SMALL_WORDS_HASH = SMALL_WORDS.map { |w| [w.upcase, w] }.to_h).
default_proc = proc { |h,k| h[k]=k }
Test:
SMALL_WORDS_HASH
#=> {"AND"=>"and", "OR"=>"or", "BE"=>"be", "THE"=>"the", "OF"=>"of",
# "TO"=>"to", "IN"=>"in", "NOTWITHSTANDING"=>"notwithstanding"}
SMALL_WORDS_HASH["TO"]
#=> "to"
SMALL_WORDS_HASH["HIPPO"]
#=> "HIPPO"
def convert(arr)
arr.join(' ').upcase.gsub(/\w+/, SMALL_WORDS_HASH)
end
convert ["this","is","a","sample","of","a","title"]
#=> "THIS IS A SAMPLE of A TITLE"
I am having trouble writing this so that it will take a sentence as an argument and perform the translation on each word without affecting the punctuation.
I'd also like to continue using the partition method.
It would be nice if I could have it keep a quote together as well, such as:
"I said this", I said.
would be:
"I aidsay histay", I said.
def convert_sentence_pig_latin(sentence)
p split_sentence = sentence.split(/\W/)
pig_latin_sentence = []
split_sentence.each do |word|
if word.match(/^[^aeiou]+/x)
pig_latin_sentence << word.partition(/^[^aeiou]+/x)[2] + word.partition(/^[^aeiou]+/x)[1] + "ay"
else
pig_latin_sentence << word
end
end
rejoined_pig_sentence = pig_latin_sentence.join(" ").downcase + "."
p rejoined_pig_sentence.capitalize
end
convert_sentence_pig_latin("Mary had a little lamb.")
Your main problem is that [^aeiou] matches every character outside that range, including spaces, commas, quotation marks, etc.
If I were you, I'd use a positive match for consonants, i.e. [b-df-hj-np-tv-z]. I would also put that regex in a variable, so you don't have to repeat it three times.
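A minimal sketch of that suggestion, keeping partition as the question asked. The names CONSONANT_CLUSTER, pig_latin_word and pig_latin_sentence are invented for the example, and it treats y as a consonant, which is an assumption rather than a rule from the question.

```ruby
# Hypothetical constant: one or more leading consonants, matched positively
# so that spaces and punctuation can never be part of the cluster.
CONSONANT_CLUSTER = /\A[b-df-hj-np-tv-z]+/i

# Moves the leading consonant cluster to the end and appends "ay".
# Words that start with a vowel are returned unchanged.
def pig_latin_word(word)
  _before, cluster, rest = word.partition(CONSONANT_CLUSTER)
  cluster.empty? ? word : "#{rest}#{cluster}ay"
end

# Translates only the runs of word characters, so punctuation and
# spacing in the sentence are left exactly where they were.
def pig_latin_sentence(sentence)
  sentence.gsub(/\w+/) { |word| pig_latin_word(word) }
end

puts pig_latin_sentence("Mary had a little lamb.")
# prints: aryMay adhay a ittlelay amblay.
```

This version moves the whole consonant cluster ("this" becomes "isthay"); to move only the first consonant, drop the + from the pattern.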
Also, in case you're interested, there's a way to make your convert_sentence_pig_latin method a single gsub and it will do the whole sentence in one pass.
Update
...because you asked...
sentence.gsub( /\b([b-df-hj-np-tv-z])(\w+)/i ) { "#{$2}#{$1}ay" }
# iterate over and replace regexp matches using gsub
def convert_sentence_pig_latin2(sentence)
r = /^[^aeiou]+/i
sentence.gsub(/"([^"]*)"/m) {|x| x.gsub(/\w+/) {|y| y =~ r ? "#{y.partition(r)[2]}#{y.partition(r)[1]}ay" : y}}
end
puts convert_sentence_pig_latin2('"I said this", I said.')
# define instance method: String#to_pl
class String
R = Regexp.new '^[^aeiou]+', true # => /^[^aeiou]+/i
def to_pl
self.gsub(/"([^"]*)"/m) {|x| x.gsub(/\w+/) {|y| y =~ R ? "#{y.partition(R)[2]}#{y.partition(R)[1]}ay" : y}}
end
end
puts '"I said this", I said.'.to_pl
sources:
http://www.ruby-doc.org/core-2.1.0/Regexp.html
http://ruby-doc.org/core-2.0/String.html#method-i-gsub
In RSpec, there is a matcher expect{}.to change{}.to, used like this:
expect{employee.change_name}.to change{employee.name}.to "Mike"
It is very easy to read, but it is not that easy to understand how it works from a language standpoint. I suppose that expect, to and change are methods, but what objects are they called on? What do the curly braces mean in that case?
Thank you.
change and expect are methods of self and to is a method of the result of executing change and expect. The {} expressions are blocks passed to change and expect.
The following illustrates the order of evaluation:
def self.to1(arg)
puts "to1(#{arg})"
"to1"
end
def self.to2(arg)
puts "to2(#{arg})"
"to2"
end
def self.expect
puts "expect"
yield
self
end
def self.change
puts "change"
yield
self
end
expect{puts "b1"}.to1 change{puts "b2"}.to2 "#{puts 'Mike' ; 'Mike'}"
which produces the following output:
expect
b1
change
b2
Mike
to2(Mike)
to1(to2)
=> "to1"
They are blocks in Ruby.
A block is essentially an anonymous function passed to a method, a first step towards lambda expressions.
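To make the "methods plus blocks" reading concrete, here is a toy sketch in plain Ruby. It is not how RSpec is implemented; ToyExpectation, ToyChange and the Employee struct are made-up names that only mimic the shape of the chain.

```ruby
# Hypothetical stand-in for the object returned by change { ... }.
class ToyChange
  def initialize(&probe)
    @probe = probe            # block that reads the value being watched
  end

  # Records the expected final value and returns self so it can be passed on.
  def to(expected)
    @expected = expected
    self
  end

  # Runs the action and checks that the probed value changed to @expected.
  def matches?(action)
    before = @probe.call
    action.call
    after = @probe.call
    before != after && after == @expected
  end
end

# Hypothetical stand-in for the object returned by expect { ... }.
class ToyExpectation
  def initialize(&action)
    @action = action          # block under test, not yet executed
  end

  def to(matcher)
    raise "expectation failed" unless matcher.matches?(@action)
    true
  end
end

# Both are ordinary methods on self that capture their block.
def expect(&block)
  ToyExpectation.new(&block)
end

def change(&block)
  ToyChange.new(&block)
end

Employee = Struct.new(:name) do
  def change_name
    self.name = "Mike"
  end
end

employee = Employee.new("Bob")
expect { employee.change_name }.to change { employee.name }.to "Mike"
puts employee.name
# prints: Mike
```

The point of the sketch is that neither block runs when it is written: each is stored as an object and only called later, inside matches?.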
Ok. It's late and I'm tired.
I want to match a character in a string. Specifically, the appearance of 'a'. As in "one and a half".
If I have a string which is all lowercase.
"one and a half is always good" # what a dumb example. No idea how I thought of that.
and I call titleize on it
"one and a half is always good".titleize #=> "One And A Half Is Always Good"
This is wrong because the 'And' and the 'A' should be lowercase. Obviously.
So, I can do
"One and a Half Is always Good".titleize.tr('And', 'and') #=> "One and a Half Is always Good"
My question: how do I make the "A" an "a" without making the "Always" into "always"?
This does it:
require 'active_support/all'
str = "one and a half is always good" #=> "one and a half is always good"
str.titleize.gsub(%r{\b(A|And|Is)\b}i){ |w| w.downcase } #=> "One and a Half is Always Good"
or
str.titleize.gsub(%r{\b(A(nd)?|Is)\b}i){ |w| w.downcase } #=> "One and a Half is Always Good"
Take your pick of either of the last two lines. The regex pattern could be created elsewhere and passed in as a variable, for maintenance or code cleanliness.
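One way to build the pattern elsewhere, sketched without ActiveSupport so it runs on its own: the word list lives in a constant and Regexp.union turns it into the alternation. SMALL_TITLE_WORDS, SMALL_TITLE_PATTERN and downcase_small_words are names invented for this example, and the input is assumed to be already titleized.

```ruby
# Hypothetical word list; extend it as needed.
SMALL_TITLE_WORDS = %w[a an and is of the]

# Regexp.union escapes and joins the words; \b keeps "A" from matching
# inside "Always".
SMALL_TITLE_PATTERN = /\b(?:#{Regexp.union(SMALL_TITLE_WORDS).source})\b/i

# Downcases every small word in an already-titleized string.
def downcase_small_words(title, pattern = SMALL_TITLE_PATTERN)
  title.gsub(pattern) { |word| word.downcase }
end

puts downcase_small_words("One And A Half Is Always Good")
# prints: One and a Half is Always Good
```

Note that this also downcases a small word at the very start of the title, so a title beginning with "A" or "The" would need separate handling.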
I like Greg's two-liner (first titleize, then use a regex to downcase selected words.) FWIW, here's a function I use in my projects. Well tested, although much more verbose. You'll note that I'm overriding titleize in ActiveSupport:
class String
#
# A better titleize that creates a usable
# title according to English grammar rules.
#
def titleize
count = 0
result = []
for w in self.downcase.split
count += 1
if count == 1
# Always capitalize the first word.
result << w.capitalize
else
unless ['a','an','and','by','for','in','is','of','not','on','or','over','the','to','under'].include? w
result << w.capitalize
else
result << w
end
end
end
return result.join(' ')
end
end
I want to write a function that allows users to match data based on a regexp, but I am concerned about sanitation of the user strings. I know with SQL queries you can use bind variables to avoid SQL injection attacks, but I am not sure if there's such a mechanism for regexps. I see that there's Regexp.escape, but I want to allow valid regexps.
Here is the sample function:
def tagged?(text)
tags.each do |tag|
return true if text =~ /#{tag.name}/i
end
return false
end
Since I am just matching directly on tag.name is there a chance that someone could insert a Proc call or something to break out of the regexp and cause havoc?
Any advice on best practice would be appreciated.
Code inside a string that is interpolated into a Regexp is not executed, but it can generate annoying warnings:
/#{exit -3}/.match('test')
# => exits
foo = '#{exit -3}'
/#{foo}/.match('test')
# => warning: regexp has invalid interval
# => warning: regexp has `}' without escape
The two warnings seem to pertain to the opening #{ and the closing } respectively, and are independent.
As a strategy that's more efficient, you might want to sanitize the list of tags into a combined regexp you can run once. It is generally far less efficient to construct and test against N regular expressions than 1 with N parts.
Perhaps something along the lines of this:
class Taggable
  def tags
    @tags
  end
  def tags=(value)
    @tags = value
    # Build one anchored, case-insensitive alternation from all tags,
    # escaping any literal "#{" and "}" to avoid regexp warnings.
    @tag_regexp = Regexp.new(
      [
        '^(?:',
        @tags.collect do |tag|
          '(?:' + tag.sub(/\#\{/, '\\#\\{').sub(/([^\\])\}/, '\1\\}') + ')'
        end.join('|'),
        ')$'
      ].join,
      Regexp::IGNORECASE
    )
  end
  def tagged?(text)
    !!text.match(@tag_regexp)
  end
end
This can be used like this:
e = Taggable.new
e.tags = %w[ #{exit-3} .*\.gif .*\.png .*\.jpe?g ]
puts e.tagged?('foo.gif').inspect
If the exit call was executed, the program would halt there, but it just interprets that as a literal string. To avoid warnings it is escaped with backslashes.
You should probably create an instance of the Regexp class instead.
def tagged?(text)
return tags.any? { |tag| text =~ Regexp.new(tag.name, Regexp::IGNORECASE) }
end
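Since the patterns come from users, some of them will simply be invalid, and Regexp.new raises RegexpError for those. A hedged sketch of one way to handle that follows; the Tag struct and the names compile_tag and text_tagged? are stand-ins made up for the example, not part of the original code.

```ruby
# Hypothetical stand-in for the real tag model.
Tag = Struct.new(:name)

# Compiles a user-supplied pattern, returning nil instead of raising
# when the pattern is not a valid regular expression.
def compile_tag(pattern)
  Regexp.new(pattern, Regexp::IGNORECASE)
rescue RegexpError
  nil
end

# True if any tag whose pattern compiles matches the text;
# tags with invalid patterns are skipped.
def text_tagged?(text, tags)
  tags.any? do |tag|
    regexp = compile_tag(tag.name)
    regexp && text.match?(regexp)
  end
end

tags = [Tag.new('(unclosed'), Tag.new('.*\.gif')]
puts text_tagged?('foo.gif', tags)
# prints: true
```

This only guards against malformed patterns. A valid but pathological pattern can still be slow to match, so limiting pattern length or matching time may be worth considering separately.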