ruby on rails regular expressions - ruby-on-rails

In my Rails application I have a generic search that displays matching results. To produce the matches, I replace blank spaces in the search term with the "%" symbol. It works perfectly, but only if the search term contains a space. If I enter a single word, it says "no matching string".
class TweetsController < ApplicationController
  def index
    city = params[:show]
    search_term = params[:text]
    search_term[" "] = "%"
    city_coordinates = Coordinates.where('city=?', city)
    @tweets = if (city_coordinates.count == 1 && city_coordinates.first.valid_location?)
      Tweets.for_coordinates(city_coordinates.first) & Tweets.where("tweet_text LIKE?" ,"%#{search_term}%").all
    else if (Coordinates.count != 1 )
      Tweets.for_user_location(city) & Tweets.where("tweet_text LIKE ?" , "%#{search_term}%").all
    else
      @tweets = Tweets.where("%tweet_text% LIKE ? ", "%#{search_term}%").all
    end
    end
  end
end
I get output only if I type two words such as "Harbhajan Singh" or "VVS Laxman". If I type a single word, it says no matching strings. I need output whether the user enters one word, two words, or more. Can anybody help me with this?

You are probably getting an
IndexError: string not matched
That's because when params[:text] contains a single word, this code
search_term[" "] = "%"
raises the error.
You might want to read the string documentation for more details. It states:
If the regular expression or string is used as the index doesn’t match a position in the string, IndexError is raised.
Hope this helps.
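A minimal sketch of one way around the error, assuming the goal is simply to turn every space in the search term into the SQL wildcard. String#gsub returns the string unchanged when nothing matches, so a single word no longer raises. The helper name like_pattern is made up for illustration and is not part of the original controller.

```ruby
# Hypothetical helper (not from the original post): builds a LIKE pattern
# from a raw search term without raising on single-word input.
def like_pattern(search_term)
  # gsub replaces every space and leaves the string as-is when there is none,
  # whereas str[" "] = "%" raises IndexError if no space is present
  # (and would only replace the first space when there are several).
  "%#{search_term.to_s.gsub(' ', '%')}%"
end

puts like_pattern("Harbhajan Singh") # two words
puts like_pattern("Laxman")          # single word, no error

# The original assignment form fails on a single word:
begin
  term = String.new("Laxman")
  term[" "] = "%"
rescue IndexError => e
  puts "IndexError: #{e.message}"
end
```

In the controller this could be used as Tweets.where("tweet_text LIKE ?", like_pattern(params[:text])). Note that any % or _ typed by the user still acts as a wildcard in this sketch.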

I'm not too great with regular expressions myself, so I usually turn to Rubular. It helps you build and test regular expressions for Ruby.

Related

simpler way to modify a string

I recently solved this problem, but felt there must be a simpler way to do it. I'd like to use fewer lines of code than I do now. I'm new to Ruby, so if the answer is simple I'd love to add it to my toolbag. Thank you in advance.
Goal: accept a word as an argument and return the word with its last vowel removed; if there are no vowels, return the original word.
def hipsterfy(word)
  vowels = "aeiou"
  i = word.length - 1
  while i >= 0
    if vowels.include?(word[i])
      return word[0...i] + word[i+1..-1]
    end
    i -= 1
  end
  word
end
Try this regex magic:
def hipsterfy(word)
  word.gsub(/[aeiou](?=[^aeiou]*$)/, "")
end
How does it work?
[aeiou] looks for a vowel, and (?=[^aeiou]*$) adds the constraint that no vowel appears in the rest of the string. So the regex finds the last vowel. Then we just gsub the match (the last vowel) with "".
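As a hedged variation on the lookahead idea above: because at most one vowel can satisfy "no vowel after me", sub is enough, and \z anchors to the true end of the string rather than the end of a line. The /i flag, which also removes an uppercase last vowel, is an assumption that goes beyond the original question.

```ruby
# Matches a vowel only when no other vowel follows before the end of the string.
# The /i flag (an addition, not in the original answer) covers uppercase input.
LAST_VOWEL = /[aeiou](?=[^aeiou]*\z)/i

def hipsterfy(word)
  # sub returns a new string; words without vowels come back unchanged.
  word.sub(LAST_VOWEL, '')
end

%w[tonic panther swing rhythm APPLE].each do |w|
  puts "#{w} -> #{hipsterfy(w)}"
end
```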
You could use rindex to find the last vowel's index and []= to remove the corresponding character:
def hipsterfy(word)
  idx = word.rindex(/[aeiou]/)
  word[idx] = '' if idx
  word
end
The if idx is needed because rindex returns nil if no vowel is found. Note that []= modifies word.
There's also rpartition, which splits the string at the given pattern and returns an array containing the part before, the match, and the part after. By concatenating the former and latter, you effectively remove the middle part (i.e. the vowel):
def hipsterfy(word)
  before, _, after = word.rpartition(/[aeiou]/)
  before.concat(after)
end
This variant returns a new string, leaving word unchanged.
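To make the difference between the two variants concrete, here is a small sketch that runs both on the same input. The method names hipsterfy_in_place and hipsterfy_copy are invented for this comparison only.

```ruby
# rindex + []= : modifies the string that was passed in.
def hipsterfy_in_place(word)
  idx = word.rindex(/[aeiou]/)
  word[idx] = '' if idx
  word
end

# rpartition: builds a new string and leaves the argument alone.
# With no match, rpartition returns ["", "", word], so the result equals word.
def hipsterfy_copy(word)
  before, _, after = word.rpartition(/[aeiou]/)
  before + after
end

a = String.new("burrito")
hipsterfy_in_place(a)
puts a # the caller's string has changed

b = String.new("burrito")
puts hipsterfy_copy(b)
puts b # the caller's string is untouched
```

If callers may pass frozen strings (for example literals under the frozen_string_literal magic comment), the in-place variant raises FrozenError, which is one reason to prefer the copying form.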
Another common approach when dealing with some last occurrence is to reverse the string so you can deal with a first occurrence instead (which is usually simpler). Here, you can utilize sub:
def hipsterfy(word)
  word.reverse.sub(/[aeiou]/, '').reverse
end
Here is another way to do it.
Reverse the characters of the string
Use find_index to get the first vowel location in this reversed string
Delete the character at this index
Un-reverse the characters and join them back together.
reverse_chars = str.chars.reverse
vowel_idx = reverse_chars.find_index { |char| char =~ /[aeiou]/ }
reverse_chars.delete_at(vowel_idx) if vowel_idx
result = reverse_chars.reverse.join

Looping through array targeting upcase letters only

I'm trying to loop through a string, which I have converted to an array, and target only the uppercase letters so that I can insert a space before each capital letter. My code finds the first capital letter and adds the space, but I'm struggling to do the same for the next capital letter, which in this case is "T". Any advice would be appreciated. Thanks.
def break_camel(str)
  # ([A-Z])/.match(str)
  saved_string = str.freeze
  cap_index = str.index(/[A-Z]/)
  puts(cap_index)
  x = str.split('').insert(cap_index, " ")
  x.join
end
break_camel("camelCasingTest")
It's much easier to operate on your string directly, using String#gsub, than breaking it into pieces, operating on each piece then gluing everything back together again.
def break_camel(str)
  str.gsub(/(?=[A-Z])/, ' ')
end
break_camel("camelCasingTest")
#=> "camel Casing Test"
break_camel("CamelCasingTest")
#=> " Camel Casing Test"
This converts a "zero-width position", immediately before each capital letter (and after the preceding character, if there is one), to a space. The expression (?=[A-Z]) is called a positive lookahead.
If you don't want to insert a space if the capital letter is at the beginning of a line, change the method as follows.
def break_camel(str)
  str.gsub(/(?<=.)(?=[A-Z])/, ' ')
end
break_camel("CamelCasingTest")
#=> "Camel Casing Test"
(?<=.) is a positive lookbehind that requires the capital letter to be preceded by any character for the match to be made.
Another way of writing this is as follows.
def break_camel(str)
  str.gsub(/(?<=.)([A-Z])/, ' \1')
end
break_camel("CamelCasingTest")
#=> "Camel Casing Test"
Here the regular expression matches a capital letter that is not at the beginning of the line and saves it to capture group 1. It is then replaced by a space followed by the contents of capture group 1.
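One edge case the patterns above do not address is a run of capitals such as an acronym. The sketch below contrasts the lookbehind from the answer with a stricter one that only splits after a lowercase letter. The second pattern and both method names are assumptions added for illustration, not part of the original answer.

```ruby
# Splits before every capital that has any character in front of it.
def break_camel_every_capital(str)
  str.gsub(/(?<=.)(?=[A-Z])/, ' ')
end

# Splits only where a lowercase letter is directly followed by a capital,
# so runs of capitals (acronyms) stay together.
def break_camel_after_lowercase(str)
  str.gsub(/(?<=[a-z])(?=[A-Z])/, ' ')
end

puts break_camel_every_capital("parseHTTPResponse")
puts break_camel_after_lowercase("parseHTTPResponse")
puts break_camel_after_lowercase("camelCasingTest")
```

Which behaviour is wanted depends on the input: the stricter version keeps "HTTPResponse" glued together, so it is a trade-off rather than a strict improvement.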
I think your approach needs to keep reapplying the method until no more matches remain. One extension of your code is to use recursion:
def break_camel(str)
  regex = /[a-z][A-Z]/
  if str.match(regex)
    cap_index = str.index(regex)
    str.insert(cap_index + 1, " ")
    break_camel(str)
  else
    str
  end
end
break_camel("camelCasingTest") #=> "camel Casing Test"
Notice the recursive call to break_camel inside the method. Another way is to use the scan method with an appropriate regex and then join the pieces.
In code:
'camelCasingTest'.scan(/[A-Z]?[a-z]+/).join(' ') #=> "camel Casing Test"
Do you have to implement your own?
Looks like titleize https://apidock.com/rails/ActiveSupport/Inflector/titleize has this covered.

Ruby on Rails: Checking for valid regex does not work properly, high false rate

In my application I've got a procedure which should check if an input is valid or not. You can set up a regex for this input.
But in my case it returns false instead of true. And I can't find the problem.
My code looks like this:
gaps.each_index do |i|
  if gaps[i].first.starts_with? "~"
    # regular expression
    begin
      regex = gaps[i].first[1..-1]
      # a pipe is used to separate the regex from the solution string
      if regex.include? "|"
        puts "REGEX FOUND ------------------------------------------"
        regex = regex.split("|")[0..-2].join("|")
      end
      reg = Regexp.new(regex, true)
      unless reg.match(data[i])
        puts "REGEX WRONGGGG -------------------"
        @wrong_indexes << i
      end
    rescue
    end
  else
    # normal string
    if data[i].nil? || data[i].strip != gaps[i].first.strip
      @wrong_indexes << i
    end
  end
end
An example would be:
[[~berlin|berlin]]
The left one before the pipe is the regex and the right one next to the pipe is the correct solution.
This easy input should return true, but it doesn't.
Does anyone see the problem?
Thank you all
EDIT
The problem must be somewhere in these lines:
if regex.include? "|"
  puts "REGEX FOUND ------------------------------------------"
  regex = regex.split("|")[0..-2].join("|")
end
reg = Regexp.new(regex, true)
unless reg.match(data[i])
Update: Result without ~
The whole point is that you are initializing regex using the Regexp constructor
Constructs a new regular expression from pattern, which can be either a String or a Regexp (in which case that regexp’s options are propagated, and new options may not be specified (a change as of Ruby 1.8)).
However, when you pass the regex (obtained with regex.split("|")[0..-2].join("|")) to the constructor, it is a string, and reg = Regexp.new(regex, true) is getting ~berlin (or /berlin/i) as a literal string pattern. Thus, it is actually searching for something you do not expect.
See, regex= "[[/berlin/i|berlin]]" only finds a literal /berlin/i text (see demo).
Also, you need to get the pattern from the [[...]], so strip these brackets with regex = regex.gsub(/\A\[+|\]+\z/, '').split("|")[0..-2].join("|").
Note that you do not need to specify the case-insensitive option inside the pattern: you already pass true as the second parameter to Regexp.new, so the regex is already case-insensitive.
If you are performing whole word lookup, add word boundaries: regex= "[[\\bberlin\\b|berlin]]" (see demo).
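Putting the points above together, a minimal sketch might look like the following. It assumes the gap format shown in the question ("[[~pattern|solution]]"). The helper name gap_regex is hypothetical, and Regexp::IGNORECASE is used in place of the bare true from the original code.

```ruby
# Hypothetical helper: extracts the regex part from a gap definition such as
# "[[~berlin|berlin]]" (format assumed from the example in the question).
def gap_regex(gap)
  inner = gap.gsub(/\A\[+|\]+\z/, '')       # strip the surrounding brackets
  return nil unless inner.start_with?('~')  # plain-string gap, no regex
  pattern = inner[1..-1]                    # drop the leading "~"
  # Everything before the last pipe is the pattern; the rest is the solution.
  pattern = pattern.split('|')[0..-2].join('|') if pattern.include?('|')
  Regexp.new(pattern, Regexp::IGNORECASE)
end

regex = gap_regex('[[~berlin|berlin]]')
puts regex.inspect
puts regex.match?('Berlin')

# With word boundaries, "Berliner" is no longer accepted.
bounded = gap_regex('[[~\bberlin\b|berlin]]')
puts bounded.match?('Berliner')
```

Printing regex.inspect while debugging is a quick way to see exactly which pattern Regexp.new received.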

Ruby regex to find words starting with #

I'm trying to write a very simple regex to find all words in a string that start with the symbol #, and then change each word into a link, like on Twitter, where you can mention other usernames.
So far I have written this:
def username_link(s)
  s.gsub(/\#\w+/, "<a href='/username'>username</a>").html_safe
end
I know it's very basic and not much, but I'd rather write it on my own right now, to fully understand it, before searching GitHub for a more complex one.
What I'm trying to find out is how I can reference the matched word and include it in place of username. Once I can do that, I can easily strip the first character, #, out of it.
Thanks.
You can capture using parentheses and backreference with \1 (and \2, and so on):
def username_link(s)
  s.gsub(/#(\w+)/, "<a href='/\\1'>\\1</a>").html_safe
end
See also this answer
You should use gsub with back references:
str = "I know it's very basic and not much, but #tim I'd rather write it on my own."
def username_to_link(str)
  str.gsub(/\#(\w+)/, '#\1')
end
puts username_to_link(str)
#=> I know it's very basic and not much, but #tim I'd rather write it on my own.
The following regex should handle corner cases that the other answers ignore:
def auto_username_link(s)
  s.gsub(/(^|\s)\#(\w+)($|\s)/, "\\1<a href='/\\2'>\\2</a>\\3").html_safe
end
It should ignore strings like "someone#company" or "#username-1" while converting everything like "Hello #username rest of message".
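One caveat with the capture-based pattern above: the trailing (\s) consumes the whitespace, so when two mentions are separated by a single space the second one has no leading whitespace left to match. A hedged alternative is to express both boundaries as zero-width lookarounds. The method name link_mentions is made up, and html_safe is left out because it only exists inside Rails.

```ruby
# Sketch: link "#name" only when it is delimited by whitespace or the string
# edges. (?<!\S) and (?!\S) assert the boundaries without consuming them,
# so adjacent mentions are all matched.
def link_mentions(text)
  text.gsub(/(?<!\S)\#(\w+)(?!\S)/) { "<a href='/#{$1}'>#{$1}</a>" }
end

puts link_mentions("Hello #tim #bob")
puts link_mentions("someone#company and #username-1 stay as they are")

# For comparison, the capture-based pattern skips the second mention:
puts "Hello #tim #bob".gsub(/(^|\s)\#(\w+)($|\s)/, "\\1<a href='/\\2'>\\2</a>\\3")
```

In a Rails view the input should also be HTML-escaped before the result is marked html_safe; that step is outside this sketch.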
How about this:
def convert_names_to_links(str)
  str = " " + str
  result = str.gsub(
    /
      (?<=\W)   #Look for a non-word character (space/punctuation/etc.) preceding
      \#        #an "#" character (escaped, since a bare # starts a comment under the x flag), followed by
      (\w+)     #a word character, one or more times
    /xm,        #Standard normalizing flags
    '#\1'
  )
  result[1..-1]
end
my_str = "#tim #tim #tim, ##tim,#tim t#mmy?"
puts convert_names_to_links(my_str)
--output:--
#tim #tim #tim, ##tim,#tim t#mmy?

Regular expression to remove only beginning and end html tags from string?

I would like to remove for example <div><p> and </p></div> from the string below. The regex should be able to remove an arbitrary number of tags from the beginning and end of the string.
<div><p>text to <span class="test">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>
I have been tinkering with rubular.com without success. Thanks!
def remove_html_end_tags(html_str)
  html_str.match(/\<(.+)\>(?!\W*\<)(.+)\<\/\1\>/m)[2]
end
I'm not seeing the problem of \<(.+)> consuming multiple opening tags that Alan Moore pointed out below, which is odd because I agree it's incorrect. It should be changed to \<([^>\<]+)> or something similar to disambiguate.
def remove_html_end_tags(html_str)
  html_str.match(/\<([^\>\<]+)\>(?!\W*?\<)(.+)\<\/\1\>/m)[2]
end
The idea is that you want to capture everything between the open/close of the first tag encountered that is not followed immediately by another tag, even with spaces between.
Since I wasn't sure how (with positive lookahead) to say "give me the first key whose closing angle bracket is followed by at least one word character before the next opening angle bracket", I said
\>(?!\W*\<)
that is, find the closing angle bracket that is not followed by only non-word characters before the next opening angle bracket.
Once you've identified the key with that attribute, find its closing mate and return the stuff between.
Here's another approach. Find tags scanning forward and remove the first n. Would blow up with nested tags of the same type, but I wouldn't take this approach for any real work.
def remove_first_n_html_tags(html_str, skip_count=0)
  matches = []
  tags = html_str.scan(/\<([\w\s\_\-\d\"\'\=]+)\>/).flatten
  tags.each do |tag|
    close_tag = "\/%s" % tag.split(/\s+/).first
    match_str = "<#{tag}>(.+)<#{close_tag}>"
    match = html_str.match(/#{match_str}/m)
    matches << match if match
  end
  matches[skip_count]
end
Still involves some programming:
str = '<div><p>text to <span class="test">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>'
while (m = /\A<.+?>/.match(str)) && str.end_with?('</' + m[0][1..-1])
  str = str[m[0].size..-(m[0].size + 2)]
end
Cthulhu you out there?
I am going to go ahead and answer my own question. Below is the programmatic route:
The input string goes into the first loop as an array in order to remove the front tags. The resulting string is looped through in reverse order in order to remove the end tags. The string is then reversed in order to put it in the correct order.
def remove_html_end_tags(html_str)
  str_no_start_tag = ''
  str_no_start_and_end_tag = ''
  a = html_str.split("")
  i = 0
  is_text = false
  while i <= (a.length - 1)
    if (a[i] == '<') && !is_text
      while (a[i] != '>')
        i += 1
      end
      i += 1
    else
      is_text = true
      str_no_start_tag << a[i]
      i += 1
    end
  end
  a = str_no_start_tag.split("")
  i = a.length - 1
  is_text = false
  while i >= 0
    if (a[i] == '>') && !is_text
      while (a[i] != '<')
        i -= 1
      end
      i -= 1
    else
      is_text = true
      str_no_start_and_end_tag << a[i]
      i -= 1
    end
  end
  str_no_start_and_end_tag.reverse!
end
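The same "strip tags from the front, then from the back" idea can be sketched with two anchored substitutions. This is only an illustration under the assumption that the input looks like the example in the question. The method name strip_outer_tags is made up, and the code does not check that the removed opening and closing tags actually pair up.

```ruby
# Sketch: remove any run of tags at the very start of the string and any run
# of closing tags at the very end. Tags in the middle are left untouched.
def strip_outer_tags(html)
  html
    .sub(/\A(?:\s*<[^<>]+>)+/, '')     # leading tags, e.g. "<div><p>"
    .sub(/(?:<\/[^<>]+>\s*)+\z/, '')   # trailing closing tags, e.g. "</p></div>"
end

html = "<div><p>text to <span class=\"test\">test</span> the selection on.\n" \
       "Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>"
puts strip_outer_tags(html)
```

For anything beyond simple, well-formed snippets, a real HTML parser is usually the safer choice.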
(?:\<div.*?\>\<p.*?\>)|(?:\<\/p\>\<\/div\>) is the expression you need. But this doesn't check for every scenario... if you are trying to parse any possible combination of tags, you may want to look at other ways to parse.
For example, this expression doesn't allow for any whitespace between the div and p tags. If you wanted to allow for that, you would add \s* between the \>\< sections of the tags, like so: (?:\<div.*?\>\s*\<p.*?\>)|(?:\<\/p\>\s*\<\/div\>).
The div tag and the p tag are expected to be lowercase, as the expression is written. So you may want to figure out a way to check for upper or lower case letters for each, so that Div or dIV would be found too.
Use gskinner's RegEx tool for testing and learning Regular Expressions.
So your end ruby code should look something like this:
# Ruby sample for showing the use of regular expressions
str = "<div><p>text to <span class=\"test\">test</span> the selection on.
Kibology for <b>all</b><br>. All <i>for</i> Kibology.</p></div>"
puts 'Before Regular Expression: "', str, '"'
str.gsub!(/(?:\<div.*?\>\s*\<p.*?\>)|(?:\<\/p\>\s*\<\/div\>)/, "")
puts 'After Regular Expression', str
system("pause")
EDIT: Replaced div*? to div.*? and replaced p*? to p.*? per suggestions in the comments.
EDIT: This answer doesn't allow for any set of tags, just the two listed in the first line of the question.

Resources