How to find a keyword in a string - ruby-on-rails

Users send in SMS messages, each of which must include a keyword. This keyword is then used to find a business.
Users are instructed to put the keyword at the start of the message.
I know some users won't put the keyword at the beginning, or will attach symbols (# @ -) or punctuation (keyword.) to it.
What is an efficient way to look for this keyword and for the matching business?
My attempt:
# Strip quotes, symbols and punctuation, then collapse whitespace.
scrubbed_msg = msg.gsub("\"", "").gsub("\'", "").gsub("#", "").gsub("-", "").gsub(",", "").gsub(".", "").gsub("@", "").split.join(" ")
tag = scrubbed_msg.split[0]
if @business = Business.where(tag: tag).first
  log_message(@business)
else
  # Fall back to checking every word in the message.
  scrubbed_msg.split.each do |w|
    if @business = Business.where(tag: w).first
      log_message(@business)
    end
  end
end

Instead of listing the characters you want to remove from the string, I suggest a whitelist approach that specifies which characters you want to keep, for example alphanumeric characters:
sms = "#keyword and the rest"
clean_sms = sms.scan(/[\p{Alnum}]+/)
# => ["keyword", "and", "the", "rest"]
Then, if I have understood what you are trying to do, you could find the business like this:
first_existing_tag = clean_sms.find do |tag|
Business.exists?(tag: tag)
end
@business = Business.where(tag: first_existing_tag).first
log_message(@business)
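
A minimal sketch of the same whitelist idea without a database, assuming the known tags fit in memory and are stored in lower case. `KNOWN_TAGS` and `first_known_tag` are hypothetical names used only for illustration:

```ruby
require "set"

# Hypothetical in-memory stand-in for the tags stored on Business records.
KNOWN_TAGS = Set["pizza", "plumber"].freeze

# Returns the first word of the message (in message order) that is a known
# tag, or nil when no word matches. Symbols and punctuation are dropped by
# keeping only runs of alphanumeric characters.
def first_known_tag(sms, known_tags = KNOWN_TAGS)
  sms.scan(/\p{Alnum}+/)
     .map(&:downcase)
     .find { |word| known_tags.include?(word) }
end

p first_known_tag("#Pizza. two large please")
p first_known_tag("hi, I need a -plumber-")
```

Because the words are checked in message order, a keyword at the start still wins when several words happen to be valid tags.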

You can use a Regexp to filter all unnecessary characters out of the String, then call #reduce on the Array you get from splitting the string to find the first record whose tag field matches a keyword. In the example the candidates are keyword, tag1, tag2:
msg = "key.w,ord tag-1'\n\"tag2"
# => "key.w,ord tag-1'\n\"tag2"
scrubbed = msg.gsub(/[@'"\-.,#]/, "").split
# => ["keyword", "tag1", "tag2"]
@business = scrubbed.reduce(nil) do |found, tag|
  found || Business.where(tag: tag).first
end
# => the record tagged "keyword",
# => or the record tagged "tag1" if no record is tagged "keyword"

Related

Ruby switch words for hyperlinks without changing the chars

I have two different models: post (it has a content) and keyword (it has the word and a link). I am trying to write a function that replaces words in the post content with hyperlinks built from the matching keywords. For example, if there is a keyword 'Hello' with some link and the post contains the word 'hello', I want 'hello' in post.content to become a hyperlink using the link from the keyword 'Hello'.
Here is my function:
def execute
  @post = Post.find(params[:post_id])
  all_keys = Keyword.all.pluck(:key, :link)
  all_keys = all_keys.map.to_h
  all_keys = all_keys.transform_keys(&:downcase)
  new_content = @post.content.to_s
  new_content_downcase = new_content.downcase
  all_keys.map { |key, link| new_content_downcase.gsub!(key, "<a href='#{link}'>#{key}</a>") }
  @post.content = new_content_downcase
  @post.save!
end
The function is simple: I build a hash {key: 'link'} and take @post.content, then I downcase the hash keys and @post.content and replace the words in the post content with the key from the hash and its link (so it looks like a hyperlink).
Everything works fine, but the problem is that it changes the words in @post.content to lowercase (Hello --> hello). Is there any way to compare new_content with new_content_downcase so that the original word is kept AND gets the hyperlink?
Just don't downcase the post's content, that's it :) You could use gsub! with a block to keep things concise, something like the following:
def execute
  @post = Post.find(params[:post_id])
  keys = Keyword.pluck(:key, :link).to_h.transform_keys(&:downcase)
  @post.content.gsub!(/\w+/) do |word|
    # We downcase each word when we check for the link's presence...
    url = keys[word.downcase]
    # ... but not when we do replacements.
    url ? "<a href='#{url}'>#{word}</a>" : word
  end
  @post.save!
end
So, your output is all lower case because you've applied #downcase to both your list of keywords and your content. And I assume you did that because you're doing a literal match between the keyword and the content string in your gsub.
One solution is to use a case-insensitive regex instead:
all_keys.map { |key, link|
  @post.content.gsub!(/(#{key})/i, "<a href='#{link}'>\1</a>")
}
Here, I've ignored the downcase and just used @post.content directly (I assume that it's a string so the to_s is redundant).
Then, in the gsub, I replaced the key direct match with a regex. This uses brackets to capture the term that's found for use in the replace term, so that you retain the capitalisation of the source rather than that of the stored keyword. The \1 in the replacement string is how that stored result from the regex gets used.
Fingers crossed that gets you working!
===Edit===
Here's an attempt at doing this properly, updating the entire method. (I'd also not escaped the \1 above, which it needs because it's in double quotes. Sorry about that!)
def export
  @post = Post.find(params[:post_id])
  _content = @post.content
  Keyword.pluck(:key, :link).to_h.each { |_key, _link|
    _content.gsub!(/(#{_key})/i, "<a href='#{_link}'>\\1</a>")
  }
  @post.update(content: _content)
end
Don't add key after \1, as you mention in a comment - the \1 should automatically be replaced with whatever was found by the regex (i.e. the value of key regardless of case).
Also, you shouldn't need to downcase your Keyword entries in any case: the time to do that is when they're created, so you only have to do it once.
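
A minimal sketch that combines both answers into a single pass, assuming every keyword starts and ends with a word character. `link_keywords` is a hypothetical helper name, and the hash stands in for `Keyword.pluck(:key, :link).to_h`. Escaping the keys with `Regexp.escape` keeps regex metacharacters in a keyword from being interpreted, and one combined pattern avoids re-scanning markup inserted by an earlier replacement:

```ruby
# Wraps every case-insensitive occurrence of a keyword in a hyperlink while
# keeping the capitalisation found in the content.
def link_keywords(content, links)
  lookup = links.transform_keys(&:downcase)
  return content if lookup.empty?

  # One alternation of all escaped keywords, matched on word boundaries.
  alternatives = lookup.keys.map { |key| Regexp.escape(key) }.join("|")
  pattern = /\b(?:#{alternatives})\b/i

  content.gsub(pattern) do |match|
    "<a href='#{lookup[match.downcase]}'>#{match}</a>"
  end
end

puts link_keywords("Hello world, hello again", "Hello" => "https://example.com/hello")
```

The block form of `gsub` receives the text as it appears in the content, so "Hello" and "hello" each keep their own capitalisation.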

How do I extract just a specific portion of a code snippet from multiple files, that may be different in different files

So what I am doing is iterating over various versions of a snippet of code (e.g. Associations.rb in Rails).
What I want to do is just extract one snippet of the code, for example the has_many method:
def has_many(name, scope = nil, options = {}, &extension)
reflection = Builder::HasMany.build(self, name, scope, options, &extension)
Reflection.add_reflection self, name, reflection
end
At first I was thinking of just searching this entire file for the string def has_many and then saving everything between that string and end. The obvious issue with this, is that different versions of this file can have multiple end strings within the method.
For instance, whatever I come up with for the above snippet, should also work for this one too:
def has_many(association_id, options = {})
validate_options([ :foreign_key, :class_name, :exclusively_dependent, :dependent, :conditions, :order, :finder_sql ], options.keys)
association_name, association_class_name, association_class_primary_key_name =
associate_identification(association_id, options[:class_name], options[:foreign_key])
require_association_class(association_class_name)
if options[:dependent] and options[:exclusively_dependent]
raise ArgumentError, ':dependent and :exclusively_dependent are mutually exclusive options. You may specify one or the other.' # ' ruby-mode
elsif options[:dependent]
module_eval "before_destroy '#{association_name}.each { |o| o.destroy }'"
elsif options[:exclusively_dependent]
module_eval "before_destroy { |record| #{association_class_name}.delete_all(%(#{association_class_primary_key_name} = '\#{record.id}')) }"
end
define_method(association_name) do |*params|
force_reload = params.first unless params.empty?
association = instance_variable_get("@#{association_name}")
if association.nil?
association = HasManyAssociation.new(self,
association_name, association_class_name,
association_class_primary_key_name, options)
instance_variable_set("@#{association_name}", association)
end
association.reload if force_reload
association
end
# deprecated api
deprecated_collection_count_method(association_name)
deprecated_add_association_relation(association_name)
deprecated_remove_association_relation(association_name)
deprecated_has_collection_method(association_name)
deprecated_find_in_collection_method(association_name)
deprecated_find_all_in_collection_method(association_name)
deprecated_create_method(association_name)
deprecated_build_method(association_name)
end
Assuming that each value is stored as text in some column in my db.
How do I approach this? Should I use Ruby's string methods, or should I be approaching this another way?
Edit 1
Please note that this question relates specifically to string manipulation via using a Regex, without a parser.
As discussed, this should be done with a parser like Ripper.
However, to answer whether it can be done with string methods, I will match the syntax with a regex, provided that:
You can rely on indentation, i.e. the string has exactly the same characters before "def" and before its matching "end".
There are no multiline strings in between that could simulate an "end" with the same indentation. That includes multiline string literals, heredocs, %{ }, etc.
Code
regex = /^
(\s*) # matches the indentation (we'll backreference later)
def\ +has_many\b # literal "def has_many" with a word boundary
(?:.*+\n)*? # match whole lines - as few as possible
\1 # matches the same indentation as the def line
end\b # literal "end"
/x
subject = %q|
  def has_many(name, scope = nil, options = {}, &extension)
    if association.nil?
      instance_variable_set("@#{association_name}", association)
    end
  end|
#Print matched text
puts subject.to_enum(:scan,regex).map {$&}
The regex relies on:
Capturing the whitespace (indentation) with the group (\s*),
followed by the literal def has_many.
It then consumes as few lines as it can with (?:.*+\n)*?.
Notice that .*+\n matches a whole line
and (?:..)*? repeats it 0 or more times. The trailing ? makes the repetition lazy (as few as possible).
It consumes lines until the following condition matches:
\1 is a backreference to the text captured by group 1, i.e. the exact same indentation as the def line,
followed by the literal end.
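
The same indentation-anchored idea can be wrapped in a small helper. This is a sketch under the same two assumptions listed above (consistent indentation, no multiline strings that imitate an `end`); `extract_method` is a hypothetical name, and it uses `[ \t]*` rather than `\s*` so the captured indentation never spans a newline:

```ruby
# Returns the text of the first multi-line `def name ... end` whose `end`
# has exactly the same indentation as its `def`, or nil if none is found.
def extract_method(source, name)
  pattern = /^([ \t]*)def +#{Regexp.escape(name)}\b.*\n(?:.*\n)*?\1end\b/
  source[pattern]
end

sample = <<~'RUBY'
  module Associations
    def has_many(name, options = {})
      if options[:dependent]
        register(name)
      end
      name
    end

    def has_one(name)
      name
    end
  end
RUBY

puts extract_method(sample, "has_many")
```

The nested `end` that closes the `if` is skipped because it is indented more deeply than the `def` line, so the match only stops at the `end` with matching indentation.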

Multiple word search in one field

I have search functionality with this code in the 'search.rb' file:
votsphonebooks = votsphonebooks.where("address like ?", "%#{address}%") if address.present?
There are multiple fields, this is just one of them.
How can I change this line into something like a map so that it covers multiple words?
E.g. if they type in '123 Fake St', it currently looks for exactly that, but I want it to search for '123', 'Fake' and 'St'.
First thing you should do is split the address by spaces:
addresses = params[:address].split(" ")
Then what you need is an OR query, which you could build using Arel.
t = VotsPhoneBook.arel_table # The class name is my guess
arel_query = addresses.reduce(nil) do |q, address|
q.nil? ? t[:address].matches("%#{address}%") : q.or(t[:address].matches("%#{address}%"))
end
results = VotsPhoneBook.where(arel_query) # same model as the arel_table above
Try using REGEXP instead of LIKE (this depends on your database supporting the REGEXP operator, as MySQL does):
address_arr = address.split(" ")
votsphonebooks = votsphonebooks.where('address REGEXP ?',address_arr.join('|')) unless address_arr.blank?
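
Another way to get the OR behaviour is to generate one `LIKE ?` placeholder per word and pass the words as bind values. The sketch below only builds the condition array, so it runs without Rails; `like_any_word` is a hypothetical helper name, and the column name is assumed to come from trusted code rather than from user input:

```ruby
# Builds ["col LIKE ? OR col LIKE ? ...", "%word1%", "%word2%", ...]
# from a whitespace-separated search string, or nil when it is blank.
def like_any_word(column, input)
  words = input.to_s.split
  return nil if words.empty?

  clause = words.map { "#{column} LIKE ?" }.join(" OR ")
  [clause, *words.map { |word| "%#{word}%" }]
end

p like_any_word("address", "123 Fake St")
```

In the search class the result could then be applied with something like `votsphonebooks = votsphonebooks.where(*conditions) if conditions`, since `where` accepts a SQL fragment followed by its bind values.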

ActionView::Helpers::TextHelper excerpt helper is not fully functional

I am using module ActionView::Helpers::TextHelper to generate an excerpt from a text. If a word exists more than once, it will just excerpt the first occurrence.
<%= excerpt('Hello, i am a Ruby lover, a Rails lover and would never come back to PHP', 'lover', :radius => 5) %>
"...lover,..."
I was expecting the return string to be something like this, because there are two occurrences of the word 'lover':
"...lover,...lover ..."
How can I get it to display multiple occurrences of a keyword?
I am using rails 3.2.11.
excerpt(text, phrase, options = {})
Extracts an excerpt from text that matches the first instance of phrase. The :radius option expands the excerpt on each side of the first occurrence of phrase
As the documentation states, it only excerpts the first instance of the phrase you search for, not every instance of it.
I've been using a multi_excerpt() method defined in my application_helper.rb
# Returns a summary of +text+ in the form of +phrase+ excerpts
#
# multi_excerpt('This string is is a very long long long string ', 'string', radius: 5)
# # => ...This string is i...long string ...
def multi_excerpt(text, phrase, options = {})
return unless text && phrase
radius = options.fetch(:radius, 10)
omission = options.fetch(:omission, "...")
raise ArgumentError, "phrase must be a String" if phrase.is_a?(Regexp)
regex = /.{,#{radius}}#{Regexp.escape(phrase)}.{,#{radius}}/i
parts = text.scan(regex)
"#{omission}#{parts.join(omission)}#{omission}"
end
Linking here my related post and PR.

Sanitizing User Regexp

I want to write a function that allows users to match data based on a regexp, but I am concerned about sanitation of the user strings. I know with SQL queries you can use bind variables to avoid SQL injection attacks, but I am not sure if there's such a mechanism for regexps. I see that there's Regexp.escape, but I want to allow valid regexps.
Here is the sample function:
def tagged?(text)
tags.each do |tag|
return true if text =~ /#{tag.name}/i
end
return false
end
Since I am just matching directly on tag.name is there a chance that someone could insert a Proc call or something to break out of the regexp and cause havoc?
Any advice on best practice would be appreciated.
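The contents of a string interpolated into a Regexp are not evaluated as Ruby code, but they can generate annoying warnings. Compare code written directly in the literal with the same text held in a string: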
/#{exit -3}/.match('test')
# => exits
foo = '#{exit -3}'
/#{foo}/.match('test')
# => warning: regexp has invalid interval
# => warning: regexp has `}' without escape
The two warnings seem to pertain to the opening #{ and the closing } respectively, and are independent.
As a strategy that's more efficient, you might want to sanitize the list of tags into a combined regexp you can run once. It is generally far less efficient to construct and test against N regular expressions than 1 with N parts.
Perhaps something along the lines of this:
class Taggable
  def tags
    @tags
  end
  def tags=(value)
    @tags = value
    # Build one anchored, case-insensitive alternation of all tags.
    @tag_regexp = Regexp.new(
      [
        '^(?:',
        @tags.collect do |tag|
          '(?:' + tag.sub(/\#\{/, '\\#\\{').sub(/([^\\])\}/, '\1\\}') + ')'
        end.join('|'),
        ')$'
      ].join, # Array#to_s would return the inspected array, not the pattern
      Regexp::IGNORECASE
    )
  end
  def tagged?(text)
    !!text.match(@tag_regexp)
  end
end
This can be used like this:
e = Taggable.new
e.tags = %w[ #{exit-3} .*\.gif .*\.png .*\.jpe?g ]
puts e.tagged?('foo.gif').inspect
If the exit call was executed, the program would halt there, but it just interprets that as a literal string. To avoid warnings it is escaped with backslashes.
You should probably create an instance of the Regexp class instead.
def tagged?(text)
return tags.any? { |tag| text =~ Regexp.new(tag.name, Regexp::IGNORECASE) }
end
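
A minimal sketch that combines the two suggestions, assuming invalid user patterns should simply be skipped. `compile_tag_patterns` is a hypothetical helper name. `Regexp.new` raises `RegexpError` for a malformed pattern, and `Regexp.union` merges the compiled patterns so the text is tested once:

```ruby
# Compiles user-supplied pattern strings into one case-insensitive Regexp.
# Patterns that fail to compile are dropped; returns nil if none are valid.
def compile_tag_patterns(patterns)
  compiled = patterns.filter_map do |source|
    begin
      Regexp.new(source, Regexp::IGNORECASE)
    rescue RegexpError
      nil # skip malformed patterns instead of failing the whole list
    end
  end
  compiled.empty? ? nil : Regexp.union(compiled)
end

tag_regexp = compile_tag_patterns(['.*\.gif', '(unclosed', '.*\.jpe?g'])
p tag_regexp.match?('photo.GIF')
p tag_regexp.match?('notes.txt')
```

This only guards against syntax errors. It does not protect against valid patterns that are very slow to match, so untrusted patterns may still need length limits or a timeout.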

Resources