I am using module ActionView::Helpers::TextHelper to generate an excerpt from a text. If a word exists more than once, it will just excerpt the first occurrence.
<%= excerpt('Hello, i am a Ruby lover, a Rails lover and would never come back to PHP', 'lover', :radius => 5) %>
"...lover,..."
I was expecting the returned string to look something like this, because there are two occurrences of the word 'lover':
"...lover,...lover ..."
How can I get it to display multiple occurrences of a keyword?
I am using Rails 3.2.11.
excerpt(text, phrase, options = {})
Extracts an excerpt from text that matches the first instance of phrase. The :radius option expands the excerpt on each side of the first occurrence of phrase.
As the documentation states, excerpt only covers the first instance of the phrase you search for, not every instance of it.
I've been using a multi_excerpt() method defined in my application_helper.rb:
# Returns a summary of +text+ in the form of +phrase+ excerpts
#
# multi_excerpt('This string is is a very long long long string ', 'string', radius: 5)
# # => ...This string is i...long string ...
def multi_excerpt(text, phrase, options = {})
return unless text && phrase
radius = options.fetch(:radius, 10)
omission = options.fetch(:omission, "...")
raise if phrase.is_a? Regexp
regex = /.{,#{radius}}#{Regexp.escape(phrase)}.{,#{radius}}/i
parts = text.scan(regex)
"#{omission}#{parts.join(omission)}#{omission}"
end
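To see the helper against the sentence from the question, here is a sketch that runs outside Rails. It repeats the method above in plain Ruby (nothing from ActionView is needed) and swaps the bare raise for an ArgumentError, which is an assumption about the intended behaviour rather than part of the original helper.

```ruby
# Plain-Ruby copy of the multi_excerpt helper above (no Rails required).
def multi_excerpt(text, phrase, options = {})
  return unless text && phrase
  # Assumption: a Regexp phrase is rejected with an explicit error class.
  raise ArgumentError, "phrase must be a String" if phrase.is_a?(Regexp)

  radius   = options.fetch(:radius, 10)
  omission = options.fetch(:omission, "...")

  # Up to `radius` characters on either side of every case-insensitive match.
  regex = /.{,#{radius}}#{Regexp.escape(phrase)}.{,#{radius}}/i
  parts = text.scan(regex)
  "#{omission}#{parts.join(omission)}#{omission}"
end

sentence = 'Hello, i am a Ruby lover, a Rails lover and would never come back to PHP'
puts multi_excerpt(sentence, 'lover', radius: 5)
# prints: ...Ruby lover, a R...ails lover and ...
```

Because scan never returns overlapping matches, two occurrences closer together than the radius end up in a single part instead of two.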
Linking here my related post and PR.
I have two different models: post (it has a content) and keywords (each has a word and a link). I am trying to write a function that replaces words in the post content with the matching keyword and its link, so that the word works as a hyperlink. For example, if there is a keyword 'Hello' with some link and the post contains the word 'hello', I want 'hello' in post.content to become a hyperlink using the link from the 'Hello' keyword.
Here is my function:
def execute
@post = Post.find(params[:post_id])
all_keys = Keyword.all.pluck(:key, :link)
all_keys = all_keys.map.to_h
all_keys = all_keys.transform_keys(&:downcase)
new_content = @post.content.to_s
new_content_downcase = new_content.downcase
all_keys.map { |key, link| new_content_downcase.gsub!(key, "<a href='#{link}'>#{key}</a>") }
@post.content = new_content_downcase
@post.save!
end
The function is simple: I build a hash {key: 'link'} and take @post.content, then I downcase both the hash keys and @post.content and replace each matching word in the post content with the key from the hash wrapped in its link (so it looks like a hyperlink).
Everything works fine, but the problem is that it switches the words in @post.content to lowercase (Hello --> hello). Is there any way to compare new_content and new_content_downcase, so that the original word is kept AND gets the hyperlink?
Just don't downcase the post's content, that's it :) You could use gsub! with a block to keep things concise, something like the following:
def execute
@post = Post.find(params[:post_id])
keys = Keyword.pluck(:key, :link).to_h.transform_keys(&:downcase)
@post.content.gsub!(/\w+/) do |word|
# We downcase each word when we check for the links presence...
url = keys[word.downcase]
# ... but not when we do replacements.
url ? "<a href='#{url}'>#{word}</a>" : word
end
@post.save!
end
So, your output is all lower case because you've applied #downcase to both your list of keywords and your content. And I assume you did that because you're doing a literal match between the keyword and the content string in your gsub.
One solution is to use a case-insensitive regex instead:
all_keys.map { |key, link|
@post.content.gsub!(/(#{key})/i, "<a href='#{link}'>\1</a>")
}
Here, I've ignored the downcase and just used @post.content directly (I assume that it's a string so the to_s is redundant).
Then, in the gsub, I replaced the key direct match with a regex. This uses brackets to capture the term that's found for use in the replace term, so that you retain the capitalisation of the source rather than that of the stored keyword. The \1 in the replacement string is how that stored result from the regex gets used.
Fingers crossed that gets you working!
===Edit===
Here's an attempt at doing this properly, updating the entire method. (I'd also not escaped the \1 above, which it needs because it's in double quotes. Sorry about that!)
def export
@post = Post.find(params[:post_id])
_content = @post.content
Keyword.pluck(:key, :link).to_h.each { |_key, _link|
_content.gsub!(/(#{_key})/i, "<a href='#{_link}'>\\1</a>")
}
@post.update(content: _content)
end
Don't add key after \1, as you mention in a comment - the \1 should automatically be replaced with whatever was found by the regex (i.e. the value of key regardless of case).
Also, you shouldn't need to downcase your Keyword entries in any case: the time to do that is when they're created, so you only have to do it once.
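One thing the snippets above leave open is a keyword containing regex metacharacters (a dot or a plus sign, say), because the key is interpolated into the pattern unescaped. Below is a minimal sketch of a single-pass variant in plain Ruby. The hash stands in for Keyword.pluck(:key, :link).to_h, and the link_keywords name and example URLs are hypothetical.

```ruby
# Hypothetical stand-in for Keyword.pluck(:key, :link).to_h
KEYWORDS = {
  "Hello" => "https://example.com/hello",
  "C++"   => "https://example.com/cpp"
}.freeze

# Wraps every case-insensitive occurrence of a keyword in a link,
# keeping the capitalisation found in the content.
def link_keywords(content, keywords)
  return content if keywords.empty?

  lookup = keywords.transform_keys(&:downcase)
  # Escape each key so characters like "+" are matched literally;
  # longer keys go first so they win over shorter overlapping ones.
  alternatives = lookup.keys.sort_by { |key| -key.length }.map { |key| Regexp.escape(key) }
  pattern = /#{alternatives.join("|")}/i

  content.gsub(pattern) do |match|
    "<a href='#{lookup[match.downcase]}'>#{match}</a>"
  end
end

puts link_keywords("Hello world, hello C++", KEYWORDS)
```

Doing all replacements in one gsub pass also means a later keyword cannot match inside a link that an earlier keyword has already inserted. The sketch does not add word boundaries, so a keyword would still match inside a longer word.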
So what I am doing is iterating over various versions of a snippet of code (e.g. Associations.rb in Rails).
What I want to do is just extract one snippet of the code, for example the has_many method:
def has_many(name, scope = nil, options = {}, &extension)
reflection = Builder::HasMany.build(self, name, scope, options, &extension)
Reflection.add_reflection self, name, reflection
end
At first I was thinking of just searching the entire file for the string def has_many and then saving everything between that string and end. The obvious issue with this is that different versions of this file can have multiple end strings within the method.
For instance, whatever I come up with for the above snippet, should also work for this one too:
def has_many(association_id, options = {})
validate_options([ :foreign_key, :class_name, :exclusively_dependent, :dependent, :conditions, :order, :finder_sql ], options.keys)
association_name, association_class_name, association_class_primary_key_name =
associate_identification(association_id, options[:class_name], options[:foreign_key])
require_association_class(association_class_name)
if options[:dependent] and options[:exclusively_dependent]
raise ArgumentError, ':dependent and :exclusively_dependent are mutually exclusive options. You may specify one or the other.' # ' ruby-mode
elsif options[:dependent]
module_eval "before_destroy '#{association_name}.each { |o| o.destroy }'"
elsif options[:exclusively_dependent]
module_eval "before_destroy { |record| #{association_class_name}.delete_all(%(#{association_class_primary_key_name} = '\#{record.id}')) }"
end
define_method(association_name) do |*params|
force_reload = params.first unless params.empty?
association = instance_variable_get("@#{association_name}")
if association.nil?
association = HasManyAssociation.new(self,
association_name, association_class_name,
association_class_primary_key_name, options)
instance_variable_set("@#{association_name}", association)
end
association.reload if force_reload
association
end
# deprecated api
deprecated_collection_count_method(association_name)
deprecated_add_association_relation(association_name)
deprecated_remove_association_relation(association_name)
deprecated_has_collection_method(association_name)
deprecated_find_in_collection_method(association_name)
deprecated_find_all_in_collection_method(association_name)
deprecated_create_method(association_name)
deprecated_build_method(association_name)
end
Assume that each version is stored as text in some column in my db.
How do I approach this: using Ruby's string methods, or should I be approaching this another way?
Edit 1
Please note that this question relates specifically to string manipulation using a regex, without a parser.
As discussed, this should be done with a parser like Ripper.
However, to answer whether it can be done with string methods, I will match the syntax with a regex, provided that:
You can rely on indentation, i.e. the string has the exact same characters before "def" and before "end".
There are no multiline strings in between that could simulate an "end" with the same indentation. That includes multiline strings, HEREDOC, %{ }, etc.
Code
regex = /^
(\s*) # matches the indentation (we'll backreference later)
def\ +has_many\b # literal "def has_many" with a word boundary
(?:.*+\n)*? # match whole lines - as few as possible
\1 # matches the same indentation as the def line
end\b # literal "end"
/x
subject = %q|
  def has_many(name, scope = nil, options = {}, &extension)
    if association.nil?
      instance_variable_set("@#{association_name}", association)
    end
  end|
#Print matched text
puts subject.to_enum(:scan,regex).map {$&}
ideone demo
The regex relies on:
Capturing the whitespace (indentation) with the group (\s*),
followed by the literal def has_many.
It then consumes as few lines as it can with (?:.*+\n)*?.
Notice that .*+\n matches a whole line
and (?:..)*? repeats it 0 or more times. Also, the last ? makes the repetition lazy (as few as possible).
It will consume lines until it matches the following condition...
\1 is a backreference to the text captured by group 1, i.e. the exact same indentation as the first line.
Followed by end obviously.
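The same idea can be wrapped in a small helper. This is a sketch under the assumptions stated above (consistent indentation, no multiline strings that fake an "end"). The extract_method name and the sample source are hypothetical, and the pattern uses [ \t]* rather than \s* so that the captured indentation cannot span a line break.

```ruby
# Returns the text of `def <method_name> ... end`, or nil if it is not found.
# Relies on the closing `end` having exactly the same indentation as `def`.
def extract_method(source, method_name)
  name = Regexp.escape(method_name)
  # ^([ \t]*)    indentation of the def line (captured as group 1)
  # def +name\b  the method definition
  # (?:.*\n)*?   whole lines, as few as possible
  # \1end\b      an `end` with exactly the same indentation
  regex = /^([ \t]*)def +#{name}\b(?:.*\n)*?\1end\b/
  source[regex]
end

# Hypothetical sample; a nested `end` sits inside the method body.
SOURCE = <<~RUBY
  class Foo
    def has_many(name)
      if name.nil?
        raise ArgumentError
      end
      name
    end

    def other
    end
  end
RUBY

puts extract_method(SOURCE, "has_many")
```

The nested end is skipped because it is indented more deeply than the def line, which is exactly the property the regex depends on.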
Test in Rubular
I'm trying to match a string as such:
text = "This is a #hashtag"
raw(
h(text).gsub(/(?:\B#)(\w*[A-Z]+\w*)/i, embed_hashtag('\1'))
)
def embed_hashtag('data')
#... some code to turn the captured hashtag string into a link
#... return the variable that includes the final string
end
My problem is that when I pass '\1' in my embed_hashtag method that I call with gsub, it simply passes "\1" literally, rather than the first captured group from my regex. Is there an alternative?
FYI:
I'm wrapping text in h to escape strings, but then I'm embedding code into user inputted text (i.e. hashtags) which needs to be passed raw (hence raw).
It's important to keep the "#" symbol apart from the text, which is why I believe I need the capture group.
If you have a better way of doing this, don't hesitate to let me know, but I'd still like an answer for the sake of answering the question in case someone else has this question.
Use the block form gsub(regex){ $1 } instead of gsub(regex, '\1')
You can simplify the regex to /\B#(\w+)/i as well
You can leave out the h() helper, Rails 4 will escape malicious input by default
Specify method arguments as embed_hashtag(data) instead of embed_hashtag('data')
You need to define embed_hashtag before doing the substitution
To build a link, you can use link_to(text, url)
This should do the trick:
def embed_hashtag(tag)
url = 'http://example.com'
link_to tag, url
end
raw(
text.gsub(/\B#(\w+)/i){ embed_hashtag($1) }
)
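The difference between the two forms is easiest to see in plain Ruby, without raw or link_to. In this sketch embed_hashtag is a hypothetical helper that downcases the tag for the URL, which makes it visible whether the method ever received the real captured text.

```ruby
# Hypothetical helper: builds a link whose href uses the downcased tag.
def embed_hashtag(tag)
  %(<a href="/tags/#{tag.downcase}">##{tag}</a>)
end

text = "This is a #Hashtag"

# String form: embed_hashtag('\1') runs once, before gsub starts, and
# receives the two literal characters \ and 1. gsub only expands \1
# afterwards, in the returned string, so downcase never sees the tag.
eager = text.gsub(/\B#(\w+)/, embed_hashtag('\1'))

# Block form: the block runs once per match, after $1 has been set.
per_match = text.gsub(/\B#(\w+)/) { embed_hashtag($1) }

puts eager      # This is a <a href="/tags/Hashtag">#Hashtag</a>
puts per_match  # This is a <a href="/tags/hashtag">#Hashtag</a>
```

Only the block form lets the method work on the captured text itself, which matters as soon as the method does more than paste its argument into a string.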
The correct way would be the use of a block here.
Example:
def embed_hashtag(data)
puts "#{data}"
end
text = 'This is a #hashtag'
raw(
h(text).gsub(/\B#(\S+)/) { embed_hashtag($1) }
)
Try last match regexp shortcut:
=> 'zzzdzz'.gsub(/d/) { puts $~[0] }
=> 'd'
=> "zzzzz"
Users send in SMS messages which must include a keyword. This keyword is then used to find a business.
The instruction is to use the keyword at the start of the sentence.
I know some users won't put the keyword at the beginning, or will add tags (@ # -) or punctuation (keyword.) to the keyword.
What is an efficient way to look for this keyword and for the business?
My attempt:
scrubbed_msg = msg.gsub("\"", "").gsub("\'", "").gsub("@", "").gsub("-", "").gsub(",", "").gsub(".", "").gsub("#", "").split.join(" ")
tag = scrubbed_msg.split[0]
if @business = Business.where(tag: tag).first
log_message(@business)
else
scrubbed_msg.split.each do |w|
if @business = Business.where(tag: w).first
log_message(@business)
end
end
end
Instead of listing which characters you want to remove from the string, I suggest a whitelist approach that specifies which characters you want to keep, for example alphanumeric characters:
sms = "#keyword and the rest"
clean_sms = sms.scan(/[\p{Alnum}]+/)
# => ["keyword", "and", "the", "rest"]
And then, if I got right what you are trying to do, to find the business you are looking for you could do something like this:
first_existing_tag = clean_sms.find do |tag|
Business.exists?(tag: tag)
end
@business = Business.where(tag: first_existing_tag).first
log_message(@business)
You can use a Regexp to filter all unnecessary characters out of the String, then call #reduce on the Array you get from splitting the string to find the first record whose tag field matches a keyword. In the example the candidates are keyword, tag1, tag2:
msg = "key.w,ord tag-1'\n\"tag2"
# => "key.w,ord tag-1'\n\"tag2"
scrubbed = msg.gsub(/[@'"\-\.,#]/, "").split
# => ["keyword", "tag1", "tag2"]
@business = scrubbed.reduce(nil) do |sum, tag|
sum || Business.where(tag: tag).first
end
# => Record tag: keyword
# => Record tag: tag1 if no record with keyword is found
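Both answers come down to the same two steps: reduce the message to clean words, then return the first word that is a known tag. Here is a sketch of that in plain Ruby, with a hash standing in for the Business table. The BUSINESSES data and the find_business name are hypothetical, and downcasing the message assumes tags are stored in lower case.

```ruby
# Hypothetical in-memory stand-in for the Business table, keyed by tag.
BUSINESSES = {
  "pizza" => "Luigi's Pizza",
  "taxi"  => "City Cabs"
}.freeze

# Returns the business for the first word in the message that is a known tag,
# or nil when no word matches.
def find_business(message)
  # Keep only runs of letters and digits; downcasing is an assumption
  # (it presumes tags are stored in lower case).
  words = message.downcase.scan(/\p{Alnum}+/)
  words.each do |word|
    business = BUSINESSES[word]
    return business if business
  end
  nil
end

puts find_business("#Pizza. two large ones please")   # Luigi's Pizza
puts find_business("hello, I need a @taxi!")          # City Cabs
p find_business("no keyword here")                    # nil
```

With ActiveRecord the lookup could probably be reduced to one query, for instance Business.where(tag: words), followed by picking the record whose tag appears earliest in words.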
I am just trying to figure out what the below means in Ruby.
"([^"]*)"$/
I have the following code sample in Ruby using cucumber at the moment:
require "watir-webdriver"
require "rspec/expectations"
Given /^I have entered "([^"]*)" into the query$/ do |term|
@browser ||= Watir::Browser.new :firefox
@browser.goto "google.com"
@browser.text_field(:name => "q").set term
end
When /^I click "([^"]*)"$/ do |button_name|
@browser.button.click
end
Then /^I should see some results$/ do
@browser.div(:id => "resultStats").wait_until_present
@browser.div(:id => "resultStats").should exist
@browser.close
end
I understand at the moment that it is doing a logic check that a button has been clicked. I did a bit of research and found the following symbol meanings in Ruby (as I am new to Ruby):
? = method returns a boolean value.
$ = global variable
@ = instance variable
@@ = class variable.
^ = bitwise XOR operator.
* = unpack array
I cannot seem to find what the command does. I am trying to clarify exactly how functions are linked to variables and I think this is the final clue for me.
Many thanks in advance for any help.
It's a regular expression. The expression is contained between the "/" characters.
By way of an example and using your code:
/^I have entered "([^"]*)" into the query$/
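is interpreted as a pattern that:
Matches the beginning of the line (^)
Matches "I have entered "
Matches a double quote (")
Matches and captures any run of characters that are not double quotes ( ([^"]*) )
Matches the closing double quote (")
Matches " into the query"
Matches the end of the line ($)
Inside a regular expression, ^, $ and * have these regex meanings, not the operator meanings from your list.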
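The capture group is what links the step text to the block variable: whatever ([^"]*) matched is what Cucumber hands to the block as term. A small sketch in plain Ruby, without Cucumber, shows the capture on its own. The STEP constant and the sample sentences are made up for illustration.

```ruby
STEP = /^I have entered "([^"]*)" into the query$/

match = STEP.match('I have entered "ruby regex" into the query')
puts match[1]   # ruby regex

# No quoted term, so the step pattern does not match at all.
p STEP.match('I have entered ruby regex into the query')   # nil
```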
See http://www.tutorialspoint.com/ruby/ruby_regular_expressions.htm for more information on Ruby and Regular expressions.