Truncate sentences in rails? - ruby-on-rails

I've got a helper that I'm using to truncate strings in Rails, and it works great when I truncate sentences that end in periods. How should I modify the code to also truncate sentences when they end in question marks or exclamation points?
def smart_truncate(s, opts = {})
  opts = {:words => 12}.merge(opts)
  if opts[:sentences]
    return s.split(/\.(\s|$)+/).reject{ |s| s.strip.empty? }[0, opts[:sentences]].map{|s| s.strip}.join('. ') + '...'
  end
  a = s.split(/\s/) # or /[ ]+/ to only split on spaces
  n = opts[:words]
  a[0...n].join(' ') + (a.size > n ? '... (more)' : '')
end
Thanks!!!

You have the truncate method
'Once upon a time in a world far far away'.truncate(27, separator: /\s/, omission: "....")
which will return "Once upon a time in a...."
And if you need to truncate by number of words instead then use the newly introduced truncate_words (since Rails 4.2.2)
'And they found that many people were sleeping better.'.truncate_words(5, omission: '... (continued)')
which returns
"And they found that many... (continued)"

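For the original question (cutting on question marks and exclamation points as well as periods), here is a minimal plain-Ruby sketch. The helper name truncate_sentences and its omission keyword are made up for illustration and are not part of Rails. It treats any run of text ending in ., ? or ! as a sentence, so abbreviations such as "e.g." will be split in the wrong place.

```ruby
# Hypothetical helper: keep the first `count` sentences of `text`.
# A "sentence" is a run of characters ending in one or more of . ? !
# (a trailing fragment with no terminator counts as a sentence too).
def truncate_sentences(text, count, omission: '...')
  sentences = text.scan(/[^.!?]+(?:[.!?]+|\z)/).map(&:strip).reject(&:empty?)
  kept = sentences.first(count).join(' ')
  # Only append the omission marker when something was actually cut off.
  sentences.size > count ? kept + omission : kept
end

puts truncate_sentences('Is it late? Yes! Go home. Sleep well.', 2)
```

Unlike the split/join version in the question, this keeps each sentence's own terminator instead of re-joining everything with periods.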
Related

Ruby regex punctuation

I am having trouble writing this so that it will take a sentence as an argument and perform the translation on each word without affecting the punctuation.
I'd also like to continue using the partition method.
It would be nice if I could have it keep a quote together as well, such as:
"I said this", I said.
would be:
"I aidsay histay", I said.
def convert_sentence_pig_latin(sentence)
  p split_sentence = sentence.split(/\W/)
  pig_latin_sentence = []
  split_sentence.each do |word|
    if word.match(/^[^aeiou]+/x)
      pig_latin_sentence << word.partition(/^[^aeiou]+/x)[2] + word.partition(/^[^aeiou]+/x)[1] + "ay"
    else
      pig_latin_sentence << word
    end
  end
  rejoined_pig_sentence = pig_latin_sentence.join(" ").downcase + "."
  p rejoined_pig_sentence.capitalize
end
convert_sentence_pig_latin("Mary had a little lamb.")
Your main problem is that [^aeiou] matches every character outside that range, including spaces, commas, quotation marks, etc.
If I were you, I'd use a positive match for consonants, i.e. [b-df-hj-np-tv-z]. I would also put that regex in a variable, so you don't have to repeat it three times.
Also, in case you're interested, there's a way to make your convert_sentence_pig_latin method a single gsub and it will do the whole sentence in one pass.
Update
...because you asked...
sentence.gsub( /\b([b-df-hj-np-tv-z])(\w+)/i ) { "#{$2}#{$1}ay" }
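To tie the points above together while still using partition, here is a rough sketch. The method name to_pig_latin is hypothetical. It assumes the whole leading consonant cluster should move (so "this" becomes "isthay"); drop the + from the pattern if only the first consonant should move ("histay"). Because gsub only touches runs of letters, quotes, commas and periods stay where they are.

```ruby
# Hypothetical helper: translate every word in a sentence, leaving
# punctuation and vowel-initial words untouched.
def to_pig_latin(sentence)
  # Positive match for a leading consonant cluster, defined once.
  leading_consonants = /\A[b-df-hj-np-tv-z]+/i
  sentence.gsub(/[a-z]+/i) do |word|
    # partition returns [before, match, after]; match is "" when nothing matched.
    _, cluster, rest = word.partition(leading_consonants)
    cluster.empty? ? word : "#{rest}#{cluster}ay"
  end
end

puts to_pig_latin('Mary had a little lamb.')
puts to_pig_latin('"I said this", I said.')
```

Note that this version translates words outside the quotes as well; restricting it to quoted text would need an outer gsub over the quoted spans.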
# iterate over and replace regexp matches using gsub
def convert_sentence_pig_latin2(sentence)
  r = /^[^aeiou]+/i
  sentence.gsub(/"([^"]*)"/m) {|x| x.gsub(/\w+/) {|y| y =~ r ? "#{y.partition(r)[2]}#{y.partition(r)[1]}ay" : y}}
end
puts convert_sentence_pig_latin2('"I said this", I said.')
# define instance method: String#to_pl
class String
  R = Regexp.new '^[^aeiou]+', true # => /^[^aeiou]+/i
  def to_pl
    self.gsub(/"([^"]*)"/m) {|x| x.gsub(/\w+/) {|y| y =~ R ? "#{y.partition(R)[2]}#{y.partition(R)[1]}ay" : y}}
  end
end
puts '"I said this", I said.'.to_pl
sources:
http://www.ruby-doc.org/core-2.1.0/Regexp.html
http://ruby-doc.org/core-2.0/String.html#method-i-gsub

Newbie to ruby and a little stuck with if statements

This is most likely a dumb mistake, but I'm a little stuck setting some if conditions inside this:
output = "<option value='#{user.id}' #{strDisabled} >if (user.company_name != "")#{user.company_name},&nbsp end if (user.name != "")#{user.name},&nbsp end if (user.email != "")#{user.email} end</option>".html_safe
It is just outputting the Ruby code. I am most likely miles off, but I'm on a project that uses Ruby and I have not used it before :)
Thanks in advance
Here's a quick refactor to make it easier to read the code.
output = "<option value='#{user.id}' #{strDisabled}>"
output << "#{user.company_name }, " if user.company_name.present?
output << "#{user.name}, " if user.name.present?
output << user.email if user.email.present?
output << "</option>"
output.html_safe
One more way to do it is:
output = "<option value='#{user.id}' #{strDisabled}>"
output << ([user.company_name, user.name, user.email].reject(&:blank?) * ', ')
output << "</option>"
output.html_safe
The second line keeps whichever of the three strings are not blank and joins them with ", ".
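The same reject-and-join idea can be tried outside Rails. This is only a sketch: option_label is a made-up helper, and since plain Ruby has no blank?, it uses to_s.strip.empty? as a stand-in (so nil and whitespace-only values are dropped too).

```ruby
# Hypothetical helper: join only the fields that have content.
def option_label(*fields)
  fields.map { |field| field.to_s.strip }.reject(&:empty?).join(', ')
end

label = option_label('Acme Ltd', '', 'jo@example.com')
puts "<option value='42'>#{label}</option>"
```

Because the separator is only inserted between surviving fields, there is no trailing comma to clean up when the last field is empty.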
Rails has a content_tag helper for generating HTML tags:
content_tag :option, value: user.id do
  output = []
  output << user.company_name unless user.company_name.blank?
  output << user.email unless user.email.blank?
  output.join(",&nbsp;").html_safe
end
#=> "<option value=\"123\">user's company name,&nbsp;user's email</option>"
You need to put each piece that is Ruby code into #{} not just variables.
I don't think putting an if inside a string is a good or proper thing to do.
I've attempted to reformat your code to approximate what it seems like you're trying to achieve:
output = "<option value='#{user.id}' #{strDisabled} >"
if (user.company_name != "")
  output << "#{user.company_name}, "
end
if (user.name != "")
  output << "#{user.name}, "
end
if (user.email != "")
  output << "#{user.email}"
end
output << "</option>"
output = output.html_safe
You were missing numerous string terminators, and the semicolons for the non-breaking space HTML escape codes. But your major issue was that you were trying to do everything on the same line, and weren't ensuring your if/else constructs were:
1. outside the HTML string, and
2. properly formed.
Are you using an IDE, or are you attempting to do this in a text editor? If the latter, I highly recommend you attempt to at least find one which will do syntax highlighting, as this will make the issues you were suffering from a lot more visible.
You can write it like this:
output = "<option value='#{user.id}' #{strDisabled}>
#{user.company_name if !user.company_name.blank?},
#{user.name if !user.name.blank?},
#{user.email if !user.email.blank?}
</option>".html_safe

How to pluralize "There is/are N object/objects"?

Pluralizing a single word is simple:
pluralize(@total_users, "user")
But what if I want to print "There is/are N user/users":
There are 0 users
There is 1 user
There are 2 users
In other words, how do I pluralize a whole sentence?
You can add a custom inflection for it. By default, Rails will add an inflections.rb to config/initializers. There you can add:
ActiveSupport::Inflector.inflections do |inflect|
  inflect.irregular "is", "are"
end
You will then be able to use pluralize(@total_users, "is") to return is/are using the same rules as user/users.
EDIT: You clarified the question on how to pluralize a sentence. This is much more difficult to do generically, but if you want to do it, you'll have to dive into NLP.
As the comment suggests, if you only need this for a few sentences you could use I18n and build something like this:
def pluralize_sentence(count, i18n_id, plural_i18n_id = nil)
  if count == 1
    I18n.t(i18n_id, :count => count)
  else
    I18n.t(plural_i18n_id || (i18n_id + "_plural"), :count => count)
  end
end
pluralize_sentence(@total_users, "user_count")
And in config/locales/en.yml:
en:
  user_count: "There is %{count} user."
  user_count_plural: "There are %{count} users."
This is probably best covered by the Rails i18n pluralization features. Adapted from http://guides.rubyonrails.org/i18n.html#pluralization
I18n.backend.store_translations :en, :user_msg => {
  :one => 'There is 1 user',
  :other => 'There are %{count} users'
}
I18n.translate :user_msg, :count => 2
# => 'There are 2 users'
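The :one / :other lookup is easy to picture without the i18n gem. The sketch below is a plain-Ruby stand-in, not how I18n is implemented: user_message and the templates hash are made up for illustration, and it assumes the English rule that only a count of exactly 1 is singular.

```ruby
# Hypothetical stand-in for the :one / :other lookup, without the i18n gem.
def user_message(count)
  templates = {
    one: 'There is %{count} user',
    other: 'There are %{count} users'
  }
  # English rule assumed here: exactly 1 is singular, everything else plural.
  key = count == 1 ? :one : :other
  format(templates[key], count: count)
end

[0, 1, 2].each { |n| puts user_message(n) }
```

Keeping the whole sentence in each template, rather than stitching together "is"/"are" and the noun, is what makes this approach translatable to languages with different plural rules.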
I think the first part of Martin Gordon's answer is pretty good.
Alternatively, it's kind of messy but you can always just write the logic yourself:
"There #{@users.size == 1 ? 'is' : 'are'} #{@users.size} user#{'s' unless @users.size == 1}."
UPDATE to code: I no longer use the inflections route described in @Martin Gordon's answer. For some reason it caused other, unrelated functions to error. I did extensive tests to confirm this, though I could not track down the reason. So, below is what I use now, and it works.
There are many ways to do this. This is how I did it using Rails 6.0.3.4 and Ruby 2.7.1.
I wanted to pluralize this sentence:
Singular: There is 1 private group
Plural: There are 2 private groups
What I did is I went to application_helper.rb and added this code:
def pluralize_private_statement(list, word)
  num_in_list = list.count
  is_or_are = num_in_list == 1 ? 'is' : 'are'
  return "There " + is_or_are + " " + num_in_list.to_s + " private " + word.pluralize(num_in_list)
end
Now, all I have to use in my view is:
<%= pluralize_private_statement(private_groups, "group") %>
# with 2 items in private_groups
# -> There are 2 private groups
The code in application_helper.rb first stores the number of items in the list in num_in_list. It then sets a second variable, is_or_are, to 'is' if num_in_list equals 1 and to 'are' otherwise. Finally, it returns the sentence built from that information.
The sentence starts with a simple string, followed by the is_or_are variable described above. Then comes a space and the number of list items, converted from an integer to a string, followed by the word 'private'. Last comes the pluralization of the word passed to the function; this returns only the singular or plural word, without a number attached as pluralize(@total_users, "is") would do.
Here is how you could use it for your specific question.
First, add this to your application_helper.rb file:
def pluralize_sentence(list, word)
  num_in_list = list.count
  is_or_are = num_in_list == 1 ? 'is' : 'are'
  return "There " + is_or_are + " " + num_in_list.to_s + " " + word.pluralize(num_in_list)
end
Lastly, use this wherever you want the pluralized sentence (the helper calls count on its first argument, so pass a collection rather than a number):
<%= pluralize_sentence(@total_users, "user") %>
Happy Coding!

What's the fastest way to check if a word from one string is in another string?

I have a string of words; let's call them bad:
bad = "foo bar baz"
I can keep this string as a whitespace separated string, or as a list:
bad = bad.split(" ");
If I have another string, like so:
str = "This is my first foo string"
What's the fastest way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?
#Find if a word is there
bad.split(" ").each do |word|
  found = str.include?(word)
end
#Remove the word
bad.split(" ").each do |word|
  str.gsub!(/#{word}/, "")
end
If the list of bad words gets huge, a hash is a lot faster:
require 'benchmark'
bad = ('aaa'..'zzz').to_a # 17576 words
str= "What's the fasted way to check if any word from the bad string is within my "
str += "comparison string, and what's the fastest way to remove said word if it's "
str += "found"
str *= 10
badex = /\b(#{bad.join('|')})\b/i
bad_hash = {}
bad.each{|w| bad_hash[w] = true}
n = 10
Benchmark.bm(10) do |x|
  x.report('regex:') {n.times do
    str.gsub(badex,'').squeeze(' ')
  end}
  x.report('hash:') {n.times do
    str.gsub(/\b\w+\b/){|word| bad_hash[word] ? '': word}.squeeze(' ')
  end}
end
                 user     system      total        real
regex:      10.485000   0.000000  10.485000 ( 13.312500)
hash:        0.000000   0.000000   0.000000 (  0.000000)
bad = "foo bar baz"
=> "foo bar baz"
str = "This is my first foo string"
=> "This is my first foo string"
(str.split(' ') - bad.split(' ')).join(' ')
=> "This is my first string"
All the solutions have problems with catching the bad words if the case does not match. The regex solution is easiest to fix by adding the ignore-case flag:
badex = /\b(#{bad.split.join('|')})\b/i
In addition, using "String".include?(" String ") will lead to boundary problems with the first and last words in the string, or with strings where the target words have punctuation or are hyphenated. Handling those situations requires a lot of extra code. Because of that I think the regex solution is the best one. It is not the fastest, but it is more flexible right out of the box, and if the other algorithms are tweaked to handle case folding and compound words, the regex solution might pull ahead.
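A small sketch of the boundary and case issues described above. The helper names bad_word_pattern and remove_bad_words are made up; Regexp.union is used so that any regex metacharacters in the word list are escaped.

```ruby
# Hypothetical helpers illustrating word boundaries plus case folding.
def bad_word_pattern(bad_words)
  # \b keeps "foo" from matching inside "food"; the i flag folds case.
  /\b(?:#{Regexp.union(bad_words).source})\b/i
end

def remove_bad_words(str, bad_words)
  str.gsub(bad_word_pattern(bad_words), '').squeeze(' ').strip
end

bad = %w[foo bar baz]
puts 'I like food'.include?('foo') # substring check reports a false positive
puts remove_bad_words('This is my first FOO string about food', bad)
```

The plain include? call flags "food" because it only looks for the substring, while the pattern removes "FOO" and leaves "food" alone.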
#!/usr/bin/ruby
require 'benchmark'
bad = 'foo bar baz comparison'
badex = /\b(#{bad.split.join('|')})\b/i
str = "What's the fasted way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?" * 10
n = 10_000
Benchmark.bm(20) do |x|
  x.report('regex:') do
    n.times { str.gsub(badex,'').gsub('  ',' ') }
  end
  x.report('regex with squeeze:') do
    n.times{ str.gsub(badex,'').squeeze(' ') }
  end
  x.report('array subtraction') do
    n.times { (str.split(' ') - bad.split(' ')).join(' ') }
  end
end
I made the str variable a lot longer, to make the routines work a bit harder.
                           user     system      total        real
regex:                 0.740000   0.010000   0.750000 (  0.752846)
regex with squeeze:    0.570000   0.000000   0.570000 (  0.581304)
array subtraction      1.430000   0.010000   1.440000 (  1.449578)
Doh!, I'm too used to how other languages handle their benchmarks. Now I got it working and looking better!
Just a little comment about what it looks like the OP is trying to do: blacklisted-word removal is easy to fool, and a pain to keep maintained. L33t-sp34k makes it trivial to sneak words through. Depending on the application, people will consider it a game to find ways to push offensive words past the filtering. The best solution I found when I was asked to work on this was to create a generator that would create all the variations on a word and dump them into a database, where some process could check as soon as possible rather than in real time. A million small strings being checked can take a while if you are searching through a long list of offensive words; I'm sure we could come up with quite a list of things that someone would find offensive, but that's an exercise for a different day.
I haven't seen anything in Ruby similar to Perl's Regexp::Assemble, but that was a good way to go after this sort of problem. You can pass an array of words, plus options for case folding and word boundaries, and it will spit out a regex pattern that will match all the words, with their commonalities factored in to produce the smallest pattern that matches every word in the list. The problem after that is locating which word in the original string matched the hits found by the pattern, so they can be removed. Differences in word case and hits within compound words make that replacement more interesting.
And we won't even go into words that are benign or offensive depending on the context.
I added a somewhat more comprehensive test for the array-subtraction benchmark, to fit how it would need to work in a real piece of code. The if clause is specified in the answer, and the benchmark now reflects it:
#!/usr/bin/env ruby
require 'benchmark'
bad = 'foo bar baz comparison'
badex = /\b(#{bad.split.join('|')})\b/i
str = "What's the fasted way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?" * 10
str_split = str.split
bad_split = bad.split
n = 10_000
Benchmark.bm(20) do |x|
  x.report('regex') do
    n.times { str.gsub(badex,'').gsub('  ',' ') }
  end
  x.report('regex with squeeze') do
    n.times{ str.gsub(badex,'').squeeze(' ') }
  end
  x.report('bad.any?') do
    n.times {
      if (bad_split.any? { |bw| str.include?(bw) })
        (str_split - bad_split).join(' ')
      end
    }
  end
  x.report('array subtraction') do
    n.times { (str_split - bad_split).join(' ') }
  end
end
with two test runs:
ruby test.rb
                           user     system      total        real
regex                  1.000000   0.010000   1.010000 (  1.001093)
regex with squeeze     0.870000   0.000000   0.870000 (  0.873224)
bad.any?               1.760000   0.000000   1.760000 (  1.762195)
array subtraction      1.350000   0.000000   1.350000 (  1.346043)
ruby test.rb
                           user     system      total        real
regex                  1.000000   0.010000   1.010000 (  1.004365)
regex with squeeze     0.870000   0.000000   0.870000 (  0.868525)
bad.any?               1.770000   0.000000   1.770000 (  1.775567)
array subtraction      1.360000   0.000000   1.360000 (  1.359100)
I usually make a point of not optimizing without measurements, but here's a wag:
To make it fast, you should iterate through each string once. You want to avoid a loop with bad count * str count inner compares. So, you could build a big regexp and gsub with it.
(adding foo variants to test word boundary works)
str = "This is my first foo fooo ofoo string"
=> "This is my first foo fooo ofoo string"
badex = /\b(#{bad.split.join('|')})\b/
=> /\b(foo|bar|baz)\b/
str.gsub(badex,'').gsub('  ',' ')
=> "This is my first fooo ofoo string"
Of course the huge resulting regexp might be as slow as the implied nested iteration in my other answer. Only way to know is to measure.
bad = %w(foo bar baz)
str = "This is my first foo string"
# find the first word in the list
found = bad.find {|word| str.include?(word)}
# remove it
str[found] = '' if found # str => "This is my first  string" (the doubled space is left behind)
I'd benchmark this:
bad = "foo bar baz".split(' ')
str = "This is my first foo string".split(' ')
# 1. What's the fasted way to check if any word from the bad string is within my comparison string
p bad.any? { |bw| str.include?(bw) }
# 2. What's the fastest way to remove said word if it's found?
p (str - bad).join(' ')
any? stops checking as soon as it finds a match. If you can order your bad words by their probability, you can save some cycles.
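A sketch of that short-circuit, flipped around so the input words are looked up in a Set instead of scanning the bad list for each one. The method name contains_bad_word? is made up, and splitting on \w+ is an assumption about what counts as a word (it breaks "what's" into "what" and "s").

```ruby
require 'set'

# Hypothetical helper: true as soon as one word of str is in bad_set.
def contains_bad_word?(str, bad_set)
  # any? returns at the first hit, and Set#include? is a hash lookup,
  # so the cost does not grow with the length of the bad-word list.
  str.downcase.scan(/\w+/).any? { |word| bad_set.include?(word) }
end

bad_set = Set.new(%w[foo bar baz])
puts contains_bad_word?('This is my first Foo string', bad_set)
puts contains_bad_word?('This is my first food string', bad_set)
```

Because whole words are compared, "food" does not trigger a match the way a substring include? check would.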
Here's one that will check for words and phrases.
def checkContent(str)
  bad = ["foo", "bar", "this place sucks", "or whatever"]
  # may be best to map and singularize everything as well.
  # maybe add some regex to catch those pesky, "How i make $69 dollars each second online..."
  # maybe apply some comparison stuff to check for weird characters in those pesky, "How i m4ke $69 $ollars an hour"
  bad_hash = {}
  bad_phrase_hash = {}
  bad.map(&:downcase).each do |word|
    words = word.split().map(&:downcase)
    if words.length > 1
      words.each do |inner|
        if bad_hash.key?(inner)
          if bad_hash[inner].is_a?(Hash) && !bad_hash[inner].key?(words.length)
            bad_hash[inner][words.length] = true
          elsif bad_hash[inner] === 1
            bad_hash[inner] = {1=>true,words.length => true}
          end
        else
          bad_hash[inner] = {words.length => true}
        end
      end
      bad_phrase_hash[word] = true
    else
      bad_hash[word] = 1
    end
  end
  string = str.split().map(&:downcase)
  string.each_with_index do |word,index|
    if bad_hash.key?(word)
      if bad_hash[word].is_a?(Hash)
        if bad_hash[word].key?(1)
          return false
        else
          bad_hash[word].keys.sort.each do |length|
            value = string[index...(index + length)].join(" ")
            if bad_phrase_hash.key?(value)
              return false
            end
          end
        end
      else
        return false
      end
    end
  end
  return true
end
The include? method is what you need. The Ruby String documentation says:
str.include?( string ) -> true or false
Returns true if str contains the given string or character.
"hello".include? "lo" -> true
"hello".include? "ol" -> false
"hello".include? ?h -> true
Note that include? is O(n), whereas what you proposed is O(n^2).

Overloading ActiveSupport's default to_sentence behaviour

ActiveSupport offers the nice method to_sentence. Thus,
require 'active_support'
[1,2,3].to_sentence # gives "1, 2, and 3"
[1,2,3].to_sentence(:last_word_connector => ' and ') # gives "1, 2 and 3"
It's good that you can change the last word connector, because I prefer not to have the extra comma. But it takes so much extra text: 44 characters instead of 11!
The question: what's the most Ruby-like way to change the default value of :last_word_connector to ' and '?
Well, it's localizable so you could just specify a default 'en' value of ' and ' for support.array.last_word_connector
See:
from: conversion.rb
def to_sentence(options = {})
  ...
  default_last_word_connector = I18n.translate(:'support.array.last_word_connector', :locale => options[:locale])
  ...
end
Step by step guide:
First, create a Rails project:
rails i18n
Next, edit your en.yml file: vim config/locales/en.yml
en:
  support:
    array:
      last_word_connector: " and "
Finally, it works:
Loading development environment (Rails 2.3.3)
>> [1,2,3].to_sentence
=> "1, 2 and 3"
As an answer to how to override a method in general, a post here gives a nice way of doing it. It doesn't suffer from the same problems as the alias technique, as there isn't a leftover "old" method.
Here's how you could use that technique with your original problem (tested with Ruby 1.9):
class Array
  old_to_sentence = instance_method(:to_sentence)
  define_method(:to_sentence) { |options = {}|
    options[:last_word_connector] ||= " and "
    old_to_sentence.bind(self).call(options)
  }
end
You might also want to read up on UnboundMethod if the above code is confusing. Note that old_to_sentence goes out of scope after the end statement, so it isn't a problem for future uses of Array.
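If UnboundMethod is unfamiliar, the same technique can be tried on a toy class with no ActiveSupport involved. Greeter and its :punctuation option are made up for this sketch; only instance_method, define_method and bind are the real Ruby calls the answer relies on.

```ruby
# Toy class standing in for Array#to_sentence.
class Greeter
  def greet(options = {})
    "Hello#{options[:punctuation] || '.'}"
  end
end

class Greeter
  # instance_method returns an UnboundMethod; holding it in a local
  # means no extra "old_greet" method is left on the class.
  old_greet = instance_method(:greet)

  define_method(:greet) do |options = {}|
    # New default; callers can still override it.
    options = { punctuation: '!' }.merge(options)
    old_greet.bind(self).call(options)
  end
end

puts Greeter.new.greet
puts Greeter.new.greet(punctuation: '?')
```

The block passed to define_method is a closure, which is why it can still reach old_greet after the class body has finished.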
class Array
  alias_method :old_to_sentence, :to_sentence
  def to_sentence(args={})
    a = {:last_word_connector => ' and '}
    a.update(args) if args
    old_to_sentence(a)
  end
end
