Ruby: truncating a long string contained in another string


Let string_a = "I'm a string but this aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is long."
How can I detect the aaaa...aaa part (the very long run without spaces) and truncate it, leaving the rest of the string unchanged, so that the result looks like this:
puts string_a # => "I'm a string but this aaa...aaa is long."
I use this method to truncate:
def truncate_string(string_x)
  splits = string_x.scan(/(.{0,9})(.*?)(.{0,9}$)/)[0]
  splits[1] = "..." if splits[1].length > 3
  splits.join
end
So running:
puts truncate_string("I'm a very long long long string") # => "I'm a ver...ng string"
The problem is detecting the 'aaaa...aaaa' part and applying truncate_string to it.
Is the first part of the solution detecting strings longer than N characters with a regex?

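What about something like
string.gsub(/\S{10,}/) { |x| "#{x[0..3]}...#{x[-3..-1]}" }
where 10 is the minimum length of a run of non-space characters that gets truncated? As written, it keeps the first four and the last three characters of each such run.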

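How do you like this? Note that it shortens only one long word (the last one preceded by a space), because the trailing .* in the second group consumes the rest of the string: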
s = "I'm a string but this aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is long."
s.gsub(/(.* \w{3})\w{5,}(\w{3}.*)/, '\1...\2')
=> "I'm a string but this aaa...aaa is long."
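Building on the \S{10,} idea from the first answer, here is a minimal sketch that shortens every long run of non-space characters and keeps the same number of characters at each end. The method name truncate_long_words and its max_len/keep parameters are hypothetical, not taken from either answer:

```ruby
# Hypothetical helper: shorten every run of non-space characters longer than
# max_len, keeping `keep` characters from each end of the run.
def truncate_long_words(text, max_len: 9, keep: 3)
  # With the defaults this builds /\S{10,}/, i.e. runs of 10 or more characters
  text.gsub(/\S{#{max_len + 1},}/) do |word|
    "#{word[0, keep]}...#{word[-keep, keep]}"
  end
end

puts truncate_long_words("I'm a string but this aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa is long.")
# I'm a string but this aaa...aaa is long.
```

Unlike the single-substitution regex above, gsub here handles any number of long words. Keep 2 * keep + 3 at or below max_len, otherwise a "truncated" word can come out longer than the original.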

Related

When trying to multiply with a method I get this output ஸ

Ruby noob here. I'm trying to make a very simple appointment booking form that outputs a confirmation and the amount of time the appointment will take. I've got the concatenation working on the output, but I keep getting this character (ஸ) where the amount of time should be. Below is my Ruby script and its output:
print "Whats your name?"
name = gets.to_s
print "What is the address for your listing?"
appointment_address = gets
print "Square footage?"
sq_ft = gets.to_i
print "listing price"
listing_price = gets
# PHOTOGRAPHERS
def tps
  tps = 3.to_i
end
def ryan(sq_ft,tps)
  p sq_ft.to_i * tps.to_i
end
appointment_confirmation = 'Hey, '<< name.to_s.strip << '! Your appointment at ' << appointment_address.to_s.strip << ' will take us about ' << sq_ft*tps << ' to complete.'
p appointment_confirmation.strip
Output:
Hey, Alex! Your appointment at 102 Alex will take us about ஸ to complete.

When you append an integer to a string with <<, Ruby treats the integer as a codepoint and appends the corresponding character (ஸ is U+0BB8, i.e. codepoint 3000). Convert your integer to a string first:
' will take us about ' << (sq_ft*tps).to_s << ' to com ...'
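A small sketch of that behaviour, assuming the product in the question was 3000 (the codepoint of ஸ); the variable names are hypothetical:

```ruby
# Hypothetical value: assumes sq_ft * tps evaluated to 3000
product = 3000

buggy = String.new("about ", encoding: Encoding::UTF_8)
buggy << product                # Integer argument: appends the character whose codepoint is 3000
fixed = "about " + product.to_s # convert first, so the digits "3000" are appended

puts buggy # "about " followed by U+0BB8, the character from the question
puts fixed # about 3000
```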

You are appending an integer to a string, so the integer is treated as a character code.
The following prints out the letters A-Z, because 65 is the code of "A" and 90 the code of "Z":
s = ""
(65..90).each do |ascii_code|
  s << ascii_code
end
puts s
Either convert the amount to a string or (and this is my preferred way) use string interpolation:
"Hey, #{name} the duration is #{sq_ft*tps}"
Here #{} interpolates a Ruby value into the string (calling to_s on it). This only works with double-quoted strings.
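For example, the whole confirmation line from the question could be built with interpolation alone. This is a sketch with hard-coded stand-ins (hypothetical values) for what the script reads with gets:

```ruby
# Hypothetical stand-ins for the values the script reads with gets
name = "Alex\n"
appointment_address = "102 Alex\n"
sq_ft = 1000
tps = 3

# Interpolation calls to_s on each value, so the Integer product shows up as digits
confirmation = "Hey, #{name.strip}! Your appointment at #{appointment_address.strip} " \
               "will take us about #{sq_ft * tps} to complete."
puts confirmation
```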

Capitalize First Letter of all Words and Keep Already Capitalized

I'm using Rails 4 and having trouble finding documentation on this. I would like to capitalize the first letter of each word in a string but keep letters that are already capitalized.
I would like the following outputs:
how far is McDonald's from here? => How Far Is McDonald's From Here?
MDMA is also known as molly => MDMA Is Also Known As Molly
i drive a BMW => I Drive A BMW
I thought .titleize would do it, but that will turn BMW into Bmw. Thank you for any help.

You can try the following:
a.split.map { |x| x.slice(0, 1).capitalize + x.slice(1..-1) }.join(' ')
# or
a.split.map { |x| x[0].upcase + x[1..-1] }.join(' ')
# Applied to each of the three example strings:
#=> "MDMA Is Also Known As Molly"
#=> "How Far Is McDonald's From Here?"
#=> "I Drive A BMW"
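A variant of the same idea, sketched with gsub instead of split/join so that the original spacing is kept (split followed by join(' ') collapses runs of spaces). The method name capitalize_words is hypothetical:

```ruby
# Hypothetical helper: upcase the first character of every word and leave the
# rest of the word alone, so "BMW" and "McDonald's" keep their capitals.
# gsub keeps the original whitespace, unlike split + join(' ').
def capitalize_words(text)
  # $1 is the start of string or the preceding whitespace, $2 the first character of the word
  text.gsub(/(\A|\s)(\S)/) { "#{$1}#{$2.upcase}" }
end

puts capitalize_words("how far is McDonald's from here?") # How Far Is McDonald's From Here?
puts capitalize_words("MDMA is also known as molly")      # MDMA Is Also Known As Molly
puts capitalize_words("i drive a BMW")                    # I Drive A BMW
```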

You can do a custom method like this:
string = "your string IS here"
output = []
string.split(' ').each do |word|
  if word =~ /[A-Z]/
    output << word
  else
    output << word.capitalize
  end
end
output.join(' ')
Of course, this will not change a word like "tEST" or "tEst" because it has at least one capital letter in it.

To capitalize only the first letter of the whole string (not of each word) while preserving the existing capitalization (Object#then needs Ruby 2.6 or later):
your_string.then { |s| s[0].upcase + s[1..-1] }

What's the fastest way to check if a word from one string is in another string?

I have a string of words; let's call them bad:
bad = "foo bar baz"
I can keep this string as a whitespace separated string, or as a list:
bad = bad.split(" ")
If I have another string, like so:
str = "This is my first foo string"
What's the fastest way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?
#Find if a word is there
bad.split(" ").each do |word|
  found = str.include?(word)
end
#Remove the word
bad.split(" ").each do |word|
  str.gsub!(/#{word}/, "")
end

If the list of bad words gets huge, a hash is a lot faster:
require 'benchmark'
bad = ('aaa'..'zzz').to_a # 17576 words
str = "What's the fasted way to check if any word from the bad string is within my "
str += "comparison string, and what's the fastest way to remove said word if it's "
str += "found"
str *= 10
badex = /\b(#{bad.join('|')})\b/i
bad_hash = {}
bad.each { |w| bad_hash[w] = true }
n = 10
Benchmark.bm(10) do |x|
  x.report('regex:') do
    n.times { str.gsub(badex, '').squeeze(' ') }
  end
  x.report('hash:') do
    n.times { str.gsub(/\b\w+\b/) { |word| bad_hash[word] ? '' : word }.squeeze(' ') }
  end
end
                user     system      total        real
regex:     10.485000   0.000000  10.485000 ( 13.312500)
hash:       0.000000   0.000000   0.000000 (  0.000000)
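The same lookup idea can be sketched with a Set, which avoids building a Hash of true values by hand. This version is an assumption on my part, not the benchmarked code: it downcases both sides so it is case-insensitive like the regex, and the name remove_bad_words is hypothetical. Once the Set is built, each word of the text costs one lookup, so the work depends on the length of the text rather than on how many bad words there are:

```ruby
require 'set'

# Hypothetical helper: remove every word that appears in bad_words,
# ignoring case. Punctuation around a word is left in place.
def remove_bad_words(text, bad_words)
  bad = Set.new(bad_words.map(&:downcase))
  text.gsub(/\b\w+\b/) { |word| bad.include?(word.downcase) ? '' : word }
      .squeeze(' ')
      .strip
end

puts remove_bad_words("This is my first Foo string", %w[foo bar baz])
# This is my first string
```

Note that squeeze(' ') also collapses any runs of spaces that were already in the text.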

bad = "foo bar baz"
=> "foo bar baz"
str = "This is my first foo string"
=> "This is my first foo string"
(str.split(' ') - bad.split(' ')).join(' ')
=> "This is my first string"

All the solutions have problems with catching the bad words if the case does not match. The regex solution is easiest to fix by adding the ignore-case flag:
badex = /\b(#{bad.split.join('|')})\b/i
In addition, using "String".include?(" String ") will lead to boundary problems with the first and last words in the string, and with target words that carry punctuation or are hyphenated. Handling those cases requires a lot of extra code. Because of that, I think the regex solution is the best one. It is not the fastest, but it is more flexible out of the box, and if the other algorithms are tweaked to handle case folding and compound words, the regex solution might pull ahead.
#!/usr/bin/ruby
require 'benchmark'
bad = 'foo bar baz comparison'
badex = /\b(#{bad.split.join('|')})\b/i
str = "What's the fasted way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?" * 10
n = 10_000
Benchmark.bm(20) do |x|
  x.report('regex:') do
    n.times { str.gsub(badex, '').gsub('  ', ' ') }
  end
  x.report('regex with squeeze:') do
    n.times { str.gsub(badex, '').squeeze(' ') }
  end
  x.report('array subtraction') do
    n.times { (str.split(' ') - bad.split(' ')).join(' ') }
  end
end
I made the str variable a lot longer, to make the routines work a bit harder.
                          user     system      total        real
regex:                0.740000   0.010000   0.750000 (  0.752846)
regex with squeeze:   0.570000   0.000000   0.570000 (  0.581304)
array subtraction     1.430000   0.010000   1.440000 (  1.449578)
Doh! I'm too used to how other languages handle their benchmarks. Now it's working, and the output looks better!
Just a little comment about what it looks like the OP is trying to do: black-listed word removal is easy to fool and a pain to maintain. L33t-sp34k makes it trivial to sneak words through. Depending on the application, people will consider it a game to find ways to push offensive words past the filter. The best solution I found when I was asked to work on this was to create a generator that would create all the variations on a word and dump them into a database, where some process could check them as soon as possible rather than in real time. A million small strings being checked can take a while if you are searching through a long list of offensive words; I'm sure we could come up with quite a list of things that someone would find offensive, but that's an exercise for a different day.
I haven't seen anything in Ruby similar to Perl's Regexp::Assemble, but that was a good way to go after this sort of problem. You can pass it an array of words, plus options for case-folding and word boundaries, and it will spit out a regex pattern that matches all the words, with their commonalities factored out to produce the smallest pattern that matches every word in the list. The problem after that is locating which word in the original string matched the hits found by the pattern, so they can be removed. Differences in word case and hits within compound words make that replacement more interesting.
And we won't even go into words that are benign or offensive depending on the context.
I added a more comprehensive test for the array-subtraction benchmark, to fit how it would need to work in a real piece of code. The answer checks for a bad word with an if clause before subtracting, and the bad.any? benchmark now reflects that:
#!/usr/bin/env ruby
require 'benchmark'
bad = 'foo bar baz comparison'
badex = /\b(#{bad.split.join('|')})\b/i
str = "What's the fasted way to check if any word from the bad string is within my comparison string, and what's the fastest way to remove said word if it's found?" * 10
str_split = str.split
bad_split = bad.split
n = 10_000
Benchmark.bm(20) do |x|
  x.report('regex') do
    n.times { str.gsub(badex, '').gsub('  ', ' ') }
  end
  x.report('regex with squeeze') do
    n.times { str.gsub(badex, '').squeeze(' ') }
  end
  x.report('bad.any?') do
    n.times do
      if bad_split.any? { |bw| str.include?(bw) }
        (str_split - bad_split).join(' ')
      end
    end
  end
  x.report('array subtraction') do
    n.times { (str_split - bad_split).join(' ') }
  end
end
Results from two test runs:
ruby test.rb
                          user     system      total        real
regex                 1.000000   0.010000   1.010000 (  1.001093)
regex with squeeze    0.870000   0.000000   0.870000 (  0.873224)
bad.any?              1.760000   0.000000   1.760000 (  1.762195)
array subtraction     1.350000   0.000000   1.350000 (  1.346043)
ruby test.rb
                          user     system      total        real
regex                 1.000000   0.010000   1.010000 (  1.004365)
regex with squeeze    0.870000   0.000000   0.870000 (  0.868525)
bad.any?              1.770000   0.000000   1.770000 (  1.775567)
array subtraction     1.360000   0.000000   1.360000 (  1.359100)

I usually make a point of not optimizing without measurements, but here's a WAG (wild guess):
To make it fast, you should iterate through each string once. You want to avoid a nested loop that performs (number of bad words) * (number of words in str) comparisons. So you could build one big regexp and gsub with it.
(adding foo variants to test that the word boundary works)
str = "This is my first foo fooo ofoo string"
=> "This is my first foo fooo ofoo string"
badex = /\b(#{bad.split.join('|')})\b/
=> /\b(foo|bar|baz)\b/
str.gsub(badex,'').gsub('  ',' ')
=> "This is my first fooo ofoo string"
Of course the huge resulting regexp might be as slow as the implied nested iteration in my other answer. Only way to know is to measure.
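One way to build that big regexp in plain Ruby is Regexp.union, which escapes each word for you. This is a sketch, not a measured alternative. It interpolates the union's source rather than the Regexp object itself, because interpolating the object embeds its own flags as (?-mix:...) and would switch off the outer /i inside the group:

```ruby
bad = %w[foo bar baz]

# Regexp.union escapes each word; .source gives "foo|bar|baz" without
# embedded flags, so the outer /i applies to every alternative.
badex = /\b(?:#{Regexp.union(bad).source})\b/i

str = "This is my first foo fooo ofoo Bar string"
result = str.gsub(badex, '').squeeze(' ')
puts result # This is my first fooo ofoo string
```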

bad = %w(foo bar baz)
str = "This is my first foo string"
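# find the first bad word that appears in str (include? also matches substrings such as "food")
found = bad.find { |word| str.include?(word) }
# remove it; the guard skips this when nothing was found, since str[nil] raises a TypeError
str[found] = '' if found # str => "This is my first  string" (note the leftover double space)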

I'd benchmark this:
bad = "foo bar baz".split(' ')
str = "This is my first foo string".split(' ')
# 1. What's the fasted way to check if any word from the bad string is within my comparison string
p bad.any? { |bw| str.include?(bw) }
# 2. What's the fastest way to remove said word if it's found?
p((str - bad).join(' '))
any? stops checking as soon as it sees a match. If you can order your bad words by their probability, you can save some cycles.
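If the list of bad words is long, the lookup direction matters: bad.any? { |bw| str.include?(bw) } scans the word array once per bad word. A sketch that scans the text's words once and probes a Set instead (the variable names are illustrative):

```ruby
require 'set'

# Illustrative data from the question
bad = Set.new(%w[foo bar baz])
words = "This is my first foo string".split(' ')

# One pass over the text's words; each Set#include? is a hash lookup.
# any? returns as soon as the block is true for some word.
found = words.any? { |w| bad.include?(w) }

# find works the same way but returns the first matching word (or nil)
first_hit = words.find { |w| bad.include?(w) }

p found     # true
p first_hit # "foo"
```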

Here's one that checks for both words and phrases. It returns false as soon as it finds a bad word or phrase, and true otherwise.
def checkContent(str)
  bad = ["foo", "bar", "this place sucks", "or whatever"]
  # may be best to map and singularize everything as well.
  # maybe add some regex to catch those pesky, "How i make $69 dollars each second online..."
  # maybe apply some comparison stuff to check for weird characters in those pesky, "How i m4ke $69 $ollars an hour"
  # bad_hash: word => 1 if it is bad on its own, or a Hash whose keys are the
  # lengths of the bad phrases containing it (plus 1 if it is also bad on its own)
  bad_hash = {}
  bad_phrase_hash = {}
  bad.map(&:downcase).each do |word|
    words = word.split
    if words.length > 1
      words.each do |inner|
        if bad_hash.key?(inner)
          if bad_hash[inner].is_a?(Hash) && !bad_hash[inner].key?(words.length)
            bad_hash[inner][words.length] = true
          elsif bad_hash[inner] == 1
            bad_hash[inner] = { 1 => true, words.length => true }
          end
        else
          bad_hash[inner] = { words.length => true }
        end
      end
      bad_phrase_hash[word] = true
    elsif bad_hash[word].is_a?(Hash)
      # the word already appears in a phrase; mark it as bad on its own too
      bad_hash[word][1] = true
    else
      bad_hash[word] = 1
    end
  end
  string = str.split.map(&:downcase)
  string.each_with_index do |word, index|
    next unless bad_hash.key?(word)
    return false unless bad_hash[word].is_a?(Hash)
    return false if bad_hash[word].key?(1)
    # check every bad phrase length that could start at this word
    bad_hash[word].keys.sort.each do |length|
      value = string[index...(index + length)].join(" ")
      return false if bad_phrase_hash.key?(value)
    end
  end
  true
end

The include? method is what you need. The Ruby String documentation says:
str.include?(other_str) -> true or false
Returns true if str contains the given string or character.
"hello".include? "lo" #=> true
"hello".include? "ol" #=> false
"hello".include? ?h #=> true
Note that one include? call scans str roughly once, while the loop you proposed repeats that scan for every bad word, so its cost grows with (number of bad words) * (length of str).

Ruby equivalent of PHP's ucfirst() function

What's the best way in Ruby (with Rails, if relevant) to capitalize the first letter of a string?
Note that String#capitalize is not what I want since, in addition to capitalizing the first letter of the string, this function makes all other characters lowercase (which I don't want -- I'd like to leave them untouched):
>> "a A".capitalize
=> "A a"

In Rails you have the String#titleize method:
"testing string titleize method".titleize #=> "Testing String Titleize Method"

You can use "sub" to get what you want (note: I haven't tested this with multibyte strings)
"a A".sub(/^(\w)/) {|s| s.capitalize}
(and you can of course monkeypatch String to add this as a method if you like)
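Regarding the multibyte caveat: in Ruby, \w only matches ASCII word characters, so /^(\w)/ leaves a string that starts with "é" unchanged. A sketch that upcases whatever the first character is, assuming Ruby 2.4 or later (where String#upcase handles non-ASCII letters); the method name ucfirst is hypothetical here:

```ruby
# Hypothetical helper: upcase only the first character, leave the rest untouched.
# \A. matches exactly one character, including a multibyte one such as "é".
def ucfirst(str)
  str.sub(/\A./m) { |first| first.upcase }
end

puts ucfirst("a A")        # A A
puts ucfirst("élan Vital") # Élan Vital
puts "élan".sub(/^(\w)/) { |s| s.capitalize } # élan (unchanged: \w does not match "é")
```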

Upcase the first character and save it back into the string:
s = "a A"
s[0] = s[0,1].upcase
p s # => "A A"
Or, as a method on String:
class String
  def ucfirst!
    self[0] = self[0,1].upcase
    self
  end
end

If you don't want to modify the original string, you can do it this way:
class String
  def ucfirst
    str = dup # dup, unlike clone, returns an unfrozen copy even if self is frozen
    str[0] = str[0,1].upcase
    str
  end
end

I propose the following solution, which skips over leading whitespace (the whitespace itself is kept):
' ucfirstThis'.sub(/\w/, &:capitalize)
# => " UcfirstThis"

Since Rails 5:
"a A".upcase_first
=> "A A"
http://api.rubyonrails.org/v5.1/classes/ActiveSupport/Inflector.html#method-i-upcase_first

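If you are looking for a function really similar to PHP's ucfirst(), try
"a A".gsub(/(\w+)/) {|s| s.capitalize}
which results in "A A", and
"a neW APPROACH".gsub(/(\w+)/) {|s| s.capitalize}
which results in "A New Approach". Strictly speaking, because this capitalizes every word and downcases the rest of each word, it behaves more like PHP's ucwords(strtolower($s)) than ucfirst().
You can extend the String class with:
class String
  def ucfirst
    self.gsub(/(\w+)/) { |s| s.capitalize }
  end
  def ucfirst!
    # like gsub!, returns nil if the string contains no word characters
    self.gsub!(/(\w+)/) { |s| s.capitalize }
  end
end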

Have a look at this: capitalizing-first-letter-of-each-word
There's no built-in function for this. You need to split the string into words and rejoin them, or try Rails' String#titleize and see if it does what you want.

This one-liner does not depend on ActiveSupport. I'm not sure it's totally bulletproof, though:
"my great uncle and grand-ma".gsub(/(\A\w|\s\w)/) { |m| m.upcase }
# My Great Uncle And Grand-ma

Strip method for non-whitespace characters?

Is there a Ruby/Rails function that will strip a string of a certain user-defined character? For example if I wanted to strip my string of quotation marks "... text... "
http://api.rubyonrails.org/classes/ActiveSupport/Multibyte/Chars.html#M000942

I don't know if I'm reinventing the wheel here so if you find a built-in method that does the same, please let me know :-)
I added the following to config/initializers/string.rb, which adds trim, ltrim and rtrim methods to the String class. Repeated copies of str (matched as a whole string, not character by character) are removed only at the very start or end of the string:
# in config/initializers/string.rb
class String
  def trim(str = nil)
    ltrim(str).rtrim(str)
  end
  def ltrim(str = nil)
    return lstrip unless str
    sub(/\A(?:#{Regexp.escape(str)})+/, "")
  end
  def rtrim(str = nil)
    return rstrip unless str
    sub(/(?:#{Regexp.escape(str)})+\z/, "")
  end
end
and I use it like this:
"... hello ...".trim(".") => " hello "
and
"\"hello\"".trim("\"") => "hello"
I hope this helps :-)

You can use tr with the second argument as a blank string. For example:
%("... text... ").tr('"', '')
would remove all the double quotes.
However, if you are using this to sanitize your input or output, it will probably not be effective at preventing SQL injection or cross-site scripting attacks. For HTML you are better off using the sanitize gem or the h view helper.

I don't know of one out of the box, but this should do what you want:
class String
  def strip_str(str)
    escaped = Regexp.escape(str) # so characters like "." are matched literally
    gsub(/\A#{escaped}|#{escaped}\z/, '')
  end
end
a = '"Hey, there are some extraneous quotes in this here "String"."'
puts a.strip_str('"') # -> Hey, there are some extraneous quotes in this here "String".

You could use String#gsub:
%("... text... ").gsub(/\A"+|"+\Z/,'')

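class String
  # Treats str as a set of characters to strip from both ends
  def stripc(str)
    chars = str.chars
    out = dup
    # compare characters, not byte values, with the first/last character of out
    out.slice!(0) while !out.empty? && chars.include?(out[0])
    out.slice!(-1) while !out.empty? && chars.include?(out[-1])
    out
  end
end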
Chuck's answer needs some + signs if you want to remove all extra instances of his string pattern. And it doesn't work if you want to remove any of a set of characters that might appear in any order.
For instance, if we want a string to not end with any of the following: a, b, c, and our string is fooabacab, we need something stronger like the code I've supplied above.
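The same "strip any of these characters from both ends" behaviour can also be sketched with a single regexp and a character class. strip_chars is a hypothetical name; Regexp.escape keeps characters such as "-" or "]" from being read as character-class syntax:

```ruby
# Hypothetical helper: remove any of the characters in `chars` from both ends.
def strip_chars(text, chars)
  return text.dup if chars.empty? # an empty class [] would be an invalid regexp
  set = Regexp.escape(chars.chars.uniq.join)
  text.gsub(/\A[#{set}]+|[#{set}]+\z/, '')
end

p strip_chars('"... text... "', '"') # => "... text... "
p strip_chars("fooabacab", "abc")    # => "foo"
```

If you only need to remove one exact prefix or suffix, String#delete_prefix and String#delete_suffix (Ruby 2.5+) do that without a regexp.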
