How to quickly join two strings in Ruby

It's common to need to join strings, and in Ruby we have common ways of doing it: appending, concatenating and interpolating one into the other, or using the built-in concat method in String. (We have multiple ways of doing it for flexibility and to ease the transition from other languages to Ruby.)
Starting with:
'a ' << 'z' # => "a z"
'a '.concat('z') # => "a z"
'a ' + 'z' # => "a z"
"a #{'z'}" # => "a z"
Assuming we don't want to change either string and that the strings won't be assigned to variables, what is the fastest way to join them, and does the fastest way change as the size of the "left" string grows?
For those who can't figure out why we'd post questions like this, it's to help educate and show the most efficient way to do a particular task. Newcomers to a language, Ruby in this case, often drag old ways of doing something with them, and inadvertently write code that runs more slowly than necessary. A simple change to their coding style can result in faster code.

Starting with short strings:
z = 'z'
'a ' << z # => "a z"
'a '.concat(z) # => "a z"
'a ' + z # => "a z"
"a #{z}" # => "a z"
require 'fruity'
compare do
append { 'a ' << z}
concat { 'a '.concat(z)}
plus { 'a ' + z}
interpolate { "a #{z}" }
end
# >> Running each test 65536 times. Test will take about 2 seconds.
# >> interpolate is similar to append
# >> append is similar to plus
# >> plus is faster than concat by 2x ± 0.1
Increasing the "left" string to 11 characters:
require 'fruity'
compare do
append { 'abcdefghij ' << z}
concat { 'abcdefghij '.concat(z)}
plus { 'abcdefghij ' + z}
interpolate { "abcdefghij #{z}" }
end
# >> Running each test 65536 times. Test will take about 2 seconds.
# >> interpolate is similar to append
# >> append is similar to plus
# >> plus is faster than concat by 2x ± 1.0
51 characters:
compare do
append { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij ' << z}
concat { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij '.concat(z)}
plus { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij ' + z}
interpolate { "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghij #{z}" }
end
# >> Running each test 32768 times. Test will take about 2 seconds.
# >> plus is faster than append by 2x ± 1.0
# >> append is similar to interpolate
# >> interpolate is similar to concat
101 characters:
compare do
append { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij ' << z}
concat { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij '.concat(z)}
plus { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij ' + z}
interpolate { "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij #{z}" }
end
# >> Running each test 32768 times. Test will take about 2 seconds.
# >> plus is faster than interpolate by 2x ± 0.1
# >> interpolate is similar to append
# >> append is similar to concat
501 characters:
compare do
append { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij ' << z}
concat { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij '.concat(z)}
plus { 'abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij ' + z}
interpolate { "abcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghijabcdefghij #{z}" }
end
# >> Running each test 16384 times. Test will take about 1 second.
# >> plus is faster than append by 2x ± 0.1
# >> append is similar to interpolate
# >> interpolate is similar to concat
Once the "left" string got past 50 characters, + consistently outperformed the others.
The comments mention that some of these mutate the string on the left. Here's what would happen if it was a variable on the left, not a literal string:
a = 'a'
z = 'z'
a << z # => "az"
a # => "az"
a = 'a'
a.concat(z) # => "az"
a # => "az"
compared to:
a + z # => "az"
a # => "a"
"#{a} #{z}" # => "a z"
a # => "a"
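One practical consequence of that difference: the mutating forms fail when the receiver is frozen, while + and interpolation do not care. Here is a minimal sketch, assuming Ruby 2.5 or later (where FrozenError exists); the variable names are only for illustration:

```ruby
a = 'a'.freeze
z = 'z'

# Non-mutating forms build a new String, so a frozen receiver is fine.
plus_result  = a + z
interpolated = "#{a}#{z}"

# The mutating form tries to change the receiver in place and raises.
append_error =
  begin
    a << z
    nil
  rescue FrozenError => e
    e.class
  end

puts plus_result   # az
puts interpolated  # az
puts append_error  # FrozenError
puts a             # a (unchanged)
```

The same applies to concat, since it also modifies the receiver.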
Note: The initial version of this answer had a bad test using:
"a #{'z'}"
The problem with that is Ruby is smart enough to recognize that 'z' is another literal and converts the string into:
"a z"
with the end result that the test would be unfairly faster than the others.
It's been a while and Ruby's smarter and faster. I added a couple additional tests:
puts "Running Ruby v%s" % RUBY_VERSION
require 'fruity'
z_ = 'z'
compare do
append { 'abcdefghij ' << 'z' }
concat { 'abcdefghij '.concat('z') }
plus { 'abcdefghij ' + 'z' }
interpolate1 { "abcdefghij #{'z'}" }
interpolate2 { "abcdefghij #{z_}" }
adjacent { 'abcdefghij' ' z' }
end
# >> Running Ruby v2.7.0
# >> Running each test 65536 times. Test will take about 3 seconds.
# >> adjacent is similar to interpolate1
# >> interpolate1 is faster than interpolate2 by 2x ± 1.0
# >> interpolate2 is similar to append
# >> append is similar to concat
# >> concat is similar to plus
interpolate1 and adjacent are basically the same as far as the interpreter is concerned and will be concatenated prior to running. interpolate2 forces Ruby to do it at run-time so it's a little slower.
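If you'd rather not install fruity, a rough timing loop can be put together with core Ruby only. This is just a sketch under a few assumptions: the time_it helper is made up for this example, the iteration count is arbitrary, and dup stands in for the fresh string a literal would allocate on each pass. It will not give you fruity's statistical comparison, only raw elapsed times that vary by machine and Ruby version.

```ruby
z = 'z'
left = 'abcdefghij '

# Each lambda builds a fresh result; dup keeps the mutating forms from
# growing `left` across iterations.
candidates = {
  append:      -> { left.dup << z },
  concat:      -> { left.dup.concat(z) },
  plus:        -> { left + z },
  interpolate: -> { "#{left}#{z}" }
}

# Hypothetical helper: runs the block `iterations` times and returns
# the elapsed wall-clock seconds from a monotonic clock.
def time_it(iterations)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  iterations.times { yield }
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Sanity check first: every approach must produce the same string.
results = candidates.map { |name, code| [name, code.call] }.to_h
timings = candidates.map { |name, code| [name, time_it(100_000, &code)] }.to_h

timings.sort_by { |_, seconds| seconds }.each do |name, seconds|
  puts format('%-12s %.4fs', name, seconds)
end
```

Run it several times before drawing conclusions; single runs of a loop like this are noisy.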

Related

How to Parse with Commas in CSV file in Ruby

I am parsing a CSV file with Ruby and am having trouble because the delimiter is a comma and my data also contains commas.
The portions of the data that contain commas are surrounded by "", but I am not sure how to make CSV ignore commas that are contained within quotation marks.
Example CSV Data (File.csv)
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
Example Code:
require 'csv'
CSV.foreach("File.csv", encoding:'iso-8859-1:utf-8', :quote_char => "\x00").each do |x|
puts x[1]
end
Current Output: " 84.07 FT OF 25
Expected Output: 84.07 FT OF 25, ALL OF 26,
Link to the gist to view the example file and code.
https://gist.github.com/markscoin/0d6c2d346d70fd627203317c5fe3097c
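Use the standard double-quote character as quote_char instead of "\x00"; that is what lets CSV treat the commas inside the quoted field as data. (force_quotes only affects output that CSV generates, so it is optional here.)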
require 'csv'
CSV.foreach("data.csv", encoding:'iso-8859-1:utf-8', quote_char: '"', force_quotes: true).each do |x|
puts x[1]
end
Result:
84.07 FT OF 25, ALL OF 26,
The "illegal quoting" error is raised when a line has quotes that don't wrap the entire column. For instance, if you had a CSV that looks like:
NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR
NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR
You could parse each line individually and change the quote character only for the lines that use bad quoting:
require 'csv'
def parse_file(file_name)
  File.foreach(file_name) do |line|
    parse_line(line) do |x|
      puts x.inspect
    end
  end
end

def parse_line(line)
  options = { encoding: 'iso-8859-1:utf-8' }
  begin
    # **options passes the hash as keyword arguments, which Ruby 3+ requires
    yield CSV.parse_line(line, **options)
  rescue CSV::MalformedCSVError
    # this line is misusing quotes, change the quote character and try again
    options.merge! quote_char: "\x00"
    retry
  end
end

parse_file('./File.csv')
and running this gives you:
["NCB 14591 BLK 13 LOT W IRR", " 84.07 FT OF 25, ALL OF 26,", "TWENTY-THREE SAC HOLDING COR"]
["NCB 14592 BLK 14 LOT W IRR", "84.07 FT OF \"25\"", "TWENTY-FOUR SAC HOLDING COR"]
but then if you have a mix of bad quoting and good quoting in a single row this falls apart again. Ideally you just want to clean up the CSV to be valid.
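As an alternative to retrying with a different quote character, the csv library that ships with Ruby 2.4 and later has a liberal_parsing option that tolerates stray quotes inside unquoted fields. A small sketch using the two sample lines from above (the variable names are made up, and how liberal_parsing treats other kinds of malformed input is not covered here):

```ruby
require 'csv'

good = 'NCB 14591 BLK 13 LOT W IRR," 84.07 FT OF 25, ALL OF 26,",TWENTY-THREE SAC HOLDING COR'
bad  = 'NCB 14592 BLK 14 LOT W IRR,84.07 FT OF "25",TWENTY-FOUR SAC HOLDING COR'

# The default quote_char is '"', so the quoted field keeps its commas.
good_row = CSV.parse_line(good)

# Strict parsing rejects a quote in the middle of an unquoted field.
strict_error =
  begin
    CSV.parse_line(bad)
    nil
  rescue CSV::MalformedCSVError => e
    e.class
  end

# liberal_parsing keeps such stray quotes as literal characters.
bad_row = CSV.parse_line(bad, liberal_parsing: true)

p good_row[1]   # " 84.07 FT OF 25, ALL OF 26,"
p strict_error  # CSV::MalformedCSVError
p bad_row[1]    # "84.07 FT OF \"25\""
```

Cleaning the file so it is valid CSV is still the more robust fix.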

How to Calculate sum of all the digits in text file

I have a text file, t.txt, and I want to calculate the sum of all the digits in it.
Example
--- t.txt ---
The rahul jumped in 2 the well. The water was cold at 1 degree Centigrade. There were 3 grip holes on the walls. The well was 17 feet deep.
--- EOF --
sum 2+1+3+1+7
My ruby code to calculate sum is
ruby -e "File.read('t.txt').split.inject(0){|mem, obj| mem += obj.to_f}"
But I am not getting any answer. Why?
str = "The rahul jumped in 2 the well. The water was cold at 1 degree Centigrade. There were 3 grip holes on the walls. The well was 17 feet deep."
To get sum of all integers:
str.scan(/\d+/).sum(&:to_i)
# => 23
Or to get sum of all digits as in your example:
str.scan(/\d+?/).sum(&:to_i)
# => 14
PS: I used sum because of the Rails tag. Array#sum is also part of core Ruby from 2.4 onward; on older versions of plain Ruby you can use inject instead.
Example with inject
str.scan(/\d/).inject(0) { |sum, a| sum + a.to_i }
# => 14
str.scan(/\d+/).inject(0) { |sum, a| sum + a.to_i }
# => 23
Your statement is computing correctly. Just add puts before File read as:
ruby -e "puts File.read('t.txt').split.inject(0){|mem, obj| mem += obj.to_f}"
# => 23.0
For summing single digit only:
ruby -e "puts File.read('t.txt').scan(/\d/).inject(0){|mem, obj| mem += obj.to_f}"
# => 14.0
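Putting the pieces together as a script rather than a one-liner, here is a sketch that writes the sample text to a temporary file (standing in for t.txt) and computes both sums. It assumes Ruby 2.4+ for sum; the variable names are only illustrative.

```ruby
require 'tempfile'

text = 'The rahul jumped in 2 the well. The water was cold at 1 degree ' \
       'Centigrade. There were 3 grip holes on the walls. The well was 17 feet deep.'

digit_sum  = nil
number_sum = nil

# Tempfile.create removes the file again when the block finishes.
Tempfile.create('t') do |file|
  file.write(text)
  file.flush
  contents = File.read(file.path)

  digit_sum  = contents.scan(/\d/).sum(&:to_i)   # every digit on its own
  number_sum = contents.scan(/\d+/).sum(&:to_i)  # runs of digits as whole numbers
end

puts digit_sum   # 14
puts number_sum  # 23
```

Note the explicit puts: ruby -e, like any script, prints nothing unless you ask it to.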
Thanks

Regex for stripping non alphabetic and non numeric characters

Total programming newbie here. In Ruby, how would I go about stripping the following string of non-alphabetic and non-numeric characters and then splitting it into an array on spaces?
Example
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
Into this
tokenized_string = ["Honey", "a", "sweet", "sticky", "yellow", "fluid", "made", "by", "bees", "and", "other", "insects", "from", "nectar", "collected", "from", "flowers"]
Any help would be much appreciated!
I'd use:
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.delete('^A-Za-z0-9 ').split
# => ["Honey",
# "a",
# "sweet",
# "sticky",
# "yellow",
# "fluid",
# "made",
# "by",
# "bees",
# "and",
# "other",
# "insects",
# "from",
# "nectar",
# "collected",
# "from",
# "flowers"]
If you're trying to remove everything but alphanumerics, then the \w character class can't be used because it is defined as [A-Za-z0-9_], which allows _ to leak in or squeeze through. Here's an example:
'foo_BAR12'[/\w+/] # => "foo_BAR12"
That matched the entire string, including _.
'foo_BAR12'[/[A-Za-z0-9]+/] # => "foo"
That stopped at _, because the class [A-Za-z0-9] doesn't include it.
\w should be considered a matching pattern for variable names, NOT for alphanumerics. If you want a character class for alphanumerics, look at the POSIX [[:alnum:]] class:
'foo_BAR12'[/[[:alnum:]]+/] # => "foo"
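One more difference worth knowing: in Ruby, [[:alnum:]] is Unicode-aware, while [A-Za-z0-9] and \w only cover ASCII. A quick sketch with a made-up sample string shows how the three classes split the same input:

```ruby
# "\u00E9" is an accented e; written as an escape so the source stays ASCII.
text = "caf\u00E9_au lait 42"

ascii_only = text.scan(/[A-Za-z0-9]+/)
posix      = text.scan(/[[:alnum:]]+/)
word_class = text.scan(/\w+/)

p ascii_only  # ["caf", "au", "lait", "42"]
p posix       # ["café", "au", "lait", "42"]
p word_class  # ["caf", "_au", "lait", "42"]
```

Which one you want depends on whether accented letters should count as part of a word in your data.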
There are a lot of possibilities, e.g.:
string.gsub(/\W/) { |m| m if m == ' ' }.split
or, even clearer:
string.gsub(/\W/) { |m| m if m.strip.empty? }.split
Very simple. The following gives you the array you want without your having to use split:
string.scan(/\w+/)
Play around with it on Rubular.com.
Do as below, using String#scan:
string = "Honey - a sweet, sticky, yellow fluid made by bees and other insects from nectar collected from flowers."
string.scan(/[a-zA-Z0-9]+/)
# => ["Honey",
# "a",
# "sweet",
# "sticky",
# "yellow",
# "fluid",
# "made",
# "by",
# "bees",
# "and",
# "other",
# "insects",
# "from",
# "nectar",
# "collected",
# "from",
# "flowers"]

Map string to another string in Ruby Rails

Hey guys,
I have 5 model attributes that I refer to by short names, for example 'str' and 'dex'. A user has strength and dexterity attributes.
When I call user.increase_attr('dex') I want to do it through 'dex' and not have to pass the 'dexterity' string all the way.
Of course, I can just check if ability == 'dex' and convert it to 'dexterity' when I need to do user.dexterity += 1 and then save it.
But what is a good Ruby way to do that?
Look at Ruby's Abbrev module that's part of the standard library. This should give you some ideas.
require 'abbrev'
require 'pp'
class User
  def increase_attr(s)
    "increasing using '#{s}'"
  end
end
abbreviations = Abbrev.abbrev(%w[dexterity strength speed height weight]) # already returns a Hash
user = User.new
user.increase_attr(abbreviations['dex']) # => "increasing using 'dexterity'"
user.increase_attr(abbreviations['s']) # => "increasing using ''"
user.increase_attr(abbreviations['st']) # => "increasing using 'strength'"
user.increase_attr(abbreviations['sp']) # => "increasing using 'speed'"
If an ambiguous value is passed in (the "s"), nothing will match and the lookup returns nil. If a unique value is found in the hash, the returned value is the full string, making it easy to map short strings to the full string.
Because trigger strings of varying minimum lengths would be confusing to the user, you could strip all elements of the hash that have keys shorter than the shortest unambiguous key. In other words, remove anything shorter than two characters because of the collision of "speed" ("sp") and "strength" ("st"), meaning "h", "d" and "w" need to go. It's a "be kind to the poor human users" thing.
Here's the Hash that Abbrev::abbrev creates:
pp abbreviations
# >> {"dexterit"=>"dexterity",
# >> "dexteri"=>"dexterity",
# >> "dexter"=>"dexterity",
# >> "dexte"=>"dexterity",
# >> "dext"=>"dexterity",
# >> "dex"=>"dexterity",
# >> "de"=>"dexterity",
# >> "d"=>"dexterity",
# >> "strengt"=>"strength",
# >> "streng"=>"strength",
# >> "stren"=>"strength",
# >> "stre"=>"strength",
# >> "str"=>"strength",
# >> "st"=>"strength",
# >> "spee"=>"speed",
# >> "spe"=>"speed",
# >> "sp"=>"speed",
# >> "heigh"=>"height",
# >> "heig"=>"height",
# >> "hei"=>"height",
# >> "he"=>"height",
# >> "h"=>"height",
# >> "weigh"=>"weight",
# >> "weig"=>"weight",
# >> "wei"=>"weight",
# >> "we"=>"weight",
# >> "w"=>"weight",
# >> "dexterity"=>"dexterity",
# >> "strength"=>"strength",
# >> "speed"=>"speed",
# >> "height"=>"height",
# >> "weight"=>"weight"}
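The trimming idea described above (dropping keys shorter than the shortest length that is unambiguous for every word) could be sketched like this. To keep the example free of dependencies, the prefix table is built by hand as a stand-in for Abbrev.abbrev, which may need to be installed as a gem on recent Rubies; names such as prefix_counts, min_length and trimmed are made up for the example.

```ruby
words = %w[dexterity strength speed height weight]

# Count how many words share each prefix.
prefix_counts = Hash.new(0)
words.each { |w| (1..w.size).each { |n| prefix_counts[w[0, n]] += 1 } }

# Keep only prefixes that identify exactly one word
# (hand-rolled stand-in for Abbrev.abbrev).
abbreviations = {}
words.each do |w|
  (1..w.size).each do |n|
    prefix = w[0, n]
    abbreviations[prefix] = w if prefix_counts[prefix] == 1
  end
end

# Shortest unambiguous prefix length per word; the longest of those is the
# minimum length that works for every word.
min_length = words.map { |w|
  abbreviations.select { |_, full| full == w }.keys.map(&:size).min
}.max

trimmed = abbreviations.select { |prefix, _| prefix.size >= min_length }

p min_length     # 2
p trimmed['d']   # nil
p trimmed['de']  # "dexterity"
p trimmed['sp']  # "speed"
```

With this list every accepted abbreviation is at least two characters long, so users get one consistent rule.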
def increase_attr(attr)
  attr_map = {'dex' => :dexterity, 'str' => :strength}
  increment!(attr_map[attr]) if attr_map.include?(attr)
end
Basically create a Hash that has the key of 'dex', 'str' etc and points to the expanded version of that word(in symbol format).
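Outside of ActiveRecord the same lookup-table idea can be tried in plain Ruby. The sketch below uses a Struct as a hypothetical stand-in for the model, and the constant name ATTRIBUTE_ALIASES is invented for the example. Unlike the version above, it raises on an unknown short name instead of silently doing nothing, which is a design choice you may or may not want.

```ruby
# Short names mapped to the real attribute names.
ATTRIBUTE_ALIASES = { 'dex' => :dexterity, 'str' => :strength }.freeze

# Hypothetical stand-in for the ActiveRecord model in the question.
User = Struct.new(:strength, :dexterity) do
  def increase_attr(short_name)
    attribute = ATTRIBUTE_ALIASES.fetch(short_name) do
      raise ArgumentError, "unknown attribute: #{short_name.inspect}"
    end
    self[attribute] += 1
  end
end

user = User.new(10, 12)
user.increase_attr('dex')

p user.dexterity  # 13
p user.strength   # 10
```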

Is it possible to simulate the behaviour of sprintf("%g") using the Rails NumberHelper methods?

sprintf("%g", [float]) allows me to format a floating point number without specifying precision, such that 10.00 is rendered as 10 and 10.01 is rendered as 10.01, and so on. This is neat.
In my application I'm rendering numbers using the Rails NumberHelper methods so that I can take advantage of the localization features, but I can't figure out how to achieve the above functionality through these helpers since they expect an explicit :precision option.
Is there a simple way around this?
Why not just use Ruby's Kernel::sprintf with NumberHelper? Recommended usage with this syntax: str % arg where str is the format string (%g in your case):
>> "%g" % 10.01
=> "10.01"
>> "%g" % 10
=> "10"
Then you can use the NumberHelper to print just the currency symbol:
>> foo = ActionView::Base.new
>> foo.number_to_currency(0, :format => "%u") + "%g"%10.0
=> "$10"
and define your own convenience method:
def pretty_currency(val)
  number_to_currency(0, :format => "%u") + "%g" % val
end
pretty_currency(10.0) # "$10"
pretty_currency(10.01) # "$10.01"
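One thing to keep in mind with %g: without an explicit precision it keeps six significant digits and switches to exponential notation for large or very small values. A quick sketch with arbitrary sample values:

```ruby
samples   = [10.0, 10.01, 1111.2345, 1234567.0, 0.00001]
formatted = samples.map { |n| '%g' % n }

p formatted
# ["10", "10.01", "1111.23", "1.23457e+06", "1e-05"]

# An explicit precision raises the number of significant digits kept.
wider = '%.10g' % 1111.2345
p wider  # "1111.2345"
```

So for prices with many digits you would probably want to pass a precision to %g or format the number another way.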
I have solved this by adding another method to the NumberHelper module as follows:
module ActionView
  module Helpers #:nodoc:
    module NumberHelper
      # Formats a +number+ such that the level of precision is determined using the logic of sprintf("%g"), that
      # is: "Convert a floating point number using exponential form if the exponent is less than -4 or greater than or
      # equal to the precision, or in d.dddd form otherwise." Note that "%g" keeps six significant digits by default.
      # You can customize the format in the +options+ hash.
      #
      # ==== Options
      # * <tt>:separator</tt> - Sets the separator between the units (defaults to ".").
      # * <tt>:delimiter</tt> - Sets the thousands delimiter (defaults to "").
      #
      # ==== Examples
      #  number_with_auto_precision(111.25)                                        # => "111.25"
      #  number_with_auto_precision(111)                                           # => "111"
      #  number_with_auto_precision(1111.5, :separator => ',', :delimiter => '.')  # => "1.111,5"
      #  number_with_auto_precision(1111, :separator => ',', :delimiter => '.')    # => "1.111"
      def number_with_auto_precision(number, *args)
        options = args.extract_options!
        options.symbolize_keys!
        defaults = I18n.translate(:'number.format', :locale => options[:locale], :raise => true) rescue {}
        separator ||= (options[:separator] || defaults[:separator])
        delimiter ||= (options[:delimiter] || defaults[:delimiter])
        begin
          number_with_delimiter("%g" % number,
                                :separator => separator,
                                :delimiter => delimiter)
        rescue
          number
        end
      end
    end
  end
end
It is the "%g" % number formatting passed to number_with_delimiter that renders the number as described in the code comments above.
This works great for me, but I'd welcome thoughts on this solution.
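For comparison, here is a plain-Ruby sketch of the same "no fixed precision" idea that does not rely on %g, so it avoids the six-significant-digit cutoff and exponential notation. The helper name auto_precision and its max_decimals parameter are invented for this example; it is not part of Rails and ignores I18n entirely.

```ruby
# Hypothetical helper, not part of Rails: formats with up to `max_decimals`
# decimal places, drops trailing zeros, then applies delimiter and separator.
def auto_precision(number, separator: '.', delimiter: ',', max_decimals: 10)
  fixed = format("%.#{max_decimals}f", number)
  # Only strip zeros when there is a fractional part, so "100" stays "100".
  fixed = fixed.sub(/\.?0+\z/, '') if fixed.include?('.')
  whole, fraction = fixed.split('.')
  # Insert the thousands delimiter into the integer part.
  whole = whole.gsub(/(\d)(?=(\d{3})+\z)/, "\\1#{delimiter}")
  fraction ? "#{whole}#{separator}#{fraction}" : whole
end

puts auto_precision(10.0)                                         # 10
puts auto_precision(10.01)                                        # 10.01
puts auto_precision(1234567)                                      # 1,234,567
puts auto_precision(1111.2345, separator: ',', delimiter: '.')    # 1.111,2345
```

Capping at a fixed number of decimals before trimming zeros keeps floating point noise out of the result.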

Resources