I am dealing with a large quantity of data and I'm worried about the efficiency of my operations at scale. After benchmarking, the average time to execute this line of code is about 0.004 sec. The goal of this line is to find the difference between the two values at each array position. In a previous operation, 111.111 was loaded into the arrays at positions that contained invalid data. Because of some odd time-domain issues I couldn't simply remove those values, so I needed a distinguishable placeholder (I could probably use 'nil' here instead). Anyway, back to the explanation. This line of code checks that neither array holds the 111.111 placeholder at the current position. If the values are valid, I perform the mathematical operation; otherwise I want to delete the values (or at least exclude them from the new array I'm writing to). I accomplished this by placing a 'nil' at that position and then compacting the array afterwards.
The time of 0.004 sec for 4000 data points in each array isn't terrible, but this line of code is executed 25M times. I'm hoping someone can offer some insight into how I might optimize it.
temp_row = row_1.zip(row_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
You are wasting CPU generating nil in the ternary statement and then using compact to remove the nils. Instead, use reject or select to drop the pairs containing 111.111, then map over what remains.
Instead of:
row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]
temp_row = row_1.zip(row_2).map do |x, y|
x == 111.111 || y == 111.111 ? nil : (x - y).abs
end.compact
temp_row # => [1, 2]
I'd start with:
temp_row = row_1.zip(row_2)
.reject{ |x,y| x == 111.111 || y == 111.111 }
.map{ |x,y| (x - y).abs }
temp_row # => [1, 2]
Or:
temp_row = row_1.zip(row_2)
.each_with_object([]) { |(x,y), ary|
ary << (x - y).abs unless (x == 111.111 || y == 111.111)
}
temp_row # => [1, 2]
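On Ruby 2.7 or later, Enumerable#filter_map folds the filtering and the mapping into a single pass, so no nil-filled intermediate array is built and no compact is needed. A minimal sketch, assuming the placeholder is still 111.111 (PLACEHOLDER is a hypothetical constant name); it has not been benchmarked against the versions here:

```ruby
PLACEHOLDER = 111.111 # hypothetical name for the sentinel used in the question

row_1 = [1, 111.111, 2]
row_2 = [2, 111.111, 4]

# filter_map keeps only truthy block results. The `unless` modifier yields nil
# for placeholder pairs, which drops them; 0 is truthy in Ruby, so a zero
# difference is still kept.
temp_row = row_1.zip(row_2).filter_map do |x, y|
  (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
end

temp_row # => [1, 2]
```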
Benchmarking with different array sizes reveals some useful things:
require 'benchmark'
DECIMAL_SHIFT = 100
DATA_ARRAY = (1 .. 1000).to_a
ROW_1 = (DATA_ARRAY + [111.111]).shuffle
ROW_2 = (DATA_ARRAY.map{ |i| i * 2 } + [111.111]).shuffle
Benchmark.bm(17) do |b|
  b.report('ternary:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).map do |x, y|
        x == 111.111 || y == 111.111 ? nil : (x - y).abs
      end.compact
    end
  end
  b.report('reject:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2).reject{ |x,y| x == 111.111 || y == 111.111 }.map{ |x,y| (x - y).abs }
    end
  end
  b.report('each_with_object:') do
    DECIMAL_SHIFT.times do
      ROW_1.zip(ROW_2)
        .each_with_object([]) { |(x,y), ary|
          # NOTE: `ary += [...]` rebinds the block-local ary instead of appending, so
          # each step allocates a new array and the returned memo stays empty;
          # `ary << (x - y).abs`, as in the example above, avoids both problems.
          ary += [(x - y).abs] unless (x == 111.111 || y == 111.111)
        }
    end
  end
end
# >> user system total real
# >> ternary: 0.240000 0.000000 0.240000 ( 0.244476)
# >> reject: 0.060000 0.000000 0.060000 ( 0.058842)
# >> each_with_object: 0.350000 0.000000 0.350000 ( 0.349363)
Adjust DECIMAL_SHIFT, the size of DATA_ARRAY, and the placement of 111.111, and watch how the timings change. That will give you an idea of which expressions work best for your data size and structure, so you can fine-tune the code as necessary.
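If the per-row cost still matters at 25M calls, one option worth benchmarking is an index-based loop, which avoids the intermediate array of pairs that zip allocates on every call. This is a sketch only, assuming both rows have the same length; the method name abs_diffs is hypothetical and it has not been timed here:

```ruby
PLACEHOLDER = 111.111 # sentinel value from the question

# Hypothetical helper: absolute differences at positions where neither row
# holds the placeholder. Assumes row_1 and row_2 have the same length.
def abs_diffs(row_1, row_2)
  result = []
  i = 0
  n = row_1.size
  while i < n
    x = row_1[i]
    y = row_2[i]
    result << (x - y).abs unless x == PLACEHOLDER || y == PLACEHOLDER
    i += 1
  end
  result
end

abs_diffs([1, 111.111, 2], [2, 111.111, 4]) # => [1, 2]
```

Running it through the same Benchmark.bm harness as the other expressions is the only way to know whether it helps for your data.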
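You can try the parallel gem (https://github.com/grosser/parallel) to spread the work across multiple processes or threads. Note that on MRI, threads don't run Ruby code in parallel because of the global VM lock, so processes are usually the better fit for CPU-bound work like this.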
I have the following program:
def calc_res(a)
n = a.length
result = 0
for i in 0 .. (n - 1)
for j in i .. (n - 1)
if (a[i] != a[j] && j - i > result) then
result = j - i
end
end
end
return result
end
which returns the following output:
irb(main):013:0> calc_res([4, 6, 2, 2, 6, 6, 4])
=> 5
But it takes a long time when the array is large, e.g. [0, 1, 2, 3, ..., 70000].
Can anyone suggest how I can optimize it?
Thanks
If I have understood the problem you are trying to solve (judging from your code), here is an approach:
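def calc_res(a)
  # guard: an empty array would otherwise loop forever, and a one-element array would return -1
  return 0 if a.size < 2
  last_index = a.length - 1
  index = 0
  while a[index] == a.last do
    index = index + 1
    break if index == last_index
  end
  last_index - index
end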
It checks the items from the start to see whether they equal the last item, and moves the index toward the end while they do. As I understand it, you are looking for the maximum distance between two different elements.
For your example [4, 6, 2, 2, 6, 6, 4] it makes one iteration and returns 5; for [1...70000] it makes zero iterations and returns the difference between the first and last positions (the size of the array - 1).
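Comparing only against a.last can miss the answer, though: for [1, 1, 1, 2, 1] the loop above returns 1, while the original nested-loop calc_res returns 3 (indices 0 and 3). Some farthest-apart pair of differing elements always includes the first or the last element, so a sketch that scans from both ends, still in linear time, might look like this (it keeps the original's convention of returning 0 when all elements are equal):

```ruby
# Sketch: take an optimal pair (p, q). If a[0] != a[q], then (0, q) is at least
# as far apart; if a[p] != a.last, then (p, last) is; otherwise
# a[0] == a[q] != a[p] == a.last, so (0, last) itself differs. Either way some
# optimal pair involves index 0 or the last index.
def calc_res(a)
  return 0 if a.size < 2
  last = a.size - 1
  # smallest index whose value differs from the last element
  i = a.index { |e| e != a[last] }
  # largest index whose value differs from the first element
  j = a.rindex { |e| e != a[0] }
  # both are nil when every element is equal, which gives 0
  [i ? last - i : 0, j || 0].max
end

calc_res([4, 6, 2, 2, 6, 6, 4]) # => 5
calc_res([1, 1, 1, 2, 1])       # => 3
```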
My understanding is that the problem is to find two unique elements in the array whose distance apart (difference in indices) is maximum, and to return the distance they are apart. I return nil if all elements are the same.
My solution attempts to minimize the number of pairs of elements that must be examined before an optimal solution is identified. For the example given in the question, only two pairs of elements need be considered.
def calc_res(a)
  sz = a.size-1
  # downto(1), not downto(2): for [1, 2, 1] the answer is 1
  sz.downto(1).find { |n| (0..sz-n).any? { |i| a[i] != a[i+n] } }
end
a = [4,6,2,2,6,6,4]
calc_res a
#=> 5
If sz = a.size-1, sz is the greatest possible distance two elements can be apart. If, for example, a = [1,2,3,4], sz = 3, which is the number of positions 1 and 4 are apart.
For a, sz = a.size-1 #=> 6. I first determine if any pair of elements that are n = sz positions apart are unique. [a[0], a[6]] #=> [4,4] is the only pair of elements 6 positions apart. Since they are not unique, I reduce n by one (to 5) and examine all pairs of elements n positions apart, looking for one whose elements are unique. There are two pairs 5 positions apart: [a[0], a[5]] #=> [4,6] and [a[1], a[6]] #=> [6,4]. Both of these meet the test, so we are finished, and return n #=> 5. In fact we are finished after testing the first of these two pairs. Had neither of these pairs contained unique values, n would have been reduced by 1 to 4 and the three pairs [a[0], a[4]] #=> [4,6], [a[1], a[5]] #=> [6,6] and [a[2], a[6]] #=> [2,6] would have been searched for one with unique values, and so on.
See Integer#downto, Enumerable#find and Enumerable#any?.
More Ruby-esque versions include the following (like the while loop above, each compares elements from the start against a.last):
def calc_res(a)
last = a.last
idx = a.find_index {|e| e != last }&.+(1) || a.size
a.size - idx
end
def calc_res(a)
last = a.last
a.size - a.each.with_index(1).detect(->{[a.size]}) {|e,_| e != last }.last
end
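def calc_res(a)
  last = a.last
  # start at the largest possible distance (size - 1) and count down
  a.reduce(a.size - 1) do |memo, e|
    return memo unless e == last
    memo - 1
  end
  0 # every element equals the last one
end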
def calc_res(a)
  return 0 if a.uniq.size <= 1
  # index of the first element that differs from the last one
  a.size - 1 - a.index { |e| e != a.last }
end
Looking to work with a dataset of strings that store money amounts in these formats. For example:
$217.3M
$1.6B
$34M
€1M
€2.8B
I looked at the money gem, but it doesn't seem to convert the "M", "B" and "k" suffixes back to numbers. I'm looking for a gem that does, so I can convert exchange rates and compare quantities. I need the opposite of the number_to_human method.
I would start with something like this:
MULTIPLIERS = { 'k' => 10**3, 'm' => 10**6, 'b' => 10**9 }
def human_to_number(human)
number = human[/(\d+\.?)+/].to_f
factor = human[/\w$/].try(:downcase)
number * MULTIPLIERS.fetch(factor, 1)
end
human_to_number('$217.3M') #=> 217300000.0
human_to_number('$1.6B') #=> 1600000000.0
human_to_number('$34M') #=> 34000000.0
human_to_number('€1M') #=> 1000000.0
human_to_number('€2.8B') #=> 2800000000.0
human_to_number('1000') #=> 1000.0
human_to_number('10.88') #=> 10.88
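String#try comes from ActiveSupport, so the snippet above assumes Rails. In plain Ruby (2.3+), the safe-navigation operator &. does the same job. A sketch that also keeps the leading currency symbol, like the text_to_money version below does; the method name parse_money and the [symbol, amount] return shape are assumptions:

```ruby
MULTIPLIERS = { 'k' => 10**3, 'm' => 10**6, 'b' => 10**9 }.freeze

# Hypothetical helper: "$34M" -> ["$", 34000000.0]
def parse_money(human)
  currency = human[/\A[^\d.]+/]          # leading non-numeric prefix such as "$" or "€" (nil if none)
  number   = human[/\d+(?:\.\d+)?/].to_f # first decimal number in the string
  suffix   = human[/[kmb]\z/i]&.downcase # trailing k/M/B suffix, if any
  [currency, number * MULTIPLIERS.fetch(suffix, 1)]
end

parse_money('$34M')  # => ["$", 34000000.0]
parse_money('1000')  # => [nil, 1000.0]
```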
I decided not to be lazy and wrote my own function, in case anyone else wants it:
def text_to_money(text)
  returnarray = []
  if text.count('kK') >= 1
    multiplier = 1000
  elsif text.count('mM') >= 1
    multiplier = 1000000
  elsif text.count('bB') >= 1
    multiplier = 1000000000
  else
    multiplier = 1
  end
  # strip everything except digits and the decimal point; removing only "$"
  # and "," would leave "€1M" parsing as 0.0
  num = text.to_s.gsub(/[^\d.]/, '').to_f
  total = num * multiplier
  returnarray << [text[0], total]
  return returnarray
end
Thanks for the help!
I'm learning Ruby by taking a Berkeley MOOC, and one of its homework exercises says:
Define a method sum_to_n? which takes an array of integers and an
additional integer, n, as arguments and returns true if any two
elements in the array of integers sum to n. An empty array should sum
to zero by definition.
I already created two methods that can do the job, but I'm not comfortable with either of them because I don't think they're written the Ruby Way. I hope some of you can help me learn what the right way would be!
The first method I made uses each for both iterations, but what I don't like about it is that every number is summed with every other number, even with itself, like this:
arr[1, 2, 3, 4] => 1+1, 1+2, 1+3, 1+4, 2+1, 2+2, 2+3, 2+4, 3+1, 3+2... 4+3, 4+4
As you can see, there's a lot of repeated sums, and I don't want that.
This is the code:
def sum_to_n?(arr, n)
  # note: x != y compares values, not positions, so sum_to_n?([3, 3], 6) returns false
  arr.each {|x| arr.each {|y| return true if x + y == n && x != y}}
  return true if n == 0 && arr.length == 0
  return false
end
With the other method I got what I wanted: just a few sums, with no repeats and no number summed with itself. But it looks HORRIBLE, and I'm pretty sure someone would love to kill me for doing it this way, although the method does a great job, as you can see:
arr[1, 2, 3, 4] => 1+2, 1+3, 1+4, 2+3, 2+4, 3+4
This is the code:
def sum_to_n?(arr, n)
for i in 0..arr.length - 1
k = i + 1
for k in k..arr.length - 1
sum = arr[i] + arr[k]
if sum == n
return true
end
end
end
return true if n == 0 && arr.length == 0
return false
end
Well, I hope you have as much fun writing a better and prettier method as I had trying.
Thank you for your help.
I'd write it like this:
def sum_to_n?(arr, n)
return true if arr.empty? && n.zero?
arr.combination(2).any? {|a, b| a + b == n }
end
That seems to be a pretty Rubyish solution.
I came across this on Codewars. The accepted answer sure does look very Rubyish, but that comes at a cost in performance. Calling arr.combination(2) goes through a lot of combinations; it'd be simpler to go over the array element by element and check whether the complement n - x exists. Here's how that would look:
def sum_to_n?(arr, n)
(arr.empty? and n.zero?) or arr.any? { |x| arr.include?(n - x) }
end
Besides @jorg-w-mittag's answer, I found another solution using permutation:
https://stackoverflow.com/a/19351660/66493
def sum_to_n?(arr, n)
(arr.empty? && n.zero?) || arr.permutation(2).any? { |a, b| a + b == n }
end
I didn't know about permutation before.
I still like @jorg-w-mittag's answer because it's more readable.
This one will do it in O(n log n) rather than O(n²):
a = 1, 2, 3, 4
class Array
def sum_to? n
unless empty?
false.tap {
i, j, sorted = 0, size - 1, sort
loop do
break if i == j
a, b = sorted[i], sorted[j]
sum = a + b
return a, b if sum == n
sum < n ? i += 1 : j -= 1
end
}
end
end
end
a.sum_to? 7 #=> [3, 4]
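The method above returns the matching pair rather than true/false, and returns nil for an empty array. A sketch of the same sort-then-two-pointers idea that follows the exercise's contract (a boolean result, with an empty array summing to zero):

```ruby
# Sort once (O(n log n)), then walk two indices toward each other (O(n)).
def sum_to_n?(arr, n)
  return n.zero? if arr.empty? # the exercise defines an empty array's sum as 0
  sorted = arr.sort
  i = 0
  j = sorted.size - 1
  while i < j # i < j ensures an element is never paired with itself
    sum = sorted[i] + sorted[j]
    return true if sum == n
    if sum < n
      i += 1 # need a larger sum: move the low index up
    else
      j -= 1 # need a smaller sum: move the high index down
    end
  end
  false
end

sum_to_n?([1, 2, 3, 4], 7) # => true
sum_to_n?([1, 2, 3, 4], 8) # => false
```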
I had a thought that any answer to this question should probably start by pruning superfluous data from the array:
Can't use this:
arr.select! { |e| e <= n } # wrong if arr may contain negative values
But this might help:
arr.sort!
while arr.size > 1 && arr[0] + arr[-1] > n # while smallest + largest > n
  arr.delete_at(-1) # the largest value can't be part of any pair summing to n
end
I wonder why no answers here use a hash?
def sum_to_n?(arr, n)
return true if arr.empty? && n.zero?
h = {}
arr.any? { |x| complement = h[n-x]; h[x] = true; complement }
end
puts sum_to_n?([1,2,3,4,5,7], 6) # true
puts sum_to_n?([6,2,3,5,7,9], 6) # false
puts sum_to_n?([3,4,5,3], 6) # true
puts sum_to_n?([3,4,5,7], 6) # false
puts sum_to_n?([], 6) # false
puts sum_to_n?([], 0) # true
I like rohitpaulk's answer, but it fails when n is double x: sum_to_n?([3], 6) returns true. We should remove x from the array before checking include?(n - x).
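def sum_to_n?(arr, n)
  return true if arr.empty? && n.zero?
  arr.each_with_index.any? do |x, i|
    rest = arr.dup      # work on a copy: deleting from arr while iterating skips elements
    rest.delete_at(i)   # remove only this occurrence of x
    rest.include?(n - x)
  end
end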
Lam Phan's answer using a hash is the best: it makes a single pass instead of calling include? for every element.
I'm trying to create a simple regular expression that extracts numbers (7 to 14 digits long) following a keyword that starts with the letter g plus some id, something like the following:
(g)(\d{1,6})\s+(\d{7,14}\s*)+
Let's assume:
m = (/(g)(\d{1,6})\s+(\d{7,14}\s*)+/i.match("g12 327638474 83873478 2387327683 44 437643673476"))
I get this result:
#<MatchData "g12 327638474 83873478 2387327683 " 1:"g" 2:"12" 3:"2387327683 ">
But what I need as the final result is 327638474, 83873478 and 2387327683, excluding 44.
For now I only get the last number, 2387327683, without the previous numbers.
Any help here?
Cheers
Instead of a regex, you can use something like this:
s = "g12 327638474 83873478 2387327683 44 437643673476"
s.split[1..-1].select { |x| (7..14).include?(x.size) }.map(&:to_i)
# => [327638474, 83873478, 2387327683, 437643673476]
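The regex in the question only reports 2387327683 because a capture group that is repeated with + keeps only its last iteration. If you'd rather stay with a regex, String#scan returns every match. A sketch, assuming the numbers must follow a leading g&lt;id&gt; keyword (the helper name extract_ids is hypothetical); like the split-based answer, it includes 437643673476, which is 12 digits long:

```ruby
# Hypothetical helper: return the 7-14 digit numbers that follow a leading
# "g<id>" keyword (1-6 digit id, case-insensitive), as described in the question.
def extract_ids(s)
  rest = s[/\Ag\d{1,6}\s+(.*)\z/im, 1] # text after the keyword, or nil if there is no keyword
  return [] unless rest
  # \b on both sides keeps whole digit runs, so 44 and runs of 15+ digits are skipped
  rest.scan(/\b\d{7,14}\b/).map(&:to_i)
end

extract_ids("g12 327638474 83873478 2387327683 44 437643673476")
# => [327638474, 83873478, 2387327683, 437643673476]
```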
Just as an FYI, here is a benchmark showing a slightly faster way of doing what the selected answer does:
require 'ap'
require 'benchmark'
n = 100_000
s = "g12 327638474 83873478 2387327683 44 437643673476"
ap s.split[1..-1].select { |x| (7..14).include? x.size }.map(&:to_i)
ap s.split[1..-1].select { |x| 7 <= x.size && x.size <= 14 }.map(&:to_i)
Benchmark.bm(11) do |b|
b.report('include?' ) { n.times{ s.split[1..-1].select { |x| (7..14).include? x.size }.map(&:to_i) } }
b.report('conditional') { n.times{ s.split[1..-1].select { |x| 7 <= x.size && x.size <= 14 }.map(&:to_i) } }
end
ruby ~/Desktop/test.rb
[
[0] 327638474,
[1] 83873478,
[2] 2387327683,
[3] 437643673476
]
[
[0] 327638474,
[1] 83873478,
[2] 2387327683,
[3] 437643673476
]
user system total real
include? 1.010000 0.000000 1.010000 ( 1.011725)
conditional 0.830000 0.000000 0.830000 ( 0.825746)
For speed I'll use the conditional test. It's a tiny bit more verbose, but is still easily read.
Users should enter values that are either all positive or all negative.
How can I add a same-sign validation?
Right now I have written this in before_save:
unless (self.alt_1 >= 0 && self.alt_2 >=0 && self.alt_3 >= 0 &&
self.alt_4 >= 0 && self.alt_5 >= 0 && self.alt_6 >= 0) ||
(self.alt_1 <= 0 && self.alt_2 <=0 && self.alt_3 <= 0 &&
self.alt_4 <= 0 && self.alt_5 <= 0 && self.alt_6 <= 0)
self.errors.add_to_base(_("All values sign should be same."))
end
first_sign = self.alt_1 <=> 0
(2..6).each do |n|
unless (self.send("alt_#{n}") <=> 0) == first_sign
errors.add_to_base(_("All values' signs should be same."))
break
end
end
With this method we first get the sign of alt_1, and then check whether the signs of the rest (alt_2 through alt_6) match it. As soon as we find one that doesn't match, we add the validation error and stop. It reads at most six values and at least two. Note that x <=> 0 is 0 for zero, so unlike the >= 0 / <= 0 checks in the question, a zero mixed with positive values is reported as a mismatch.
Another method, more clever but less efficient, is to use the handy Enumerable#all?, which returns true if the block passed to it returns true for every element:
range = 1..6
errors.add_to_base(_("All values' signs should be same.")) unless
range.all? {|n| self.send("alt_#{n}") >= 0 } ||
range.all? {|n| self.send("alt_#{n}") <= 0 }
Here we first check whether all of the values are greater than or equal to 0, and then whether all of them are less than or equal to 0. This method performs at most 12 comparisons; all? stops at the first value that fails, so it often does fewer.
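Outside of Rails, the same check reduces to a property of the extremes: the values share a sign (with 0 compatible with either, as in the >= 0 / <= 0 checks above) exactly when the minimum is >= 0 or the maximum is <= 0. A sketch using Enumerable#minmax, which walks the values once; same_sign? is a hypothetical name:

```ruby
# True when every value is >= 0 or every value is <= 0 (zeros fit either group).
def same_sign?(values)
  return true if values.empty? # minmax of an empty array is [nil, nil]
  min, max = values.minmax
  min >= 0 || max <= 0
end

same_sign?([1, 2, 3])   # => true
same_sign?([-1, 0, -5]) # => true
same_sign?([-1, 2])     # => false
```

In the model, the values could be gathered with something like (1..6).map { |n| send("alt_#{n}") }, as the answers above do.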
Here's a slightly different approach for you:
irb(main):020:0> def all_same_sign?(ary)
irb(main):021:1> ary.map { |x| x <=> 0 }.each_cons(2).all? { |x| x[0] == x[1] }
irb(main):022:1> end
=> nil
irb(main):023:0> all_same_sign? [1,2,3]
=> true
irb(main):024:0> all_same_sign? [1,2,0]
=> false
irb(main):025:0> all_same_sign? [-1, -5]
=> true
We use the spaceship operator to obtain the sign of each number, and we make sure that each element has the same sign as the element following it. You could also rewrite it to be lazier by doing
ary.each_cons(2).all? { |x| (x[0] <=> 0) == (x[1] <=> 0) }
but that's less readable in my opinion.
same_sign = [:<=, :>=].any? do |check|
  # Check either <= 0 or >= 0 for all values
  [self.alt_1, self.alt_2, self.alt_3, self.alt_4, self.alt_5, self.alt_6].all? do |v|
    v.send(check, 0)
  end
end
self.errors.add_to_base(_("All values sign should be same.")) unless same_sign