Detecting overlapping ranges in Ruby

I have an array of ranges:
[[39600..82800], [39600..70200], [70200..80480]]
I need to determine whether any of them overlap. What is an easy way to do this in Ruby?
In the above case the output should be 'Overlapping'.

This is a very interesting puzzle, especially if you care about performance.
If there are just two ranges, it's a fairly simple check, which is also covered by ActiveSupport's overlaps? extension.
def ranges_overlap?(r1, r2)
  r1.cover?(r2.first) || r2.cover?(r1.first)
end
If you want to compare multiple ranges, it's a fairly interesting algorithm exercise.
You could loop over all the ranges, but you would need to compare each range against every other one, which gives the algorithm quadratic cost.
A more efficient solution is to sort the ranges and scan (or binary-search) them, or to use a data structure (such as a tree) that makes computing the overlap possible.
This problem is also covered on the Interval tree page. Computing an overlap essentially consists of finding the intersection of the trees.
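For example, a quick usage sketch of that helper (restated here so it runs standalone):

```ruby
# Two ranges overlap if either contains the other's start point.
def ranges_overlap?(r1, r2)
  r1.cover?(r2.first) || r2.cover?(r1.first)
end

ranges_overlap?(39600..82800, 39600..70200)  # => true
ranges_overlap?(1..5, 6..10)                 # => false
```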

Is this not a way to do it?
def any_overlapping_ranges(array_of_ranges)
  array_of_ranges.sort_by(&:first).each_cons(2).any? { |x, y| x.last > y.first }
end

p any_overlapping_ranges([50..100, 1..51, 200..220]) #=> true

Consider this:
class Range
  include Comparable

  def <=>(other)
    self.begin <=> other.begin
  end

  def self.overlap?(*ranges)
    edges = ranges.sort.flat_map { |range| [range.begin, range.end] }
    edges != edges.sort.uniq
  end
end
Range.overlap?(2..12, 6..36, 42..96) # => true
Notes:
This could take in any number of ranges.
Have a look at the gist with some tests to play with the code.
The code creates a flat array with the start and end of each range.
This array will retain its order if the ranges don't overlap. (It's easier to visualize with some examples than to explain textually; try it.)
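To make the edge trick above concrete, here is a small worked example (values chosen arbitrarily):

```ruby
# Non-overlapping ranges: the flattened start/end edges come out already sorted.
edges = [2..4, 6..8].sort_by(&:begin).flat_map { |r| [r.begin, r.end] }
# edges == [2, 4, 6, 8], which equals edges.sort.uniq -> no overlap

# Overlapping ranges: one range's end lands after the next range's start,
# so sorting the edges reorders them.
edges2 = [2..12, 6..36].sort_by(&:begin).flat_map { |r| [r.begin, r.end] }
# edges2 == [2, 12, 6, 36], which differs from edges2.sort.uniq -> overlap
```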

For the sake of simplicity and readability I'd suggest this approach:
def overlaps?(ranges)
  ranges.each_with_index do |range, index|
    ((index + 1)...ranges.size).each do |i|
      next_range = ranges[i]
      # Array#& returns a (possibly empty) array, and even an empty array
      # is truthy in Ruby, so test the intersection with .any?
      if (range.to_a & next_range.to_a).any?
        puts "#{range} overlaps with #{next_range}"
      end
    end
  end
end

r = [(39600..82800), (39600..70200), (70200..80480)]
overlaps?(r)
and the output:
ruby ranges.rb
39600..82800 overlaps with 39600..70200
39600..82800 overlaps with 70200..80480
39600..70200 overlaps with 70200..80480
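The to_a intersection above materializes every range into an array, which gets expensive for large integer ranges. A sketch of the same pairwise reporting using Range#cover? instead (overlapping_pairs is a hypothetical helper name, not from the answer):

```ruby
# Returns the overlapping pairs without expanding the ranges into arrays.
def overlapping_pairs(ranges)
  ranges.combination(2).select { |a, b| a.cover?(b.first) || b.cover?(a.first) }
end

r = [(39600..82800), (39600..70200), (70200..80480)]
overlapping_pairs(r).each { |a, b| puts "#{a} overlaps with #{b}" }
```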

Related

Rails batch convert one object to another object

I am looking for the most efficient way (in speed) to convert a huge number of objects (1M instances) to another object type.
Unfortunately I don't have a choice about what I get as input (the million objects).
So far I've tried each_slice, but it does not show much improvement when it comes to speed!
It looks like this:
expected_objects_of_type_2 = []
huge_array.each_slice(3000) do |batch|
  batch.each do |object_type_1|
    expected_objects_of_type_2 << NewType2.new(object_type_1)
  end
end
Any idea?
Thanks!
I did a quick test with a few different methods of looping the array and measured the timings:
huge_array = Array.new(10000000) { rand(1..1000) }

a = Time.now
string_array = huge_array.map { |x| x.to_s }
b = Time.now
puts b - a
Same with:
sa = []
huge_array.each do |x|
  sa << x.to_s
end
and
sa = []
huge_array.each_slice(3000) do |batch|
  batch.each do |x|
    sa << x.to_s
  end
end
No idea what you are converting, so I did a simple int-to-string conversion.
Timings:
Map:   1.7
Each:  2.3
Slice: 3.2
So apparently the slice overhead makes things slower. map seems to be the fastest (internally it's just a for loop, but with a fixed-length output array). The << seems to slow things down a bit.
So if each object needs individual converting, you are stuck with O(n) complexity and can't speed things up by a lot. Just avoid overhead.
Depending on your data, sorting and exploiting caching effects might help, or avoiding duplicates if you have a lot of identical data, but there is no way to know without seeing your actual conversions.
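To illustrate the duplicate-avoidance idea above: if many inputs repeat, and sharing converted objects is acceptable, memoizing the conversion in a Hash means each distinct value is converted only once. A minimal sketch (x.to_s stands in for the real NewType2.new conversion):

```ruby
huge_array = Array.new(100_000) { rand(1..50) }  # lots of repeated values

# Hash with a default block: runs the conversion once per distinct value.
memo = Hash.new { |h, x| h[x] = x.to_s }
converted = huge_array.map { |x| memo[x] }
```

With only 50 distinct values, the conversion body runs at most 50 times instead of 100,000.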
I would treat each slice in its own thread:
threads = []
huge_array.each_slice(3000) do |batch|
  threads << Thread.new do
    batch.each do |object_type_1|
      expected_objects_of_type_2 << NewType2.new(object_type_1)
    end
  end
end
Then you have to wait for the threads to terminate using join: accumulate them in an array as above, then call threads.each(&:join).
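One way to do that join step, sketched with x.to_s standing in for NewType2.new: Thread#value joins the thread and returns its block's result, which also avoids pushing to one shared array from several threads at once.

```ruby
huge_array = (1..9_000).to_a

threads = huge_array.each_slice(3000).map do |batch|
  Thread.new { batch.map { |x| x.to_s } }  # each thread returns its own slice
end

# value waits for each thread to finish and returns its block's result
expected_objects_of_type_2 = threads.flat_map(&:value)
```

Note that on CRuby the global VM lock keeps CPU-bound threads on one core, so threading mostly helps when the conversion does IO.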

Clean way to loop over a masked list in Julia

In Julia, I have a list of neighbors of a location stored in all_neighbors[loc]. This allows me to quickly loop over these neighbors conveniently with the syntax for neighbor in all_neighbors[loc]. This leads to readable code such as the following:
active_neighbors = 0
for neighbor in all_neighbors[loc]
if cube[neighbor] == ACTIVE
active_neighbors += 1
end
end
Astute readers will see that this is nothing more than a reduction. Because I'm just counting active neighbors, I figured I could do this in a one-liner using the count function. However,
# This does not work
active_neighbors = count(x->x==ACTIVE, cube[all_neighbors[loc]])
does not work because the all_neighbors mask doesn't get interpreted correctly as simply a mask over the cube array. Does anyone know the cleanest way to write this reduction? An alternative solution I came up with is:
active_neighbors = count(x->x==ACTIVE, [cube[all_neighbors[loc][k]] for k = 1:length(all_neighbors[loc])])
but I really don't like this because it's even less readable than what I started with. Thanks for any advice!
This should work:
count(x -> cube[x] == ACTIVE, all_neighbors[loc])

ActiveRecord - How fast are Calculate() methods in PostgreSQL?

I have a rather noobish question about ActiveRecord in Ruby on Rails.
I'm working on an app with a PostgreSQL database that will need to handle large amounts of data from multiple platforms as quickly as possible, and I'm going through the process of optimizing for speed.
I have two functions and I'm wondering which one would be faster in theory.
Example #1
def spend_branded(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    platform.where(date: date_range).each do |platform_performance|
      total_branded_spend += platform_performance.spend["branded"].to_f
    end
  end
  total_branded_spend
end
VS.
Example #2
def spend_branded(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    total_branded_spend += platform.where(date: date_range).sum(:branded_spend).to_f
  end
  total_branded_spend
end
As you can see, in the first example a selection of records is retrieved via the .where() method and then iterated over, with the desired field summed manually. In the second example, I'm using the .sum() method to do the summing at the database level.
I'm wondering if anyone knows which method is faster in general. I suspect the second, but by how much?
Thanks so much for taking the time to read this question.
EDIT:
As @lacostenycoder pointed out, I should have clarified what platform_list is. It references an array of 1 to 3 ActiveRecord collections containing one record per day in the date_range.
Upon benchmarking with the method provided in his answer, I found the 2nd method to be slightly faster.
                        user     system      total        real
spend_branded       0.000000   0.000000   0.000000 (  0.003632)
spend_branded_sum   0.000000   0.000000   0.000000 (  0.002612)
(102 records processed)
Here's how you can benchmark your methods for comparison. Open a Rails console with rails c, then paste this in:
def spend_branded(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    platform.where(date: date_range).each do |platform_performance|
      total_branded_spend += platform_performance.spend["branded"].to_f
    end
  end
  total_branded_spend
end

def spend_branded_sum(date_range)
  total_branded_spend = 0.0
  platform_list.each do |platform|
    total_branded_spend += platform.where(date: date_range).sum(:branded_spend).to_f
  end
  total_branded_spend
end

require 'benchmark'
Benchmark.bm do |x|
  x.report(:spend_branded)     { spend_branded(date_range) }
  x.report(:spend_branded_sum) { spend_branded_sum(date_range) }
end
Of course we would expect the 2nd way to be faster. We can probably offer more help if you show more about the model relations and how platform_list is defined.
Also you might want to check out the PgHero gem, which can be helpful in identifying slow queries and where to add indices for better performance. In general, when done correctly, proper calculations at the database level will be orders of magnitude faster than iterating over large sets of Ruby objects.
Also you might try to refactor your first version to this:
def spend_branded(date_range)
  platform_list.map do |platform|
    platform.where(date: date_range)
            .pluck(:spend).map { |h| h['branded'].to_f }.sum
  end.sum
end
And the 2nd version to:
def spend_branded_sum(date_range)
  platform_list.map do |platform|
    platform.where(date: date_range).sum(:branded_spend).to_f
  end.sum
end
lacostenycoder is correct to recommend that you benchmark your code.
If the values you are trying to sum are directly available in the database, calculations are very likely going to be faster; I do not know how much faster.
If platform_list is a collection of models, something like this might work and might outperform your iteration:
Platform.
  where(date: date_range).
  where(id: platform_list.map(&:id)).
  sum(:branded_spend)

Fuzzy String search: To find one string among any substring of another

I want to find one string within some Levenshtein distance inside a bigger string. I have written code for finding the distance between two strings, but I want an efficient implementation for finding a substring within a fixed Levenshtein distance.
module Levenshtein
  def self.distance(a, b)
    a, b = a.downcase, b.downcase
    costs = Array(0..b.length) # i == 0
    (1..a.length).each do |i|
      costs[0], nw = i, i - 1 # j == 0; nw is lev(i-1, j)
      (1..b.length).each do |j|
        costs[j], nw = [costs[j] + 1, costs[j - 1] + 1,
                        a[i - 1] == b[j - 1] ? nw : nw + 1].min, costs[j]
      end
    end
    costs[b.length]
  end

  def self.test
    %w{kitten sitting saturday sunday rosettacode raisethysword}.each_slice(2) do |a, b|
      puts "distance(#{a}, #{b}) = #{distance(a, b)}"
    end
  end
end
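For the substring part of the question, the DP above can be adapted (this is Sellers' classic variant of the algorithm): initialize the first row to zeros so a match may start anywhere in the big string, and take the minimum of the final row. A sketch, with min_substring_distance as a hypothetical name:

```ruby
# Smallest Levenshtein distance between `pattern` and any substring of `text`.
def min_substring_distance(pattern, text)
  pattern = pattern.downcase
  text = text.downcase
  costs = Array.new(text.length + 1, 0) # row 0 is all zeros: free start anywhere
  (1..pattern.length).each do |i|
    nw = costs[0] # lev(i-1, j-1)
    costs[0] = i
    (1..text.length).each do |j|
      costs[j], nw = [costs[j] + 1, costs[j - 1] + 1,
                      pattern[i - 1] == text[j - 1] ? nw : nw + 1].min, costs[j]
    end
  end
  costs.min # best match may end at any position in text
end

min_substring_distance("kitten", "xx kitten yy")  # => 0
```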
Check out the TRE library, which does exactly this (in C), and quite efficiently. Now look carefully at the matching function, which is basically 500 lines of unreadable (but necessary) code.
I'd say that, instead of rolling your own version, and provided you don't intend to read the many difficult papers on the subject (search for "approximate string matching") and don't have a few free months to study it, you'd be much better off writing a small wrapper around the library itself. Your Ruby version would be inefficient anyway compared with what can be achieved in C.

Rails: Performance of checking to see if a set contains a value vs adding it multiple times

I have a loop that will iterate tens of thousands of times, and a set that may have only 50 distinct values. Which of the following is more efficient to have as part of the loop?
if !myset.include?('value')
  myset.add('value')
end
or
myset.add('value')
If it is more often the case that myset already has the value, then the first snippet executes only the if condition, and the second one, which calls add anyway, would probably be slightly slower.
If it is more often the case that myset does not have the value, then in the first snippet the condition check is extra work and would be slower, whereas the second would be slightly faster.
Either way, I think the difference is so subtle that it falls within the margin of error.
If we randomize over a set of 50 distinct values:
require 'benchmark'

Benchmark.bm do |b|
  b.report do
    set = []
    100_000.times do
      i = rand(50)
      set.push(i)
    end
  end

  b.report do
    set = []
    100_000.times do
      i = rand(50)
      unless set.include?(i)
        set.push(i)
      end
    end
  end
end
the result I get is 0.04 against 0.2 with checking. So it's 5 times faster without checking in this case (note that this benchmark uses an Array, not a Set).
The larger the set of randomized values, the longer it is going to take (with checking).
You can run a similar benchmark with your own code to see what tendencies you get. Run it with large numbers and multiple times to get cleaner results.
Update:
require 'set'
require 'benchmark'

Benchmark.bm do |b|
  b.report do
    set = Set.new
    100_000.times do
      i = rand(50)
      set.add(i)
    end
  end

  b.report do
    set = Set.new
    100_000.times do
      i = rand(50)
      unless set.include?(i)
        set.add(i)
      end
    end
  end
end
Running with an actual Set, both examples are slower and quite similar, around 0.48.
If you use Set, you don't need the if at all; just call myset.add('value'). As for speed, set.add is about the same as array.push.
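Worth noting: Set also has add? (a real method in Ruby's set library), which returns nil when the element was already present, so the membership test and the insert happen in one call:

```ruby
require 'set'

myset = Set.new
myset.add?('value')  # => the set itself (element was added)
myset.add?('value')  # => nil (element was already present)
```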
