ruby compare two arrays of hash, with certain keys [closed] - ruby-on-rails

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 7 years ago.
There are two arrays of hashes, and I want to remove the 'common' elements from the two arrays, based on certain keys. For example:
array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]
The criteria keys are a and b. So when I compute something like
array1-array2 (I don't have to override '-' if there's a better approach)
I expect to get
[{a: '4', b: '5', c:'6'}]
since a and b are the comparison criteria. The second element is kept because the value for a differs between array1.last and array2.last.

As I understand, you are given two arrays of hashes and a set of keys. You want to reject all elements (hashes) of the first array whose values match the values of any element (hash) of the second array, for all specified keys. You can do that as follows.
Code
require 'set'

def reject_partial_dups(array1, array2, keys)
  set2 = array2.each_with_object(Set.new) do |h,s|
    s << h.values_at(*keys) if (keys-h.keys).empty?
  end
  array1.reject do |h|
    (keys-h.keys).empty? && set2.include?(h.values_at(*keys))
  end
end
The line:
(keys-h.keys).empty? && set2.include?(h.values_at(*keys))
can be simplified to:
set2.include?(h.values_at(*keys))
if none of the values of keys in the elements (hashes) of array1 are nil. I created a set (rather than an array) from array2 in order to speed the lookup of h.values_at(*keys) in that line.
Example
keys = [:a, :b]
array1 = [{a: '1', b:'2', c:'3'}, {a: '4', b: '5', c:'6'}, {a: 1, c: 4}]
array2 = [{a: '1', b:'2', c:'10'}, {a: '3', b: '5', c:'6'}]
reject_partial_dups(array1, array2, keys)
#=> [{:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]
Explanation
First create set2
e0 = array2.each_with_object(Set.new)
#=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"10"}, {:a=>"3", :b=>"5", :c=>"6"}]
# #:each_with_object(#<Set: {}>)>
Pass the first element of e0 and perform the block calculation.
h,s = e0.next
#=> [{:a=>"1", :b=>"2", :c=>"10"}, #<Set: {}>]
h #=> {:a=>"1", :b=>"2", :c=>"10"}
s #=> #<Set: {}>
(keys-h.keys).empty?
#=> ([:a,:b]-[:a,:b,:c]).empty? => [].empty? => true
so compute:
s << h.values_at(*keys)
#=> s << {:a=>"1", :b=>"2", :c=>"10"}.values_at(*[:a,:b])
#=> s << ["1","2"] => #<Set: {["1", "2"]}>
Pass the second (last) element of e0 to the block:
h,s = e0.next
#=> [{:a=>"3", :b=>"5", :c=>"6"}, #<Set: {["1", "2"]}>]
(keys-h.keys).empty?
#=> true
so compute:
s << h.values_at(*keys)
#=> #<Set: {["1", "2"], ["3", "5"]}>
set2
#=> #<Set: {["1", "2"], ["3", "5"]}>
Reject elements from array1
We now iterate through array1, rejecting elements for which the block evaluates to true.
e1 = array1.reject
#=> #<Enumerator: [{:a=>"1", :b=>"2", :c=>"3"},
# {:a=>"4", :b=>"5", :c=>"6"}, {:a=>1, :c=>4}]:reject>
The first element of e1 is passed to the block:
h = e1.next
#=> {:a=>"1", :b=>"2", :c=>"3"}
a = (keys-h.keys).empty?
#=> ([:a,:b]-[:a,:b,:c]).empty? => true
b = set2.include?(h.values_at(*keys))
#=> set2.include?(["1","2"]) => true
a && b
#=> true
so the first element of e1 is rejected. Next:
h = e1.next
#=> {:a=>"4", :b=>"5", :c=>"6"}
a = (keys-h.keys).empty?
#=> true
b = set2.include?(h.values_at(*keys))
#=> set2.include?(["4","5"]) => false
a && b
#=> false
so the second element of e1 is not rejected. Lastly:
h = e1.next
#=> {:a=>1, :c=>4}
a = (keys-h.keys).empty?
#=> ([:a,:b]-[:a,:c]).empty? => [:b].empty? => false
so the block returns false (meaning the last element of e1 is not rejected), as there is no need to compute:
b = set2.include?(h.values_at(*keys))

So you really should try this out yourself because I am basically solving it for you.
The general approach would be:
For every item in array1
Check whether the item at the same index in array2 has any keys with the same values
If it does, delete them
You would probably end up with something like array1.each_with_index { |h, i| h.delete_if {|k,v| array2[i].has_key?(k) && array2[i][k] == v } }

Related

ruby arrays count of most frequent element [duplicate]

This question already has answers here:
How to find an item in array which has the most occurrences [duplicate]
(11 answers)
Closed 6 years ago.
I'm trying to figure out how to find a count of the most frequent element in an array of integers. I can think of a few methods that might be helpful, but when I get to writing an expression inside the block I get completely lost on how to compare an element with the next and previous element. Any ideas? All help is really really appreciated!!!
An easy way is to determine all the unique values, convert each to its count in the array, then determine the largest count.
def max_count(arr)
  arr.uniq.map { |n| arr.count(n) }.max
end
For example:
arr = [1,2,4,3,2,6,3,4,2]
max_count(arr)
#=> 3
There are three steps:
a = arr.uniq
#=> [1, 2, 4, 3, 6]
b = a.map { |n| arr.count(n) }
#=> [1, 3, 2, 2, 1]
b.max
#=> 3
A somewhat more efficient way (because the elements of arr are enumerated only once) is to use a counting hash:
def max_count(arr)
  arr.each_with_object(Hash.new(0)) { |n,h| h[n] += 1 }.values.max
end
max_count(arr)
#=> 3
We have:
a = arr.each_with_object(Hash.new(0)) { |n,h| h[n] += 1 }
#=> {1=>1, 2=>3, 4=>2, 3=>2, 6=>1}
b = a.values
#=> [1, 3, 2, 2, 1]
b.max
#=> 3
See Hash::new for an explanation of Hash.new(0). Briefly, if h = Hash.new(0) and h does not have a key k, h[k] will return the default value, which here is zero. h[k] += 1 expands to h[k] = h[k] + 1, so if h does not have a key k, this becomes h[k] = 0 + 1. On the other hand, if, say, h[k] #=> 2, then h[k] = h[k] + 1 becomes h[k] = 2 + 1, so h[k] #=> 3.
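On Ruby 2.7 or later, Enumerable#tally builds the same kind of counting hash in a single call. The following is only a sketch under that version assumption; max_count_tally is a name made up for this example and is not part of the answer above.

```ruby
# Assumes Ruby 2.7+, where Enumerable#tally is available.
# max_count_tally is a hypothetical name used only for this sketch.
def max_count_tally(arr)
  # tally returns a hash mapping each element to its number of occurrences.
  arr.tally.values.max
end

arr = [1, 2, 4, 3, 2, 6, 3, 4, 2]
p max_count_tally(arr) #=> 3
```

For an empty array this returns nil, because there are no counts to take the maximum of.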

Separate array of hashes by nil

I have a Hash value like this:
hs = {2012 => [7,nil,3], 2013 => [2, 6, nil, 8], 2014 => [9, 1, 2, 8]}
The keys are years. I want to collect values backwards until nil appears like this:
some_separate_method(hs)
{2013 => [8], 2014 => [9, 1, 2, 8]}
I thought this is not difficult to implement by using reverse_each, but I couldn't. How can I make a method like this?
Edit
With AmitA's code I could make it.
new_hs = []
hs.reverse_each{|k,v| new_hs << [k,v]; break if v.include?(nil)}
new_hs = Hash[new_hs.sort]
How about this (Array#split comes from ActiveSupport, so it needs Rails):
res = {}
hs.to_a.reverse.each do |k, arr|
  res[k] = arr.split(nil).last
  break unless res[k].length == arr.length
end
A pure-Ruby solution:
hs.reverse_each.with_object({}) do |(k,v),h|
  h[k] = v.dup
  ndx = v.rindex(&:nil?)
  if ndx
    h[k] = h[k][ndx+1..-1]
    break h
  end
end
#=> {2014=>[9, 1, 2, 8], 2013=>[8]}
v.dup is to avoid mutating hs.
This looks a bit long but I think it should help anyone who's new to ruby and has a similar problem.
def some_separate_method(hs)
  new_hash = {}
  hs.each do |key, value|
    new_arr = []
    value.reverse.each do |item|
      break if item.nil?
      new_arr.unshift(item)
    end
    new_hash[key] = new_arr
  end
  new_hash
end
#=> {2012=>[3], 2013=>[8], 2014=>[9, 1, 2, 8]}

Why is this returning nil?

I've been playing around with arrays, and I don't understand where Nil is coming from and why the a[4] isn't being overwritten. Please see my example below.
require 'erb'
a = Array.new
a[5] = '5';
a[0, 3] = 'a', 'b', 'c', 'd';
a[4] = 'hello'
template = ERB.new "<%= a %>"
puts template.result(binding)
Returns me the result
["a", "b", "c", "d", "hello", nil, "5"]
and
a = Array.new
a[4] = '5';
a[0, 3] = 'a', 'b', 'c', 'd';
a[4] = 'hello'
template = ERB.new "<%= a %>"
puts template.result(binding)
Returns me the result
["a", "b", "c", "d", "hello", "5"]
Thanks in advance for the help!
In Ruby, when you say:
a[0, 3] = a, b, c
it means: replace the 3 elements starting at index 0 with the objects on the right. But if you say
a[0, 3] = a, b, c, d
there are more than three objects, so the extra one is inserted after the third, shifting 5 and its preceding nil to the next positions.
Actually it's:
a[start, length]
not
a[start, end]
The problem (as Pedram explained while I typed this) is that a[0, 3] = 'a', 'b', 'c', 'd' does not replace values from a[0] to a[3]. It replaces 3 values beginning at a[0] (a[0], a[1], and a[2]), but since you are giving it 4 values, it inserts the fourth one as a new element after a[2].
If you want to use a range of indexes, use a[0..3]. Or you can use a[0, 4].
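The difference between the start/length form and the range form can be checked directly. This is a small sketch that starts from the array the question has after a[5] = '5'; the variable names b, c and d are only for illustration.

```ruby
# Same state as in the question after: a = Array.new; a[5] = '5'
a = [nil, nil, nil, nil, nil, '5']

b = a.dup
b[0, 3] = 'a', 'b', 'c', 'd'   # start 0, length 3: 3 slots replaced by 4 values
p b #=> ["a", "b", "c", "d", nil, nil, "5"]

c = a.dup
c[0, 4] = 'a', 'b', 'c', 'd'   # start 0, length 4: 4 slots replaced by 4 values
p c #=> ["a", "b", "c", "d", nil, "5"]

d = a.dup
d[0..3] = 'a', 'b', 'c', 'd'   # range 0..3 covers indexes 0, 1, 2 and 3
p d #=> ["a", "b", "c", "d", nil, "5"]
```

Only the first form makes the array longer, which is where the extra nil in the question's output comes from.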
This is simple.
You are defining an Array which is empty at first. In Ruby, if you assign to an index beyond the end of an array, all the indexes in between are filled with nil.
For example, if I do:
a = []
a[9] = '10'
p a
The output will be:
[nil, nil, nil, nil, nil, nil, nil, nil, nil, "10"]
All nine indexes before the one I assigned are filled with nil, which is Ruby's representation of nothing.

Creating a range from one column

I have a column called "Marks" which contains values like
Marks = [100,200,150,157,....]
I need to assign Grades to those marks using the following key
<25=0, <75=1, <125=2, <250=3, <500=4, >500=5
If Marks < 25, then Grade = 0, if marks < 75 then grade = 1.
I can sort the results and find the first record that matches using Ruby's find function. Is that the best method? Or is there a way to prepare a range from the key by adding Lower Limit and Upper Limit columns to the table and populating them from the key? Marks can have decimals too, e.g. 99.99.
Without using Rails, you could do it like this:
marks = [100, 200, 150, 157, 692, 12]
marks_to_grade = { 25=>0, 75=>1, 125=>2, 250=>3, 500=>4, Float::INFINITY=>5 }
Hash[marks.map { |m| [m, marks_to_grade.find { |k,_| m <= k }.last] }]
#=> {100=>2, 200=>3, 150=>3, 157=>3, 692=>5, 12=>0}
With Ruby 2.1+, you could write this:
marks.map { |m| [m, marks_to_grade.find { |k,_| m <= k }.last] }.to_h
Here's what's happening:
Enumerable#map (a.k.a. collect) converts each mark m to an array [m, g], where g is the grade computed for that mark. For example, when map passes the first element of marks into its block, we have:
m = 100
a = marks_to_grade.find { |k,_| m <= k }
#=> marks_to_grade.find { |k,_| 100 <= k }
#=> [125, 2]
a.last
#=> 2
so the mark 100 is mapped to [100, 2]. (I've replaced the block variable for the value of the key-value pair with the placeholder _ to draw attention to the fact that the value is not being used in the calculation within the block. One could also use, say, _v as the placeholder.) The remaining marks are similarly mapped, resulting in:
b = marks.map { |m| [m, marks_to_grade.find { |k,_| m <= k }.last] }
#=> [[100, 2], [200, 3], [150, 3], [157, 3], [692, 5], [12, 0]]
Lastly
Hash[b]
#=> {100=>2, 200=>3, 150=>3, 157=>3, 692=>5, 12=>0}
or, for Ruby 2.1+
b.to_h
#=> {100=>2, 200=>3, 150=>3, 157=>3, 692=>5, 12=>0}
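The lookup above compares with <=, so a mark of exactly 25 gets grade 0, whereas the key in the question is written with strict "<". Here is a sketch of a variant with exclusive upper bounds. GRADE_LIMITS and grade_for are names invented for this example, and giving a mark of exactly 500 the grade 5 is an assumption, since the key does not cover that value.

```ruby
# Exclusive upper bounds, matching "<25=0, <75=1, <125=2, <250=3, <500=4".
# GRADE_LIMITS and grade_for are hypothetical names for this sketch.
# Assumption: a mark of exactly 500 gets grade 5 (the key does not say).
GRADE_LIMITS = [25, 75, 125, 250, 500].freeze

def grade_for(mark)
  # Index of the first limit the mark is strictly below; 5 if there is none.
  GRADE_LIMITS.index { |limit| mark < limit } || GRADE_LIMITS.size
end

marks = [100, 200, 150, 157, 692, 12, 99.99, 25]
p marks.map { |m| [m, grade_for(m)] }.to_h
#=> {100=>2, 200=>3, 150=>3, 157=>3, 692=>5, 12=>0, 99.99=>2, 25=>1}
```

Because the comparison is numeric, decimal marks such as 99.99 need no special handling.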
You can make use of update_all:
Student.where(:mark => 0...25).update_all(grade: 0)
Student.where(:mark => 25...75).update_all(grade: 1)
Student.where(:mark => 75...125).update_all(grade: 2)
Student.where(:mark => 125...250).update_all(grade: 3)
Student.where(:mark => 250...500).update_all(grade: 4)
Student.where("mark > ?", 500).update_all(grade: 5)

How to determine if one array contains all elements of another array

Given:
a1 = [5, 1, 6, 14, 2, 8]
I would like to determine if it contains all elements of:
a2 = [2, 6, 15]
In this case the result is false.
Are there any built-in Ruby/Rails methods to identify such array inclusion?
One way to implement this is:
a2.index{ |x| !a1.include?(x) }.nil?
Is there a better, more readable, way?
a = [5, 1, 6, 14, 2, 8]
b = [2, 6, 15]
a - b
# => [5, 1, 14, 8]
b - a
# => [15]
(b - a).empty?
# => false
Perhaps this is easier to read:
a2.all? { |e| a1.include?(e) }
You can also use array intersection:
(a1 & a2).size == a2.size
Note that size is used here just for speed; you can also do (slower):
(a2 & a1) == a2
But I guess the first is more readable. These 3 are plain ruby (not rails).
This can be achieved by doing
(a2 & a1) == a2
This creates the intersection of both arrays, returning all elements from a2 which are also in a1. If the result is the same as a2, you can be sure you have all elements included in a1.
This approach only works if all elements in a2 are different from each other in the first place. If there are doubles, this approach fails. The one from Tempos still works then, so I wholeheartedly recommend his approach (also it's probably faster).
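A quick way to see the duplicate problem is to repeat one element in a2. The input below is made up for illustration: every element of a2 occurs in a1, yet the intersection check reports false because & drops the repeated 2.

```ruby
a1 = [5, 1, 6, 14, 2, 8]
a2 = [2, 2, 6]   # hypothetical input with a repeated element

# a2 & a1 is [2, 6], which no longer equals a2.
by_intersection = (a2 & a1) == a2
# Checks each element of a2 on its own, so repeats do not matter.
by_all = a2.all? { |e| a1.include?(e) }

p by_intersection #=> false
p by_all          #=> true
```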
If there are no duplicate elements or you don't care about them, then you can use the Set class:
require 'set'
a1 = Set.new [5, 1, 6, 14, 2, 8]
a2 = Set.new [2, 6, 15]
a2.subset?(a1)
=> false
Behind the scenes this uses
all? { |o| set.include?(o) }
You can monkey-patch the Array class:
class Array
  def contains_all?(ary)
    ary.uniq.all? { |x| count(x) >= ary.count(x) }
  end
end
Test:
irb(main):131:0> %w[a b c c].contains_all? %w[a b c]
=> true
irb(main):132:0> %w[a b c c].contains_all? %w[a b c c]
=> true
irb(main):133:0> %w[a b c c].contains_all? %w[a b c c c]
=> false
irb(main):134:0> %w[a b c c].contains_all? %w[a]
=> true
irb(main):135:0> %w[a b c c].contains_all? %w[x]
=> false
irb(main):136:0> %w[a b c c].contains_all? %w[]
=> true
irb(main):137:0> %w[a b c d].contains_all? %w[d c h]
=> false
irb(main):138:0> %w[a b c d].contains_all? %w[d b c]
=> true
Of course the method can be written as a standalone method, e.g.
def contains_all?(a,b)
  b.uniq.all? { |x| a.count(x) >= b.count(x) }
end
and you can invoke it like
contains_all?(%w[a b c c], %w[c c c])
Indeed, after profiling, the following version is much faster, and the code is shorter.
def contains_all?(a,b)
  b.all? { |x| a.count(x) >= b.count(x) }
end
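Each call to count above scans a whole array, so the check does work proportional to the product of the two lengths in the worst case. As a sketch only, the counts can instead be computed once per array with Enumerable#tally, assuming Ruby 2.7 or later. contains_all_counted? is a name invented for this example.

```ruby
# Assumes Ruby 2.7+ for Enumerable#tally.
# contains_all_counted? is a hypothetical name used only for this sketch.
def contains_all_counted?(a, b)
  have = a.tally   # element => occurrences in a
  # Every element of b must occur in a at least as often as it does in b.
  b.tally.all? { |x, needed| have.fetch(x, 0) >= needed }
end

p contains_all_counted?(%w[a b c c], %w[a b c])     #=> true
p contains_all_counted?(%w[a b c c], %w[a b c c c]) #=> false
p contains_all_counted?(%w[a b c d], %w[d c h])     #=> false
p contains_all_counted?(%w[a b c c], %w[])          #=> true
```

Each array is walked once, and the hash lookups take constant expected time.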
Most answers based on (a1 - a2) or (a1 & a2) would not work if there are duplicate elements in either array. I arrived here looking for a way to see if all letters of a word (split to an array) were part of a set of letters (for scrabble for example). None of these answers worked, but this one does:
# a1 is the word (a String); a2 holds the available letters.
def contains_all?(a1, a2)
  a1.chars.all? do |letter|
    a1.count(letter) <= a2.count(letter)
  end
end
Depending on how big your arrays are, you might consider an O(n log n) algorithm. Note that this one tests whether the two arrays hold the same elements, not whether one contains the other:
def equal_a(a1, a2)
  return false if a1.length != a2.length
  a1sorted = a1.sort
  a2sorted = a2.sort
  0.upto(a1.length - 1) do |i|
    return false if a1sorted[i] != a2sorted[i]
  end
  true
end
Sorting costs O(n log n) and checking each pair costs O(n), thus this algorithm is O(n log n). Comparison-based algorithms on unsorted arrays cannot be asymptotically faster, although counting elements in a hash takes O(n) expected time.
I was directed to this post when trying to find whether one array ["a", "b", "c"] contained another array ["a", "b"], where in my case identical ordering was an additional requirement to the question.
Here is my solution, for anyone who has that extra requirement. It compares every run of consecutive elements against the searched array, so it is O(n*m) in the worst case:
def array_includes_array(array_to_inspect, array_to_search_for)
  search_length = array_to_search_for.length
  # An empty array is contained in every array.
  return true if search_length == 0
  return false if search_length > array_to_inspect.length
  # Compare every run of search_length consecutive elements.
  array_to_inspect.each_cons(search_length).any? do |window|
    window == array_to_search_for
  end
end
This produces the test results:
puts "1: #{array_includes_array(["a", "b", "c"], ["b", "c"])}" # true
puts "2: #{array_includes_array(["a", "b", "c"], ["a", "b"])}" # true
puts "3: #{array_includes_array(["a", "b", "c"], ["b", "b"])}" # false
puts "4: #{array_includes_array(["a", "b", "c"], ["c", "b", "a"])}" # false
puts "5: #{array_includes_array(["a", "b", "c"], [])}" # true
puts "6: #{array_includes_array([], ["a"])}" # false
puts "7: #{array_includes_array([], [])}" # true
