How to merge two objects and keep count - ruby-on-rails

I am building a Rails 5.2 app.
In this app I am working with statistics.
I generate two objects:
{
  "total_project": {
    "website": 1,
    "google": 1,
    "instagram": 1
  }
}
And this:
{
  "total_leads": {
    "website": 1,
    "google": 2,
    "client_referral": 1
  }
}
I need to merge these two objects into a single object that sums the counts. The desired result is:
{
  "total_both": {
    "website": 2,
    "google": 3,
    "instagram": 1,
    "client_referral": 1
  }
}
I tried this and it technically works: it merges the objects, but the counts are not summed:
@total_project = array_projects.group_by { |d| d[:entity_type] }.transform_values(&:count).symbolize_keys
@total_leads = array_leads.group_by { |d| d[:entity_type] }.transform_values(&:count).symbolize_keys
@total_sources = merged.merge **@total_project, **@total_leads
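(Hash#merge without a block simply keeps the second hash's value for keys present in both hashes, which is why the counts end up overwritten rather than summed.)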
Please note that the attributes (sources) are dynamic from the database so I cannot hard code anything. The user can add their own sources.

@total_sources = @total_project.merge(@total_leads) do |key, ts_value, tp_value|
  ts_value + tp_value
end
If there can be more than two sources, put everything in an array and reduce:
@total_sources = source_array.reduce do |accumulator, next_source|
  accumulator.merge(next_source) { |key, v1, v2| v1 + v2 }
end
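Applied to the question's data (using the inner hashes; the variable names are illustrative):
total_project = { website: 1, google: 1, instagram: 1 }
total_leads   = { website: 1, google: 2, client_referral: 1 }
[total_project, total_leads].reduce do |accumulator, next_source|
  accumulator.merge(next_source) { |key, v1, v2| v1 + v2 }
end
#=> {:website=>2, :google=>3, :instagram=>1, :client_referral=>1}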

You may compute the desired result as follows.
arr = [{ "total_project": { "website": 1, "google": 1, "instagram": 1 } },
       { "total_leads": { "website": 1, "google": 2, "client_referral": 1 } }]
{ "total_both" => arr.flat_map(&:values)
                     .reduce { |h,g| h.merge(g) { |_,o,n| o+n } } }
#=> {"total_both"=>{:website=>2, :google=>3, :instagram=>1, :client_referral=>1}}
Note that
arr.flat_map(&:values)
#=> [{:website=>1, :google=>1, :instagram=>1},
# {:website=>1, :google=>2, :client_referral=>1}]
Had I used Array#map this would have been
arr.map(&:values)
#=> [[{:website=>1, :google=>1, :instagram=>1}],
# [{:website=>1, :google=>2, :client_referral=>1}]]
See Enumerable#flat_map, Enumerable#reduce and the form of Hash#merge that takes a block (here { |_,o,n| o+n }) which returns the values of keys that are present in both hashes being merged. See the doc for merge for definitions of the three block variables (here _, o and n). I have named the first block variable (holding the common key) _ to signal to the reader that it is not used in the block calculation (a common Ruby convention).
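For instance:
{ a: 1 }.merge({ a: 2, b: 3 }) { |_, o, n| o + n }
#=> {:a=>3, :b=>3}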

Related

how to make a deep_slice in a hash on ruby

I was looking around for a clean way to do this and found some workarounds, but nothing like slice (some people recommended using a gem, but I think one is not needed for this operation, please correct me if I am wrong). So I found myself with a hash that contains a bunch of hashes, and I wanted a way to perform the slice operation over this hash while also getting the key/value pairs from nested hashes. So the question:
Is there something like deep_slice in ruby?
Example:
input: a = {b: 45, c: {d: 55, e: { f: 12}}, g: {z: 90}}, keys = [:b, :f, :z]
expected output: {:b=>45, :f=>12, :z=>90}
Thx in advance! 👍
After looking around for a while I decided to implement this myself; this is how I fixed it:
a = {b: 45, c: {d: 55, e: { f: 12}}, g: {z: 90}}
keys = [:b, :f, :z]
def custom_deep_slice(a:, keys:)
  result = a.slice(*keys)
  a.keys.each do |k|
    if a[k].class == Hash
      result.merge! custom_deep_slice(a: a[k], keys: keys)
    end
  end
  result
end
c_deep_slice = custom_deep_slice(a: a, keys: keys)
p c_deep_slice
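#=> {:b=>45, :f=>12, :z=>90}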
The code above is a classic DFS, which takes advantage of the merge! provided by the hash class.
require 'set'
def recurse(h, keys)
  h.each_with_object([]) do |(k,v),arr|
    if keys.include?(k)
      arr << [k,v]
    elsif v.is_a?(Hash)
      arr.concat(recurse(v,keys))
    end
  end
end
hash = { b: 45, c: { d: 55, e: { f: 12 } }, g: { b: 21, z: 90 } }
keys = [:b, :f, :z]
arr = recurse(hash, keys.to_set)
#=> [[:b, 45], [:f, 12], [:b, 21], [:z, 90]]
Notice that hash differs slightly from the example hash given in the question. I added a second nested key :b to illustrate the problem of returning a hash rather than an array of key-value pairs. Were we to convert arr to a hash the pair [:b, 45] would be discarded:
arr.to_h
#=> {:b=>21, :f=>12, :z=>90}
If desired, however, one could write:
arr.each_with_object({}) { |(k,v),h| (h[k] ||= []) << v }
#=> {:b=>[45, 21], :f=>[12], :z=>[90]}
I converted keys from an array to a set merely to speed lookups (keys.include?(k)).
A slightly modified approach could be used if the hash contained nested arrays of hashes as well as nested hashes.
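For instance, that modification might look like this (a sketch, not from the original answer; arrays are walked element by element and any other value contributes nothing):
def recurse(obj, keys)
  case obj
  when Hash
    obj.each_with_object([]) do |(k, v), arr|
      if keys.include?(k)
        arr << [k, v]
      else
        arr.concat(recurse(v, keys))
      end
    end
  when Array
    obj.flat_map { |el| recurse(el, keys) }
  else
    []
  end
end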
My version, maybe it helps:
def deep_slice( obj, *args )
  deep_arg = {}
  slice_args = []
  args.each do |arg|
    if arg.is_a? Hash
      arg.each do |hash|
        key, value = hash
        if obj[key].is_a? Hash
          deep_arg[key] = deep_slice( obj[key], *value )
        elsif obj[key].is_a? Array
          deep_arg[key] = obj[key].map{ |arr_el| deep_slice( arr_el, *value) }
        end
      end
    elsif arg.is_a? Symbol
      slice_args << arg
    end
  end
  obj.slice(*slice_args).merge(deep_arg)
end
Object to slice
obj = {
  "id": 135,
  "kind": "transfer",
  "customer": {
    "id": 1,
    "name": "Admin",
  },
  "array": [
    {
      "id": 123,
      "name": "TEST",
      "more_deep": {
        "prop": "first",
        "prop2": "second"
      }
    },
    {
      "id": 222,
      "name": "2222"
    }
  ]
}
Schema to slice
deep_slice(
  obj,
  :id,
  customer: [
    :name
  ],
  array: [
    :name,
    more_deep: [
      :prop2
    ]
  ]
)
Result
{
  :id=>135,
  :customer=>{
    :name=>"Admin"
  },
  :array=>[
    {
      :name=>"TEST",
      :more_deep=>{
        :prop2=>"second"
      }
    },
    {
      :name=>"2222"
    }
  ]
}

Get differences between Array and ActiveRecord thousands of records

I'm trying to get the deltas (additions, deletions, and updates) between CSV data with about 500,000 records and an ActiveRecord model.
iden is the identifier used for the comparison.
ex. csv_data
[
  { iden: 1, group_num: 111 },
  { iden: 2, group_num: 222 },
  { iden: 3, group_num: 333 },
  { iden: 4, group_num: 444 }
]
ex. ActiveRecordData
[
  { iden: 2, group_num: 222 },
  { iden: 3, group_num: 333 },
  { iden: 4, group_num: 999 },
  { iden: 5, group_num: 555 }
]
As a result I would want to get
an array of additions
[{ iden: 5, group_num: 555 }]
an array of removals
[{ iden: 1, group_num: 111 }]
an array of updates
[{ iden: 4, group_num: 999 }]
I tried iterating through each item to get the particular deltas, but it takes hours for data sets of hundreds of thousands of rows. How would I better optimize this?
additions = []
updates = []
csv_data.each_slice(1000).map do |chunk|
  chunk.map { |csv_item|
    active_record = ActiveRecordData.where(iden: csv_item[:iden])
    if !active_record.exists?
      additions << active_record
    elsif active_record.first.group_num != csv_item[:group_num]
      updates << active_record
    end
  }
end
deletions = ActiveRecordData.all.select{|active_record| !csv_data.any?{|csv_item| csv_item[:iden] == active_record.iden}}
I'd start by addressing these issues:
Multiple queries for every item
Loading data you don't use
Instantiating ActiveRecord models unnecessarily
csv_data.each_slice(1000) do |chunk|
  # One query per chunk; pluck avoids instantiating ActiveRecord models.
  # pluck(:iden, :group_num) returns [iden, group_num] pairs, so to_h
  # gives a lookup of iden => group_num.
  records = ActiveRecordData
    .where(iden: chunk.map { |row| row[:iden] })
    .pluck(:iden, :group_num)
    .to_h

  additions += chunk.reject { |row| records.key?(row[:iden]) }

  updates += chunk.select do |row|
    records.key?(row[:iden]) && records[row[:iden]] != row[:group_num]
  end
end
Finally, you likely need to go about your deletions in a different way. If your iden values are numeric and relatively sequential, one low-hanging approach is to fetch just iden values within a range (e.g. where(iden: 1..100_000).pluck(:iden)), then loop through your data to identify and add deleted records to a deletions buffer before moving on to the next batch.
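Here is a sketch of that batching idea (assuming numeric iden values; the 100_000 window and variable names are illustrative):
require 'set'

# Collect the iden values present in the CSV, then scan the table in iden
# ranges, buffering ids that exist in the database but not in the CSV.
csv_idens = csv_data.map { |row| row[:iden] }.to_set
deletions = []

min_iden = ActiveRecordData.minimum(:iden)
max_iden = ActiveRecordData.maximum(:iden)
(min_iden..max_iden).step(100_000) do |start|
  db_idens = ActiveRecordData.where(iden: start...(start + 100_000)).pluck(:iden)
  deletions.concat(db_idens.reject { |iden| csv_idens.include?(iden) })
end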
I would create a temporary table with a primary key over iden and then load the CSV data into the table using bulk inserts in chunks. Once there, it would be trivial (and very fast) to get the diff of the two tables:
SELECT table_a.iden, table_a.group_num FROM table_a WHERE table_a.iden NOT IN (SELECT table_b.iden FROM table_b)
SELECT table_b.iden, table_b.group_num FROM table_b WHERE table_b.iden NOT IN (SELECT table_a.iden FROM table_a)
SELECT table_a.iden, table_a.group_num FROM table_a INNER JOIN table_b ON table_a.iden = table_b.iden AND table_a.group_num <> table_b.group_num
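A sketch of the loading step (assuming PostgreSQL; the csv_import table name is illustrative):
conn = ActiveRecord::Base.connection
conn.execute(<<~SQL)
  CREATE TEMPORARY TABLE csv_import (
    iden integer PRIMARY KEY,
    group_num integer
  )
SQL

# Bulk-insert the CSV rows in chunks to keep each statement a manageable size.
csv_data.each_slice(1_000) do |chunk|
  values = chunk.map { |r| "(#{conn.quote(r[:iden])}, #{conn.quote(r[:group_num])})" }
  conn.execute("INSERT INTO csv_import (iden, group_num) VALUES #{values.join(', ')}")
end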

Group by unique values while summing / adding other values

I have a data structure that looks like this:
arr = [
  {
    price: 2.0,
    unit: "meter",
    tariff_code: "4901.99",
    amount: 200
  },
  {
    price: 2.0,
    unit: "meter",
    tariff_code: "4901.99",
    amount: 200
  },
  {
    price: 14.0,
    unit: "yards",
    tariff_code: "6006.24",
    amount: 500
  },
  {
    price: 14.0,
    unit: "yards",
    tariff_code: "6006.24",
    amount: 500
  }
]
I need to group all of these by tariff_code, while summing the price and amounts that correspond with that tariff code. So my expected output should be:
[
  {
    price: 4.0,
    unit: "meter",
    tariff_code: "4901.99",
    amount: 400
  },
  {
    price: 28.0,
    unit: "yards",
    tariff_code: "6006.24",
    amount: 1000
  }
]
receipt_data[:order_items].group_by { |oi| oi[:tariff_code] }.values
The group_by statement used above will allow me to group by tariff_code but I'm unable to work out a way to sum the other values. I'm sure there is a slick one-liner way to accomplish this...
More verbose:
grouped_items = arr.group_by { |oi| oi[:tariff_code] }
result = grouped_items.map do |tariff_code, code_items|
  price, amount = code_items.reduce([0, 0]) do |(price, amount), ci|
    [price + ci[:price], amount + ci[:amount]]
  end
  {
    price: price,
    unit: code_items.first[:unit],
    tariff_code: tariff_code,
    amount: amount
  }
end
#[
# {:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400}
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}
#]
Just to add to the fun, here is the answer that uses group_by, as @Cary predicted, mostly copying Pavel's answer. This is bad performance-wise, so use it only if the array is small. It uses sum with a block, which is core Ruby since 2.4 (on older Rubies it can be replaced by .map { |item| item[:price] }.reduce(:+)).
arr.group_by { |a| a[:tariff_code] }.map do |tariff_code, items|
  {
    price: items.sum { |item| item[:price] },
    unit: items.first[:unit],
    tariff_code: tariff_code,
    amount: items.sum { |item| item[:amount] }
  }
end
This would have been even smaller if it was an array of objects (ActiveRecord objects maybe) with methods instead of hashes.
arr.group_by(&:tariff_code).map do |tariff_code, items|
  {
    price: items.sum(&:price),
    unit: items.first.unit,
    tariff_code: tariff_code,
    amount: items.sum(&:amount)
  }
end
There are two standard ways of addressing problems of this kind. One, which I've taken, is to use the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are present in both hashes being merged. The other way is to use Enumerable#group_by, which I expect someone will soon employ in another answer. I do not believe either approach is preferable in terms of efficiency or readability.
arr.each_with_object({}) do |g,h|
  h.update(g[:tariff_code]=>g) do |_,o,n|
    { price: o[:price]+n[:price], unit: o[:unit], amount: o[:amount]+n[:amount] }
  end
end.values
#=> [{:price=>4.0, :unit=>"meter", :amount=>400},
# {:price=>28.0, :unit=>"yards", :amount=>1000}]
Note that the receiver of values is seen to be:
{"4901.99"=>{:price=>4.0, :unit=>"meter", :amount=>400},
{"6006.24"=>{:price=>28.0, :unit=>"yards", :amount=>1000}}
A simple approach, but it's easy to add new keys for summing and to change the group key. Not sure about efficiency, but a benchmark of 500,000 runs of arr.map looks good:
#<Benchmark::Tms:0x00007fad0911b418 #label="", #real=1.480799000000843, #cstime=0.0, #cutime=0.0, #stime=0.0017340000000000133, #utime=1.4783359999999999, #total=1.48007>
summ_keys = %i[price amount]
grouping_key = :tariff_code

result = Hash.new { |h, k| h[k] = {} }

arr.map do |h|
  cumulative = result[h[grouping_key]]
  h.each do |k, v|
    case k
    when *summ_keys
      cumulative[k] = (cumulative[k] || 0) + h[k]
    else
      cumulative[k] = v
    end
  end
end

p result.values
# [{:price=>4.0, :unit=>"meter", :tariff_code=>"4901.99", :amount=>400},
# {:price=>28.0, :unit=>"yards", :tariff_code=>"6006.24", :amount=>1000}]

Is it possible to select certain elements without iterating through the array?

I have an array of objects, each of which has the property :cow either set to false or true:
animals = [
  { id: 1, cow: true },
  { id: 2, cow: true },
  { id: 3, cow: true },
  { id: 4, cow: false },
  { id: 5, cow: false }
]
I need to select all members of the array that pass a condition without iterating through every element of the array.
Is it possible?
I tried:
notCows = animals.reject { |a| !a[:cow] }
notCows = animals[0, 1, 2]
which doesn't work.
No, this is impossible. In order to find all elements that satisfy a certain condition, you need to look at all elements to see whether they satisfy that condition. It is simply logically not possible to find all elements of a collection without iterating through all elements of the collection.
You were almost there; use Enumerable#select (which scans all the members of the collection, by the way):
animals.select { |animal| animal[:cow] }
#=> [{:id=>1, :cow=>true}, {:id=>2, :cow=>true}, {:id=>3, :cow=>true}]
Or the opposite:
animals.select { |animal| !animal[:cow] }
#=> [{:id=>4, :cow=>false}, {:id=>5, :cow=>false}]
The returned results are still Ruby objects: Arrays of Hashes.
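(Equivalently, animals.reject { |animal| animal[:cow] } returns the non-cows without the negation.)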
As an alternative you can group by status (Enumerable#group_by):
animals.group_by { |a| a[:cow] }
#=> {true=>[{:id=>1, :cow=>true}, {:id=>2, :cow=>true}, {:id=>3, :cow=>true}], false=>[{:id=>4, :cow=>false}, {:id=>5, :cow=>false}]}

Merging dynamically generated attributes into a new entry and summing their values

I'm looking for some advice on how to properly merge some key/value pairs into a separate database entry and summing their values.
I have a Task which has a Vendor_Upload which has many Vendor_Shipping_Logs which has many Vendor_Shipping_Log_Products. I'm not sure if the deep nesting makes a difference, but the important values to look at here are the Item_ID and Quantity.
This is currently how the parameters are spit out:
Parameters: {
  "task"=>{
    "task_type"=>"Vendor Upload",
    "vendor_upload_attributes"=>{
      "upload_type"=>"Warranty Orders",
      "vendor_shipping_logs_attributes"=>{
        "1490674883303"=>{
          "guest_name"=>"Martin Crane",
          "order_number"=>"33101",
          "vendor_shipping_log_products_attributes"=>{
            "1490675774108"=>{
              "item_id"=>"211",
              "quantity"=>"3"
            },
            "1490675775147"=>{
              "item_id"=>"213",
              "quantity"=>"6"
            }
          }
        },
        "1490674884454"=>{
          "guest_name"=>"Frasier Crane",
          "order_number"=>"33102",
          "vendor_shipping_log_products_attributes"=>{
            "1490675808026"=>{
              "item_id"=>"214",
              "quantity"=>"10"
            },
            "1490675808744"=>{
              "item_id"=>"213",
              "quantity"=>"1"
            }
          }
        },
        "1490674885293"=>{
          "guest_name"=>"Niles Crane",
          "order_number"=>"33103",
          "vendor_shipping_log_products_attributes"=>{
            "1490675837184"=>{
              "item_id"=>"211",
              "quantity"=>"3"
            }
          }
        },
        "1490674886373"=>{
          "guest_name"=>"Daphne Moon",
          "order_number"=>"33104",
          "vendor_shipping_log_products_attributes"=>{
            "1490675852950"=>{
              "item_id"=>"213",
              "quantity"=>"8"
            },
            "1490675853845"=>{
              "item_id"=>"214",
              "quantity"=>"11"
            }
          }
        }
      }
    }
  }
}
Upon submission I want to merge the Vendor_Shipping_Log_Products by unique item_id and sum their quantities into a new Stockmovement_Batch as nested Stockmovements, to keep my inventories up to date.
See example parameters here of what I would like the output to look like:
Parameters: {
  "stockmovement_batch"=>{
    "stockmovement_type"=>"Ecomm Order",
    "stockmovements_attributes"=>{
      "1490676054881"=>{
        "item_id"=>"211",
        "adjust_quantity"=>"-6"
      },
      "1490676055897"=>{
        "item_id"=>"213",
        "adjust_quantity"=>"-15"
      },
      "1490676057616"=>{
        "item_id"=>"214",
        "adjust_quantity"=>"-21"
      }
    }
  }
}
Is this something I can do all in one simple go, or do I have to stick with doing each process in a separate form?
First you need to separate out the values you want to iterate through:
data = params.require("task")
             .require("vendor_upload_attributes")
             .require("vendor_shipping_logs_attributes")
Then pull the vendor_shipping_log_products_attributes and flatten it to an array of hashes:
logs = data.values.map do |h|
  h["vendor_shipping_log_products_attributes"].values
end.flatten
# => [{"item_id"=>"211", "quantity"=>"3"}, {"item_id"=>"213", "quantity"=>"6"}, {"item_id"=>"214", "quantity"=>"10"}, {"item_id"=>"213", "quantity"=>"1"}, {"item_id"=>"211", "quantity"=>"3"}, {"item_id"=>"213", "quantity"=>"8"}, {"item_id"=>"214", "quantity"=>"11"}]
Then we merge the data by creating an intermediary hash keyed by item_id:
stockmovements = logs.each_with_object({}) do |hash, memo|
  id = hash["item_id"]
  memo[id] ||= []
  memo[id].push(hash["quantity"].to_i)
end
# => {"211"=>[3, 3], "213"=>[6, 1, 8], "214"=>[10, 11]}
We can then map the result and sum the values:
stockmovements.map do |(k,v)|
  {
    item_id: k,
    adjust_quantity: 0 - v.sum
  }
end
# => [{:item_id=>"211", :adjust_quantity=>-6}, {:item_id=>"213", :adjust_quantity=>-15}, {:item_id=>"214", :adjust_quantity=>-21}]
