Find and update a collection in JSONB

I have a Rails 5.0 app with a JSONB column called data, which contains an array of hashes:
[
{'event': 'web_session', 'user_id': 1, 'count': 13},
{'event': 'web_session', 'user_id': 2, 'count': 10},
{'event': 'web_session', 'user_id': 3, 'count': 42}
]
How would I update one of the hashes, e.g. matching 'user_id': 2, with a different count value?
Is this the most efficient way? (I'd potentially have ~1 million hashes.)
h = data.find {|h| h['user_id'] == 2}
h['count'] = 43
save
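A single `find` is a linear scan, which is hard to beat in plain Ruby for a one-off lookup. If several updates are applied to the same loaded array, building a lookup hash once avoids rescanning. The following is only a sketch in plain Ruby, assuming `user_id` is unique within the array; the helper name `index_by_user_id` is made up for illustration, and persisting the change is still a separate `save` on the record.

```ruby
# Hypothetical helper: maps each user_id to the hash that holds it.
# The values are the same objects as in the original array, so
# mutating them through the index also changes the array's contents.
def index_by_user_id(events)
  events.each_with_object({}) { |event, index| index[event['user_id']] = event }
end

data = [
  { 'event' => 'web_session', 'user_id' => 1, 'count' => 13 },
  { 'event' => 'web_session', 'user_id' => 2, 'count' => 10 },
  { 'event' => 'web_session', 'user_id' => 3, 'count' => 42 }
]

by_user = index_by_user_id(data)
by_user[2]['count'] = 43
puts data[1]['count'] # prints 43
```

Building the index costs one pass over the array, so it only pays off when more than one lookup follows.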

Related

How to filter a number of records and get only the outer most records from a postgres ltree structure?

I have database records arranged in an ltree structure (Postgres ltree extension).
I want to filter these items down to the outer most ancestors of the current selection.
Test cases:
[11, 111, 1111, 2, 22, 222, 2221, 2222] => [11, 2];
[1, 11, 111, 1111, 1112, 2, 22, 222, 2221, 2222, 3, 4, 5] => [1, 2, 3, 4, 5];
[1111, 1112, 2221, 2222] => [1111, 1112, 2221, 2222];
1
|_1.1
| |_1.1.1
|   |_1.1.1.1
|   |_1.1.1.2
2
|_2.2
| |_2.2.2
|   |_2.2.2.1
|   |_2.2.2.2
3
|
4
|
5
I have implemented this in Ruby like so.
def fetch_outer_most_items(identifiers)
  ordered_items = Item.where(id: identifiers).order("path DESC")
  items_array = ordered_items.to_a
  outer_most_item_ids = []
  while(items_array.size > 0) do
    item = items_array.pop
    outer_most_item_ids.push(item.id)
    # <@ is the ltree "is a descendant of (or equal to)" operator
    duplicate_ids = ordered_items.where("items.path <@ '#{item.path}'").pluck(:id)
    if duplicate_ids.any?
      items_array = items_array.select { |i| !duplicate_ids.include?(i.id) }
    end
  end
  return ordered_items.where(id: outer_most_item_ids)
end
I have eliminated descendants as duplicates iteratively. I'm pretty sure there is an SQL way of doing this, which would be the preferred solution, as this one triggers N+1 queries. Ideally I would add this function as a named scope on the Item model.
Any pointers please?
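This is not the SQL solution the question asks for, but the selection rule itself can be sketched in plain Ruby without any per-item query: keep a path only if none of its ancestors is also in the selection. The sketch assumes paths are dot-separated label strings such as "1.1.1" (as in the tree above); the method name `outermost_paths` is made up for illustration.

```ruby
# Keeps only those paths that have no ancestor in the same list.
# Shallow paths are visited first, so every possible ancestor of a
# path has already been considered by the time the path is checked.
def outermost_paths(paths)
  kept = []
  paths.uniq.sort_by { |path| path.split('.').length }.each do |path|
    has_ancestor = kept.any? { |ancestor| path.start_with?("#{ancestor}.") }
    kept << path unless has_ancestor
  end
  kept.sort
end

selection = %w[1.1 1.1.1 1.1.1.1 2 2.2 2.2.2 2.2.2.1 2.2.2.2]
p outermost_paths(selection) # prints ["1.1", "2"]
```

The trailing dot in the prefix check matters: without it, "1.10" would wrongly count as a descendant of "1.1".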

Standardizing two Hashes to include same keys

I am looking to graph multiple series in a highcharts graph. I have the following two variables
first = {5 => [dates in here], 6 => [dates in here], etc}
second = {4 => [dates in here], 5 => [dates in here], etc}
The keys are month numbers (4 for April, 5 for May, etc.).
The problem I am running into is that the two hashes may not always contain the same months. So when I graph the data, first[5] is plotted next to second[4], first[6] next to second[5], and so on.
How can I standardize the two variables so that they always contain the same keys even if it would look something like this:
first = {4 => [no data], 5 => [dates in here], 6 => [dates in here], etc}
second = {4 => [dates in here], 5 => [dates in here], 6 => [no data], etc}

first_keys = first.keys
second_keys = second.keys
keys = first_keys + second_keys
keys.uniq.each do |key|
  first[key] = [] if first[key].nil?
  second[key] = [] if second[key].nil?
end

There are many ways to do this, as it's rather open-ended, but you could build a common keys array from the two hashes and iterate over it, assigning nil in each hash for any key it is missing. I would do it specifically like this:
keys = (first.keys + second.keys).uniq
keys.each do |key|
  first[key] ||= nil
  second[key] ||= nil
end
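Both snippets above modify the hashes in place and leave the keys in insertion order, so a newly added month 4 ends up after 5 and 6. If the chart needs the months in order, a non-mutating variant can return new hashes with the combined keys sorted. This is only a sketch: the method name `with_common_keys`, its `default:` parameter, and the sample date strings are assumptions made for illustration.

```ruby
# Returns new hashes that all share the same sorted set of keys.
# Missing keys get a fresh copy of the default value (an empty array here).
def with_common_keys(*hashes, default: [])
  keys = hashes.flat_map(&:keys).uniq.sort
  hashes.map do |hash|
    keys.each_with_object({}) do |key, filled|
      filled[key] = hash.fetch(key) { default.dup }
    end
  end
end

first  = { 5 => ['2017-05-01'], 6 => ['2017-06-03'] }
second = { 4 => ['2017-04-09'], 5 => ['2017-05-02'] }

first_filled, second_filled = with_common_keys(first, second)
p first_filled.keys  # prints [4, 5, 6]
p second_filled.keys # prints [4, 5, 6]
```

Because it accepts any number of hashes, the same call works when more than two series are plotted.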

As you are graphing the data, presumably with months on the x-axis and days on the y-axis, I think it would be most convenient to iterate over a range of months that provides a minimal cover of the keys (months) contained in both hashes.
data = [{ 5=>[3, 5, 11, 24], 6=>[1, 7, 13, 30], 2=>[4, 13, 18, 29] },
{ 4=>[6, 9, 19, 26], 5=>[4, 8, 11, 22], 1=>[1, 19, 22, 24] }]
month_range = Range.new(*data.reduce([]) { |arr, h| arr | h.keys }.minmax)
#=> 1..6
Note
a = data.reduce([]) { |arr, h| arr | h.keys }
#=> [5, 6, 2, 4, 1]
b = a.minmax
#=> [1, 6]
Range.new(*b)
#=> Range.new(*[1, 6]) => Range.new(1, 6) => 1..6
I've represented the input as an arbitrary array of hashes, should there be more than two.
For graphing, simply iterate over month_range, then for each month m iterate over the elements h of data (hashes) for which h has a key m, and plot the days in h[m], if any (not "no data").
month_range.each do |m|
data.each do |h|
<plot h[m]> if h.has_key?(m) && h[m].any?
end
end

Sum and Average Array of Array

I am trying to sum an array of arrays and get the average at the same time. The original data is in the form of JSON. I have to convert my data to an array of arrays in order to render the graph, because the graph does not accept an array of hashes.
I first convert the output using the call below.
ActiveSupport::JSON.decode(@output.first(10).to_json)
And the result of the above action is shown below.
output =
[{"name"=>"aaa", "job"=>"a", "pay"=> 2, ... },
{"name"=>"zzz", "job"=>"a", "pay"=> 4, ... },
{"name"=>"xxx", "job"=>"a", "pay"=> 6, ... },
{"name"=>"yyy", "job"=>"a", "pay"=> 8, ... },
{"name"=>"aaa", "job"=>"b", "pay"=> 2, ... },
{"name"=>"zzz", "job"=>"b", "pay"=> 4, ... },
{"name"=>"xxx", "job"=>"b", "pay"=> 6, ... },
{"name"=>"yyy", "job"=>"b", "pay"=> 10, ... },
]
Then I retrieved the job and pay by converting to an array of arrays.
a = []
ActiveSupport::JSON.decode(output.to_json).each { |h|
  a << [h['job'], h['pay']]
}
The result of the above operation is as below.
a = [["a", 2], ["a", 4], ["a", 6], ["a", 8],
["b", 2], ["b", 4], ["b", 6], ["b", 10]]
The code below gives me the sum for each job as an array of arrays.
a.inject({}) { |h,(job, data)| h[job] ||= 0; h[job] += data; h }.to_a
And the result is as below
[["a", 20], ["b", 22]]
However, I am trying to get the average of the array. The expected output is as below.
[["a", 5], ["b", 5.5]]
I could count the elements for each job and divide each sum by its count, but I was wondering if there is an easier and more efficient way to get the average.

output = [
{"name"=>"aaa", "job"=>"a", "pay"=> 2 },
{"name"=>"zzz", "job"=>"a", "pay"=> 4 },
{"name"=>"xxx", "job"=>"a", "pay"=> 6 },
{"name"=>"yyy", "job"=>"a", "pay"=> 8 },
{"name"=>"aaa", "job"=>"b", "pay"=> 2 },
{"name"=>"zzz", "job"=>"b", "pay"=> 4 },
{"name"=>"xxx", "job"=>"b", "pay"=> 6 },
{"name"=>"yyy", "job"=>"b", "pay"=> 10 },
]
output.group_by { |obj| obj['job'] }.map do |key, list|
  [key, list.map { |obj| obj['pay'] }.reduce(:+) / list.size.to_f]
end
The group_by method will transform your list into a hash with the following structure:
{"a"=>[{"name"=>"aaa", "job"=>"a", "pay"=>2}, ...], "b"=>[{"name"=>"aaa", "job"=>"b", ...}, ...]}
After that, for each pair of that hash, we want to calculate the mean of its 'pay' values and return a pair [key, mean]. We use map for that, returning a pair with:
1. The key itself ("a" or "b").
2. The mean of the values. Note that the values list is a list of hashes. To retrieve the numbers, we need to extract the 'pay' value of each hash; that's what list.map { |obj| obj['pay'] } is used for. Finally, calculate the mean by summing all elements with .reduce(:+) and dividing the sum by the list size as a float.
Not the most efficient solution, but it's practical.
Comparing this answer with @EricDuminil's, here's a benchmark with a list of 8,000,000 elements:
def Wikiti(output)
  output.group_by { |obj| obj['job'] }.map do |key, list|
    [key, list.map { |obj| obj['pay'] }.reduce(:+) / list.size.to_f]
  end
end
def EricDuminil(output)
  count_and_sum = output.each_with_object(Hash.new([0, 0])) do |hash, mem|
    job = hash['job']
    count, sum = mem[job]
    mem[job] = count + 1, sum + hash['pay']
  end
  result = count_and_sum.map do |job, (count, sum)|
    [job, sum / count.to_f]
  end
end
require 'benchmark'
Benchmark.bm do |x|
  x.report('Wikiti') { Wikiti(output) }
  x.report('EricDuminil') { EricDuminil(output) }
end
                  user     system      total        real
Wikiti        4.100000   0.020000   4.120000 (  4.130373)
EricDuminil   4.250000   0.000000   4.250000 (  4.272685)

This method should be reasonably efficient. It creates a temporary hash with job name as key and [count, sum] as value:
output = [{ 'name' => 'aaa', 'job' => 'a', 'pay' => 2 },
{ 'name' => 'zzz', 'job' => 'a', 'pay' => 4 },
{ 'name' => 'xxx', 'job' => 'a', 'pay' => 6 },
{ 'name' => 'yyy', 'job' => 'a', 'pay' => 8 },
{ 'name' => 'aaa', 'job' => 'b', 'pay' => 2 },
{ 'name' => 'zzz', 'job' => 'b', 'pay' => 4 },
{ 'name' => 'xxx', 'job' => 'b', 'pay' => 6 },
{ 'name' => 'yyy', 'job' => 'b', 'pay' => 10 }]
count_and_sum = output.each_with_object(Hash.new([0, 0])) do |hash, mem|
  job = hash['job']
  count, sum = mem[job]
  mem[job] = count + 1, sum + hash['pay']
end
#=> {"a"=>[4, 20], "b"=>[4, 22]}
result = count_and_sum.map do |job, (count, sum)|
  [job, sum / count.to_f]
end
#=> [["a", 5.0], ["b", 5.5]]
It requires 2 passes, but the created objects aren't big. In comparison, calling group_by on a huge array of hashes isn't very efficient.

How about this (Single pass iterative average calculation)
accumulator = Hash.new {|h,k| h[k] = Hash.new(0)}
a.each_with_object(accumulator) do |(k,v),obj|
  obj[k][:count] += 1
  obj[k][:sum] += v
  obj[k][:average] = (obj[k][:sum] / obj[k][:count].to_f)
end
#=> {"a"=>{:count=>4, :sum=>20, :average=>5.0},
# "b"=>{:count=>4, :sum=>22, :average=>5.5}}
Obviously the average is simply recalculated on every iteration, but since you asked for the sum and average at the same time, this is probably as close as you are going to get.
Using your "output" instead, it looks like this:
output.each_with_object(accumulator) do |h,obj|
  key = h['job']
  obj[key][:count] += 1
  obj[key][:sum] += h['pay']
  obj[key][:average] = (obj[key][:sum] / obj[key][:count].to_f)
end
#=> {"a"=>{:count=>4, :sum=>20, :average=>5.0},
# "b"=>{:count=>4, :sum=>22, :average=>5.5}}

As Sara Tibbetts' comment suggests, my first step would be to convert it like this:
new_a = a.reduce({}){ |memo, item| memo[item[0]] ||= []; memo[item[0]] << item[1]; memo}
which puts it in this format:
{"a"=>[2, 4, 6, 8], "b"=>[2, 4, 6, 10]}
You can then use slice! to keep only the keys you want:
new_a.slice!(key1, key2, ...)
Then do another pass to get the final format:
new_a.reduce([]) do |memo, (k,v)|
  avg = v.inject{ |sum, el| sum + el }.to_f / v.size
  memo << [k,avg]
  memo
end

I elected to use Enumerable#each_with_object with the object being an array of two hashes, the first to compute totals, the second to count the number of values that are totalled. Each hash is created with Hash.new(0), zero being the default value. See Hash::new for a fuller explanation. In short, if a hash defined by h = Hash.new(0) does not have a key k, h[k] returns 0 (h is not modified). h[k] += 1 expands to h[k] = h[k] + 1. If h does not have a key k, h[k] on the right of the equality returns 0. [1]
output =
[{"name"=>"aaa", "job"=>"a", "pay"=> 2},
{"name"=>"zzz", "job"=>"a", "pay"=> 4},
{"name"=>"xxx", "job"=>"a", "pay"=> 6},
{"name"=>"yyy", "job"=>"a", "pay"=> 8},
{"name"=>"aaa", "job"=>"b", "pay"=> 2},
{"name"=>"zzz", "job"=>"b", "pay"=> 4},
{"name"=>"xxx", "job"=>"b", "pay"=> 6},
{"name"=>"yyy", "job"=>"b", "pay"=>10}
]
htot, hnbr = output.each_with_object([Hash.new(0), Hash.new(0)]) do |f,(g,h)|
  s = f["job"]
  g[s] += f["pay"]
  h[s] += 1
end
htot.merge(hnbr) { |k,o,n| o.to_f/n }.to_a
#=> [["a", 5.0], ["b", 5.5]]
If .to_a at the end is dropped, the hash {"a"=>5.0, "b"=>5.5} is returned. The OP might find that more useful than the array.
I've used the form of Hash#merge that uses a block to determine the values of keys that are present in both hashes being merged.
Note that htot = {"a"=>20, "b"=>22} and hnbr = {"a"=>4, "b"=>4}.
[1] If the reader is wondering why h[k] on the left of = doesn't return zero as well, it's a different method: Hash#[]= versus Hash#[].
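The default-value behaviour described in this answer can be checked directly. The following small sketch (the method name `default_value_demo` is made up for illustration) shows that reading a missing key returns the default without storing it, while `+=` does store a value.

```ruby
# Demonstrates Hash.new(0): Hash#[] returns the default for a missing
# key without adding it; Hash#[]= (used by +=) is what stores a value.
def default_value_demo
  h = Hash.new(0)
  read_value = h['missing']        # 0, returned by Hash#[]
  stored_after_read = h.key?('missing') # false: the read did not add the key
  h['missing'] += 1                # expands to h['missing'] = h['missing'] + 1
  [read_value, stored_after_read, h]
end

read_value, stored_after_read, h = default_value_demo
p read_value        # prints 0
p stored_after_read # prints false
p h['missing']      # prints 1
```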

Rails 3. How to get the difference between two arrays?

Let's say I have this array of shipment ids.
s = Shipment.find(:all, :select => "id")
[#<Shipment id: 1>, #<Shipment id: 2>, #<Shipment id: 3>, #<Shipment id: 4>, #<Shipment id: 5>]
And this array of invoices with their shipment ids:
i = Invoice.find(:all, :select => "id, shipment_id")
[#<Invoice id: 98, shipment_id: 2>, #<Invoice id: 99, shipment_id: 3>]
Invoice belongs to Shipment.
Shipment has one Invoice.
So the invoices table has a shipment_id column.
To create an invoice, I click on New Invoice, then there is a select menu with Shipments, so I can choose which shipment I am creating the invoice for. So I only want to display a list of shipments that an invoice hasn't been created for.
So I need an array of Shipments that don't have an Invoice yet. In the example above, the answer would be 1, 4, 5.

a = [2, 4, 6, 8]
b = [1, 2, 3, 4]
a - b | b - a # => [6, 8, 1, 3]

First you would get a list of the shipment ids that appear in invoices:
ids = i.map{|x| x.shipment_id}
Then 'reject' them from your original array:
s.reject{|x| ids.include? x.id}
Note: remember that reject returns a new array; use reject! if you want to change the original array.
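With many invoices, `ids.include?` scans the whole array for every shipment. Putting the ids in a Set makes each membership test a hash lookup instead. The sketch below uses Struct stand-ins rather than the real ActiveRecord models, and the method name `without_invoice` is made up for illustration.

```ruby
require 'set'

# Returns the shipments whose id does not appear as any invoice's shipment_id.
def without_invoice(shipments, invoices)
  invoiced_ids = invoices.map(&:shipment_id).to_set
  shipments.reject { |shipment| invoiced_ids.include?(shipment.id) }
end

# Stand-ins for the Shipment and Invoice records from the question.
shipment_struct = Struct.new(:id)
invoice_struct  = Struct.new(:id, :shipment_id)

shipments = (1..5).map { |n| shipment_struct.new(n) }
invoices  = [invoice_struct.new(98, 2), invoice_struct.new(99, 3)]

p without_invoice(shipments, invoices).map(&:id) # prints [1, 4, 5]
```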

Use the subtraction operator (-):
irb(main):001:0> [1, 2, 3, 2, 6, 7] - [2, 1]
=> [3, 6, 7]

Ruby 2.6 introduced Array#difference:
[1, 1, 2, 2, 3, 3, 4, 5 ].difference([1, 2, 4]) #=> [ 3, 3, 5 ]
So in the case given here:
Shipment.pluck(:id).difference(Invoice.pluck(:shipment_id))
Seems a nice elegant solution to the problem. I've been a keen follower of a - b | b - a, though it can be tricky to recall at times.
This certainly takes care of that.

A pure Ruby solution is
(a + b) - (a & b)
([1,2,3,4] + [1,3]) - ([1,2,3,4] & [1,3])
=> [2,4]
where a + b concatenates the two arrays (duplicates are kept),
a & b returns their intersection,
and subtracting the intersection from the concatenation leaves the elements found in only one of the arrays.

The previous answer here from pgquardiario only included a one-directional difference. If you want the difference in both directions (that is, each array may contain items the other lacks), then try something like the following.
def diff(x, y)
  only_in_x = x.reject { |a| y.include?(a) }
  only_in_y = y.reject { |a| x.include?(a) }
  only_in_x | only_in_y
end
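The two-directional result computed by the answers above is usually called the symmetric difference. A compact way to write it, sketched here with a made-up method name, is to subtract the intersection from the union. Using `|` instead of `+` also removes duplicates from the result.

```ruby
# Elements that appear in exactly one of the two arrays, without duplicates.
def symmetric_difference(a, b)
  (a | b) - (a & b)
end

p symmetric_difference([2, 4, 6, 8], [1, 2, 3, 4]) # prints [6, 8, 1, 3]
p symmetric_difference([1, 1, 2], [2, 3])          # prints [1, 3]
```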

This should do it in one ActiveRecord query
Shipment.where(["id NOT IN (?)", Invoice.select(:shipment_id)]).select(:id)
And it outputs the SQL
SELECT "shipments"."id" FROM "shipments" WHERE (id NOT IN (SELECT "invoices"."shipment_id" FROM "invoices"))
In Rails 4+ you can do the following
Shipment.where.not(id: Invoice.select(:shipment_id).distinct).select(:id)
And it outputs the SQL
SELECT "shipments"."id" FROM "shipments" WHERE ("shipments"."id" NOT IN (SELECT DISTINCT "invoices"."shipment_id" FROM "invoices"))
And instead of select(:id) I recommend the ids method.
Shipment.where.not(id: Invoice.select(:shipment_id).distinct).ids

When dealing with arrays of Strings, it can be useful to keep the differences grouped together.
In which case, we can use Array#zip to group the elements together and then use a block to decide what to do with the grouped elements (Array).
a = ["One", "Two", "Three", "Four"]
b = ["One", "Not Two", "Three", "For" ]
mismatches = []
a.zip(b) do |array|
  mismatches << array if array.first != array.last
end
mismatches
# => [
# ["Two", "Not Two"],
# ["Four", "For"]
# ]

s.select{|x| !ids.include? x.id}

Ruby/Rails: get elements from array where indices are divisible by x

How could I implement this? I think my solution is very dirty, and I would like to do it better. I think there is an easy way to do this in Ruby, but I can't remember it. I want to use it with Rails, so if Rails provides something similar, that's OK too. Usage should be like this:
fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
# odd_fruits should contain the 1st, 3rd, 5th, ... elements (index % 2 == 0)
odd_fruits = array_mod(fruits, :mod => 2, :offset => 0)
# even_fruits should contain the 2nd, 4th, 6th, ... elements (index % 2 == 1)
even_fruits = array_mod(fruits, :mod => 2, :offset => 1)
puts odd_fruits
banana
kiwi
grapefruit
melon
puts even_fruits
strawberry
orange
lemon
******* EDIT *******
For those who want to know, this is what I finally did:
In a Rails project, I created a new file config/initializers/columnize.rb which looks like this:
class Array
  def columnize(args = { :columns => 1, :offset => 0 })
    column = []
    self.each_index do |i|
      column << self[i] if i % args[:columns] == args[:offset]
    end
    column
  end
end
Rails automatically loads these files right after the framework itself has loaded. I also used the Rails-style way of supplying arguments to a method, because I think it makes the code more readable, and I'm a good-readable-code fetishist :) I extended the core class "Array", and now I can do things like the following with every array in my project:
>> arr = [1,2,3,4,5,6,7,8]
=> [1, 2, 3, 4, 5, 6, 7, 8]
>> arr.columnize :columns => 2, :offset => 0
=> [1, 3, 5, 7]
>> arr.columnize :columns => 2, :offset => 1
=> [2, 4, 6, 8]
>> arr.columnize :columns => 3, :offset => 0
=> [1, 4, 7]
>> arr.columnize :columns => 3, :offset => 1
=> [2, 5, 8]
>> arr.columnize :columns => 3, :offset => 2
=> [3, 6]
I will now use it to display entries from the database in different columns in my views. What I like about it is that I don't have to call compact or similar methods, because Rails complains when you pass a nil object to a view. Now it just works. I also thought about letting JS do all that for me, but I like it better this way, working with the 960 Grid System (http://960.gs)

fruits = ["a","b","c","d"]
even = []
x = 2
fruits.each_index{|index|
  even << fruits[index] if index % x == 0
}
odds = fruits - even
p fruits
p even
p odds
["a", "b", "c", "d"]
["a", "c"]
["b", "d"]

def array_mod(arr, mod, offset = 0)
  out_arr = []
  # drop returns a new array; shift would remove elements from the caller's array
  arr.drop(offset).each_with_index do |val, idx|
    out_arr << val if idx % mod == 0
  end
  out_arr
end
Usage:
>> fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
>> odd_fruits = array_mod(fruits, 2)
=> ["banana", "kiwi", "grapefruit", "melon"]
>> even_fruits = array_mod(fruits, 2, 1)
=> ["strawberry", "orange", "lemon"]
>> even_odder_fruits = array_mod(fruits, 3, 2)
=> ["kiwi", "lemon"]

The simplest method I can think of is this:
fruits = ["a","b","c","d"]
evens = fruits.select {|x| fruits.index(x) % 2 == 0}
odds = fruits - evens
You don't need to mess with select_with_index if the array can look up indices for you. I suppose the drawback to this method is if you have multiple entries in 'fruits' with the same value (the index method returns the index of the first matching entry only).

What you want is:
even_fruits = fruits.select_with_index { |v,i| i % 2 == 0 }
odd_fruits = fruits - even_fruits
Unfortunately Enumerable#select_with_index does not exist as standard, but several people have extended Enumerable with such a method.
http://snippets.dzone.com/posts/show/3746
http://webget.com/gems/webget_ruby_ramp/doc/Enumerable.html#M000058
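In current Ruby versions a similar effect is available without extending Enumerable: calling `select` with no block returns an Enumerator, and `with_index` can be chained onto it. A minimal sketch, where the method name `every_nth` is made up for illustration:

```ruby
# Returns the elements whose zero-based index leaves the given
# remainder (offset) when divided by mod.
def every_nth(items, mod, offset = 0)
  items.select.with_index { |_item, index| index % mod == offset }
end

fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
p every_nth(fruits, 2, 0) # prints ["banana", "kiwi", "grapefruit", "melon"]
p every_nth(fruits, 2, 1) # prints ["strawberry", "orange", "lemon"]
```

Because it works on indices rather than values, it also handles arrays that contain the same value more than once.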

Solution using just core capabilities:
(0...((fruits.size + mod - 1 - offset) / mod)).map {|i| fruits[i*mod+offset]}

Rails provides an ActiveSupport extension to Array that provides an "in_groups_of" method. That's what I usually use for things like this. It allows you to do this:
to pull the even fruits (remember to compact to pull off nils at the end):
fruits = ['banana', 'strawberry', 'kiwi', 'orange', 'grapefruit', 'lemon', 'melon']
fruits.in_groups_of(2).collect{|g| g[1]}.compact
=> ["strawberry", "orange", "lemon"]
to pull the odd fruits:
fruits.in_groups_of(2).collect{|g| g[0]}.compact
=> ["banana", "kiwi", "grapefruit", "melon"]
to get every third fruit, you could use:
fruits.in_groups_of(3).collect{|g| g[0]}.compact
=> ["banana", "orange", "melon"]

functional way
#fruits = [...]
even = []
odd = []
fruits.inject(true ){|_is_even, _el| even << _el if _is_even; !_is_even}
fruits.inject(false){|_is_odd, _el| odd << _el if _is_odd; !_is_odd }

Here's a solution using #enum_for, which allows you to specify a method to use "in place" of #each:
require 'enumerator'
mod = 2
[1, 2, 3, 4].enum_for(:each_with_index).select do |item, index|
  index % mod == 0
end.map { |item, index| item }
# => [1, 3]
