Group records for data analysis in Rails - ruby-on-rails

I have two tables connected with habtm relation (through a table).
Table1
id : integer
name: string
Table2
id : integer
name: string
Table3
id : integer
table1_id: integer
table2_id: integer
I need to group Table1 records by similar records from Table2. Example:
userx = Table1.create()
user1.table2_ids = 3, 14, 15
user2.table2_ids = 3, 14, 15, 16
user3.table2_ids = 3, 14, 16
user4.table2_ids = 2, 5, 7
user5.table2_ids = 3, 5
Result of grouping that I want is something like
=> [ [ [1,2], [3, 14, 15] ], [ [2,3], [3,14, 16] ], [ [ 1, 2, 3, 5], [3] ] ]
Where the first array holds the user ids and the second holds the table2_ids.
Is there a possible SQL solution, or do I need to write some kind of algorithm?
Updated:
OK, I have code that works the way I described. Maybe it will help whoever answers to understand my idea.
def self.compare
  hash = {}
  Table1.find_each do |table_record|
    Table1.find_each do |another_table_record|
      if table_record != another_table_record
        results = table_record.table2_ids & another_table_record.table2_ids
        hash["#{table_record.id}_#{another_table_record.id}"] = results if !results.empty?
      end
    end
  end
  #hash = hash.delete_if{|k,v| v.empty?}
  hash.sort_by{|k,v| v.count}.to_h
end
But you can imagine how long it takes to produce any output. For my 500 Table1 records it takes around 1-2 minutes. With more records the time will grow quadratically, so I need a more elegant solution or an SQL query.

Table1.find_each do |table_record|
Table1.find_each do |another_table_record|
...
The code above has a performance problem: it queries the database on the order of N*N times, which could be reduced to a single query.
# Query table3, constructing the data useful to us
# { table1_id: [table2_ids], ... }
records = Table3.all.group_by { |t| t.table1_id }.map { |t1_id, t3_records|
  [t1_id, t3_records.map(&:table2_id)]
}.to_h
Then you could do exactly the same thing to records to get the final result hash.
UPDATE:
@AKovtunov You misunderstood me. My code is the first step. With records, which is a {t1_id: t2_ids} hash, you could do something like this:
hash = {}
records.each do |t1_id, t2_ids|
  records.each do |tt1_id, tt2_ids|
    if t1_id != tt1_id
      inter = t2_ids & tt2_ids
      hash["#{t1_id}_#{tt1_id}"] = inter if !inter.empty?
    end
  end
end
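If it helps, here is a minimal plain-Ruby sketch of that in-memory step. It assumes records already has the { table1_id => [table2_ids] } shape; the sample data is copied from the question, and the names groups and result are made up for illustration. combination(2) visits each unordered pair once instead of twice, and pairs that share the same intersection are merged into one group.

```ruby
# Sample data from the question, in the { table1_id => [table2_ids] } shape.
records = {
  1 => [3, 14, 15],
  2 => [3, 14, 15, 16],
  3 => [3, 14, 16],
  4 => [2, 5, 7],
  5 => [3, 5]
}

# Map each non-empty intersection to the table1 ids that share it.
groups = Hash.new { |hash, key| hash[key] = [] }

# combination(2) yields every unordered pair exactly once.
records.to_a.combination(2) do |(id_a, ids_a), (id_b, ids_b)|
  shared = ids_a & ids_b
  next if shared.empty?

  groups[shared] |= [id_a, id_b]
end

# [[table1_ids], [shared table2_ids]], largest intersections first.
result = groups.map { |shared, ids| [ids.sort, shared] }
               .sort_by { |_ids, shared| -shared.size }

result.each { |ids, shared| puts "#{ids.inspect} share #{shared.inspect}" }
```

Note that this keeps every pairwise intersection, so it also yields groups the question did not list, such as [[1, 3], [3, 14]].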

Related

Rails format result when grouping across multiple columns

I'm trying to add support for multiple groups in my vehicles API. Currently we only support grouping by a single column like this.
Vehicle.group(:fuel_type).count
Which gives me a result like this:
{
"Petrol": 78,
"Diesel": 22
}
When I add multiple groups like this:
Vehicle.group(:fuel_type, :registration_status).count
I get the following result, which isn't as pretty in an API response. Also it's missing the combination Petrol and Exported since the count is 0.
{
"['Diesel', 'Scrapped']": 5,
"['Petrol', 'Registered']": 6,
"['Petrol', 'Scrapped']": 30,
"['Diesel', 'Registered']": 1,
"['Diesel', 'Deregistered']": 11,
"['Petrol', 'Deregistered']": 42,
"['Diesel', 'Exported']": 5
}
I would like it to be formatted like this instead:
{
  "Diesel": {
    "Scrapped": 5,
    "Registered": 1,
    "Deregistered": 11,
    "Exported": 5
  },
  "Petrol": {
    "Scrapped": 30,
    "Registered": 6,
    "Deregistered": 42,
    "Exported": 0
  }
}
Ideally I would like to support n nested groups, where every combination is displayed in every layer. For example, even though there are no exported petrol cars, that combination should still be included in the response with a count of 0.
I actually posted my question to OpenAI's ChatGPT and got a working implementation. Here is the snippet in case anyone has a similar issue:
def handle_hash(query)
  return query unless query.is_a?(Hash)
  result = {}
  query.each do |key, value|
    if key.is_a?(String)
      result[key] = value
    else
      current = result
      key[0...-1].each do |element|
        current[element] ||= {}
        current = current[element]
      end
      current[key[-1]] = value
    end
  end
  result
end
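The snippet above nests the keys but does not add the missing combinations. A possible sketch for the zero-filling part follows, assuming the input is a hash keyed by arrays of equal length (the shape the multi-column group count returns). nest_with_zeros and fill_level are hypothetical helper names, and only values that occur at least once at a given level can be filled in.

```ruby
# Hypothetical input, shaped like Vehicle.group(:fuel_type, :registration_status).count
counts = {
  ["Diesel", "Scrapped"] => 5,
  ["Petrol", "Registered"] => 6,
  ["Petrol", "Scrapped"] => 30,
  ["Diesel", "Registered"] => 1,
  ["Diesel", "Deregistered"] => 11,
  ["Petrol", "Deregistered"] => 42,
  ["Diesel", "Exported"] => 5
}

# Build the nested hash one level at a time; leaves default to 0.
def fill_level(counts, levels, prefix)
  levels[prefix.size].map do |value|
    key = prefix + [value]
    leaf = key.size == levels.size
    [value, leaf ? counts.fetch(key, 0) : fill_level(counts, levels, key)]
  end.to_h
end

def nest_with_zeros(counts)
  return {} if counts.empty?

  depth = counts.keys.first.size
  # Distinct values seen at each grouping level, in first-seen order.
  levels = (0...depth).map { |i| counts.keys.map { |key| key[i] }.uniq }
  fill_level(counts, levels, [])
end

nested = nest_with_zeros(counts)
puts nested.inspect
```

Because the levels are collected up front, the same approach works for three or more grouped columns.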

SQL using group_by to count users

users = [
{"id":1, "zipcode":"10031"},
{"id":4, "zipcode":"10000"},
{"id":2, "zipcode":"10031"},
{"id":3, "zipcode":"10031"}
]
Hello guys. I need help shortening my code.
I have stored user data like the above.
I want to get this result directly from my first line of code:
[
{"id":1, "zipcode":"10000", "users_count": 1},
{"id":2, "zipcode":"10031", "users_count": 3},
]
My code:
user = User.select(:zipcode).group(:zipcode).count(:id)
The result of the above code is {"10000"=>1, "10031"=>3},
so I need to separate the keys and values:
keys = user.keys
values = user.values
Then I loop over them:
i = 0
num = keys.length.to_i
zipcodes = []
while i < num do
zipcode = keys[i]
users_count = values[i]
zipcodes[i]= zipcode , users_count
i +=1
end
The result of the above code is [["10000", 1], ["10031", 3]].
I want to change the result of this code
user = User.select(:zipcode).group(:zipcode).count(:id)
from this
{"10000"=>1, "10031"=>3}
to this
[{"id":1, "zipcode":"10000", "users_count": 1},{"id":2, "zipcode":"10031", "users_count": 3}]
Thank you for any help.
You can modify your query as such,
user_counts = User.select(:zipcode, 'count(*) as user_count').group(:zipcode)
Please note that in the result that is printed on the console, the user_count will not be displayed, as it is not an attribute of the user model. However it is there, and can be accessed via,
user_counts.each do |user_count|
puts user_count.zipcode
puts user_count['user_count']
end
You can also view all the results by converting them to JSON
user_counts = User.select(:zipcode, 'count(*) as user_count').group(:zipcode)
puts user_counts.as_json
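If you only need plain hashes, the grouped count from the question can also be reshaped directly in Ruby, without the while loop. A small sketch, assuming counts is the hash the question shows and that id is just a running number (the question does not say where it comes from):

```ruby
# Shape returned by the grouped count in the question.
counts = { "10000" => 1, "10031" => 3 }

# The manual key/value loop is equivalent to Hash#to_a.
pairs = counts.to_a

# One hash per zipcode; :id is only a 1-based position here (an assumption).
result = counts.each_with_index.map do |(zipcode, users_count), index|
  { id: index + 1, zipcode: zipcode, users_count: users_count }
end

puts pairs.inspect
puts result.inspect
```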

Rails multiple AR operations (min, max, avg.. ) on same query object

I want to perform multiple calculations using one query to price_histories table, and finally render some statistics using those prices like average, minimum and maximum etc.
price_histories_controller.rb
price_stats = PriceHistory.where('created_at >= ? AND cast(item_id as integer) = ?', 1.day.ago, params['item_id'])
avg_array = price_stats.group(:day).average(:price).to_a
min_array = price_stats.group(:day).min(:price).to_a
max_array = price_stats.group(:day).max(:price).to_a
count_array = price_stats.group(:day).count(:price).to_a
This is the relevant code that causes the error. I'd like to perform some calculations on a set of grouped data, but after the first calculation is done I keep getting
TypeError (no implicit conversion of Symbol into Integer)
Ideally I would end up with an object like this one to be rendered:
@all_stats = {
  average: avg_array,
  min: min_array,
  max: max_array,
  count: count_array
}
render json: @all_stats
This sums up my intentions pretty well. I'm new to Ruby and would like a solution or a better approach, which I'm sure exists.
The following code works fine, and I'd like someone to point me in the right direction to find out why it works but fails when an extra calculation is added:
price_stats = PriceHistory.where('created_at >= ? AND cast(item_id as integer) = ?', 1.day.ago, params['item_id'])
avg_array = price_stats.group(:day).average(:price).to_a
and leads to:
{
"average": [
[
null,
"11666.666666666667"
],
[
"24/4/2019",
"11666.666666666667"
],
[
"24",
"11666.6666666666666667"
],
[
"2051",
"11666.6666666666666667"
]
],
"min": [],
"max": [],
"count": []
}
Other approach:
PriceHistory.select(
"AVG(price) AS average_score,
MIN(price) AS average_min,
MAX(price) AS average_max,
COUNT(*) AS price_count"
).where(
'created_at >= ? AND cast(item_id as integer) = ?',
1.day.ago, params['item_id']
).group(:day)
Error:
ArgumentError (Call `select' with at least one field):
I think this should work:
PriceHistory.where(
'created_at >= ? AND cast(item_id as integer) = ?',
1.day.ago,
params['item_id']
).group(:day).select(
"SUM(price) AS sum_price",
"MAX(price) AS max_price",
"MIN(price) AS min_price",
"AVG(price) AS avg_price",
"day"
)
This will return a collection of records, each of which has the methods day, sum_price, max_price, min_price, and avg_price.
Note that the names of the SQL functions might differ depending on your database.
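As for the original error, a likely cause (an assumption, since no stack trace is shown) is that min and max are not ActiveRecord calculations; those are named minimum and maximum. On a relation, min(:price) ends up in Ruby's Enumerable#min, which expects an Integer argument. The plain-Ruby sketch below reproduces that TypeError with a stand-in Price struct (hypothetical, not the real model) and then computes the per-day figures in memory:

```ruby
# Stand-in for loaded PriceHistory rows; Price is a hypothetical struct.
Price = Struct.new(:day, :price)
rows = [
  Price.new("24/4/2019", 100),
  Price.new("24/4/2019", 300),
  Price.new("25/4/2019", 200)
]

# min(n) wants an Integer n, so passing a Symbol raises TypeError.
error = begin
  rows.min(:price)
  nil
rescue TypeError => e
  e
end
puts error.class

# Per-day statistics computed in plain Ruby for comparison.
stats = rows.group_by(&:day).transform_values do |group|
  prices = group.map(&:price)
  {
    min: prices.min,
    max: prices.max,
    average: prices.sum.to_f / prices.size,
    count: prices.size
  }
end
puts stats.inspect
```

With real records it is usually better to let the database aggregate, as in the select above, rather than loading every row.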

ActiveRecord query from an array of hash

I have an array of hashes,
e.g. array_of_hash = [ { id: 20, name: 'John' }, { id: 30, name: 'Doe'} ]
I would like to get records which match all the criteria in a particular hash.
So the query I want to get executed is
SELECT persons.* FROM persons WHERE persons.id = 20 AND persons.name = 'John' OR persons.id = 30 AND persons.name = 'Doe'
What is the best way to construct this query from an array of hash?
I think this is okay:
ids = array_of_hash.map { |h| h[:id] }
names = array_of_hash.map { |h| h[:name] }
Person.where(id: ids, name: names)
(although it's not super generic, and it also matches mixed pairs such as id 20 with name 'Doe')
other attempt:
people = Person.all
array_of_hash.each do |h|
people = people.where(h)
end
people # => every where is joined with AND, so this builds one long query that only matches records satisfying all the hashes at once
Try to map all conditions to your ActiveRecord model independently and flatten the result array afterwards:
array_of_hash.map{ |where_clause| Person.where(where_clause) }.flatten
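To get the single OR query from the question instead of one query per hash, one option is to build the condition string and its bind values first. A sketch in plain Ruby, assuming every key is a trusted column name (it is interpolated into the SQL) and using the made-up names clauses, sql and binds:

```ruby
array_of_hash = [{ id: 20, name: "John" }, { id: 30, name: "Doe" }]

# One parenthesised AND group per hash, with ? placeholders for the values.
clauses = array_of_hash.map do |conditions|
  "(" + conditions.keys.map { |column| "#{column} = ?" }.join(" AND ") + ")"
end

sql = clauses.join(" OR ")
binds = array_of_hash.flat_map(&:values)

puts sql
puts binds.inspect

# Assumed usage inside a Rails app (not run here):
#   Person.where(sql, *binds)
```

On Rails 5 or later, chaining the relations with or, for example array_of_hash.map { |h| Person.where(h) }.reduce(:or), should produce an equivalent query without hand-written SQL.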

Rails - Query table with array of IDs and order based on the array

Let me describe with simple example.
I have a list of numbers:
ex: list = [ 3, 1, 2 ]
and I have a table in the DB named Products which has 3 rows with product_id = 1, 2 and 3.
Now I need to query Products sorted by the list values (3,1,2) so the result will be:
product_3
product_1
product_2
Query will be like:
product.sort(list) or product.order(list) or some other alternative pls
This should work for you:
list = [3, 1, 2]
Product.where(product_id: list).sort_by { |p| list.find_index(p.product_id) }
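For longer lists, find_index rescans the list for every record. A small variation is to precompute each id's position once. The sketch below uses a stand-in Product struct (hypothetical, in place of the real model) so it runs outside Rails:

```ruby
# Stand-in for the rows that Product.where(product_id: list) would return.
Product = Struct.new(:product_id, :name)
products = [
  Product.new(1, "product_1"),
  Product.new(2, "product_2"),
  Product.new(3, "product_3")
]

list = [3, 1, 2]

# Map each id to its position in the list.
position = list.each_with_index.to_h

sorted = products.sort_by { |product| position.fetch(product.product_id) }
puts sorted.map(&:name).inspect
```

Either way the sorting happens in Ruby after the records are loaded, so the result is an Array rather than a relation.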
