Iterating over Sets of ActiveRecord Query Batch - ruby-on-rails

I have a table that has a few thousand sets of 2-3 nearly identical records that all share a unique "id" (not the database ID, but an item id). That is, two to three records share the same item id; there are about 2,100 records, or ~700 unique items. Example:
{id: 1, product_id:333, is_special:true}, {id:2, product_id:333, is_special:false}, {id:3, product_id:333, is_special:false}, {id:4, product_id:334, is_special:false}...
I'd like to perform a query that lets me iterate over each set, modify/remove duplicate records, then move on to the next set.
This is what I currently have:
task find_averages: :environment do
  responses = Response.all
  chunked_responses = responses.chunk_while { |a, b| a.result_id == b.result_id }.to_a
  chunked_responses.each do |chunk|
    if chunk.length < 3
      chunk.each do |chunky_response|
        chunky_response.flagged = true
        chunky_response.save
      end
    else
      chunk.each do |chunky_response|
        # manipulate each item in the chunk here
      end
    end
  end
end
Edit: I worked this one out after discovering the chunk_while method. I am not positive chunk_while is the most efficient approach here, but it works well. I am closing this, but for anyone else who needs to group records and then iterate over them, this should help.

The following code iterates over an array of items, some of which share common values, and groups them by those values:
responses = Response.all
chunked_responses = responses.chunk_while { |a, b| a.result_id == b.result_id }.to_a
chunked_responses.each do |chunk|
  chunk.each do |chunky_response|
    # manipulate each item in the chunk here
  end
end
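One caveat worth noting: chunk_while only groups adjacent elements, so it relies on the records coming back already sorted by result_id. A small sketch (not from the original answer) that orders explicitly, plus a group_by alternative that does not depend on ordering at all:

# chunk_while only groups *adjacent* rows, so order explicitly first
chunked_responses = Response.order(:result_id)
                            .chunk_while { |a, b| a.result_id == b.result_id }
                            .to_a

# order-independent alternative: a hash of result_id => [responses]
Response.all.group_by(&:result_id).each do |result_id, chunk|
  # manipulate each chunk here
end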

Related

Read large XLS and put into Array is very slow

I have a large XLS file with postal codes. The problem is that it is quite slow to read the data. For example, the file has multiple sheets named after each state; each sheet has multiple rows with postal code, neighborhood and municipality. The file has 33 states, and each state has between 1,000 and 9,000 rows.
I try to parse this into an array of hashes, which takes 22 seconds. Is there any way to read this faster?
This is how I read the sheet
def read_sheet(sheet_name:, offset: 1)
  sheet = file.worksheet sheet_name[0..30]
  clean_data = sheet.each_with_index(offset)
                    .lazy
                    .reject { |k, _| !k.any? }
  data = clean_data.map do |row, _index|
    DATA_MAPPING.map do |field, column|
      { field => row[column] }
    end.inject(:merge)
  end
  yield data
end
And I retrieve all with
def read_file
  result = {}
  sheets_titles.each_with_index do |name|
    read_sheet(sheet_name: name) do |data|
      result.merge! name => data.to_a
    end
  end
  result
end
So if I use .to_a or .to_json or any method to process the data and insert it into the DB, I have to wait a few seconds... any suggestions?
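The lazy chain above only does its work once .to_a (or each) finally runs, so most of the time is spent materializing every row into a hash before anything touches the database. One rough sketch (not from the original thread) is to insert in batches while each sheet streams; it assumes Rails 6+ for insert_all and a hypothetical PostalCode table whose columns match the DATA_MAPPING fields:

# a sketch, reusing read_sheet above; PostalCode and its columns are assumptions
def import_sheet(sheet_name:)
  read_sheet(sheet_name: sheet_name) do |data|
    data.each_slice(1000) do |batch|
      # one multi-row INSERT per 1000 rows instead of one object per row
      PostalCode.insert_all(batch)
    end
  end
end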

Iterate through parameters array and ActiveRecord::Relation

I have a controller which needs to implement bulk update. (However, it needs to set a specific value on each object rather than the same value for all objects.)
Here is the array which the controller will get
[
  {
    "task_id": 1,
    "some_property": "value1"
  },
  {
    "task_id": 2,
    "some_property": "value2"
  }
]
I need to find all tasks and for each task update the property to a provided value.
The obvious solution is
task_ids = params[:_json].map { |task| task[:task_id] }
tasks = Task.where(id: task_ids)
tasks.each do |task|
  params[:_json].each do |task_from_params|
    if task.id == task_from_params[:task_id]
      task.some_property = task_from_params[:some_property]
      task.save!
    end
  end
end
The thing which I don't like (big time) is the code where we do N^2 comparisons.
I am pretty sure there should be a better way to do this in Rails. I am looking for something more concise which doesn't require N^2 comparisons.
Option 1: ActiveRecord::Relation#update_all
If you don't care about validations and callbacks then you can simply use:
params.each do |param|
  Task.where(id: param.fetch('task_id')).
    update_all(some_property: param.fetch('some_property'))
end
This will iterate over N items in params and issue N UPDATEs and no SELECTs.
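If many of the entries share the same value, the N UPDATEs can be collapsed further by grouping first (same caveat about skipped validations and callbacks, and same assumption that params is the parsed array):

# one UPDATE per *distinct* value instead of one per entry
params.group_by { |param| param.fetch('some_property') }.each do |value, group|
  Task.where(id: group.map { |param| param.fetch('task_id') })
      .update_all(some_property: value)
end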
Option 2: Convert to a hash mapping ID to property
If you do care about validations or callbacks then you can convert your input array to a hash first:
# Convert params to a hash. Iterate over N items.
id_to_property = params.map do |param|
  [param.fetch('task_id'), param.fetch('some_property')]
end.to_h

# Load corresponding tasks. Iterate over N keys.
Task.where(id: id_to_property.keys).find_each do |task|
  # Look up the property value - O(1).
  task.update!(some_property: id_to_property[task.id])
end
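Since Option 2 issues one UPDATE per task, you may also want the batch to be atomic. A sketch wrapping the same loop in a transaction so that a single failing update! rolls back the whole batch:

ActiveRecord::Base.transaction do
  Task.where(id: id_to_property.keys).find_each do |task|
    task.update!(some_property: id_to_property[task.id])
  end
end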

Rails: trying to store particular column value in an array

I am building an application in Rails, and I have an items_controller which contains the standard methods for create, show, edit, destroy etc.
However, I am trying to create my own method to access all the values in a specific column of my database, and I am having great difficulty capturing this data in an array.
I have tried the following ways of capturing the data (where 'quantity' is the database column I am looking for):
@items = Item.find(params[:id])
@items2 = @item.find(params[:quantity])
I have also tried:
@items = Item.find(params[:quantity])
& even:
@items = Item.all
@items2 = @item.find(params[:quantity])
However, none of these methods appear to be working. For what I am doing it is not even essential to know which quantity column values relate to which row... just getting a list of the column values would suffice.
If anyone knows what is wrong here, the help would be very greatly appreciated!
Thanks.
UPDATE:
For clarity: I am trying to retrieve all the data for a particular column in my database associated with the items_controller and filter it for a particular piece of data (in this case the string "7", as the data is returned from the db as a string when using the Item.all method).
I then want a counter to increase each time the "7" is encountered in the quantity column.
def chartItems
  @items = Item.find(params[:id])
  @items2 = @items.find(params[:quantity])
  @filter = custom_filter_for(@items2)
end

def custom_filter_for(value)
  j = 0 # counter initialised at 0
  value.each do |x|
    if x == "7" # checking for data equal to "7" - the value is retrieved as a string
      j = j + 1 # increase j counter by 1 whenever "7" is encountered as a quantity
    end
  end
  return j
end
Your find parameter is handled as an id in this case:
@items = Item.find(params[:quantity])
This returns the item whose id equals your quantity parameter, which is clearly not what you want.
You can select Items based on quantity:
@items = Item.find_by_quantity(params[:quantity])
But if you need only the quantities in an array, this is what you are looking for:
@quantities = Item.select(:quantity).map(&:quantity)
Your updated question:
result = Item.find_by_quantity(params[:quantity]).count
In newer versions of ActiveRecord, they've added pluck, which does essentially what @Matzi's select-and-map approach does.
To get all item quantities, you could do
@quantities = Item.pluck(:quantity)
Also, I would double check your use of the find_by helpers. I think that find_by_quantity will only give you a single match back (@item, not @items). To get all, I think you really want to use where:
@quantities = Item.where(:quantity => params[:quantity])
If you were to use the pluck I mentioned above, I think your filtering step could also be written pretty concisely. That filter is simply counting the number of 7's in the list, right?
@quantities = Item.pluck(:quantity)
@filtered_values = @quantities.select { |q| q == 7 }.length
I hope this helps out.
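As an aside (not part of the original answers): if all you ultimately need is the count of rows whose quantity is "7", you can let the database do the counting instead of loading every value into Ruby. This assumes quantity really is stored as a string, as mentioned in the question:

seven_count = Item.where(quantity: "7").count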

Ruby on Rails - Most Efficient Way to Compare Two Large Arrays for non-matching results

I have some legacy code in my app that compares all table rows for a model with the result of an API call from a supplier.
This worked fine until last week, when both the number of table rows and the number of results returned by the supplier increased massively.
The table cardinality has gone from 1,657 to 59,699 and the number of results returned by the API is ~150,000.
The code looks through the API results to check whether each table row is present; if it is not found, that row is orphaned in the database, since it exists there but not in what the supplier has given us.
Looking through 150,000 results to check whether something isn't there doesn't sound particularly clever to me, and that seems to be the case: I don't even know how long this takes to run, as the view is still loading after about half an hour :/
Controller
@telco_numbers = TelcoNumber.orphaned_in_db
Model
def self.orphaned_in_db
  db_numbers = self.find(:all)
  listed_numbers = self.all_telco
  orphaned_numbers = []
  db_numbers.each do |db|
    scan = listed_numbers.select { |l| l.number == db.number }
    orphaned_numbers.push(db) if scan.empty?
  end
  return orphaned_numbers
end

def self.some_telco(per_page, page = 1)
  page = 1 if page.nil?
  # this is the first api call which returns a link which is then used for the next api call
  api_call = TelcoApiv3.new("post", "/numbers/#{TelcoApiv3.account_id}/allocated/all")
  listed_numbers = TelcoApiv3.poll(api_call.response["link"])
  return listed_numbers.collect do |ln|
    ln.store("countrycode", ln["country_code"])
    TelcoNumber.new ln
  end
end

def self.all_telco(page = 1)
  listed_numbers = some_telco(@@max_nlist_results, page)
  if listed_numbers.length == @@max_nlist_results
    return listed_numbers.concat(all_telco(page + 1))
  else
    return listed_numbers
  end
end
Example API result format:
[{"country_code":"44","number":"1133508889"},....
The number relates to the number column in the table for the model. (It is stored as a varchar and not as a number).
Also, the API results are returned in ascending number order, so they are already sorted; I would have thought that would have made things faster than they are?
Why not try Array difference? First make two arrays of db_numbers and listed_numbers, then subtract listed_numbers from db_numbers like this:
def self.orphaned_in_db
  db_numbers = self.find(:all).map { |x| x.number }
  listed_numbers = self.all_telco.map { |x| x.number }
  orphaned_numbers = db_numbers - listed_numbers
  # look the records up again by number (not by id)
  orphaned_results = self.where(number: orphaned_numbers)
  return orphaned_results
end
Subtracting listed_numbers from db_numbers gives you the non-matching set, and you can then find the records in your database based on those orphaned numbers. It will be much faster. Thanks
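For what it's worth, Array#- already avoids the N*M scan (Ruby hashes the subtracted array internally). If you want the membership check to be explicit, or need custom matching later, a Set does the same job; a sketch reusing the model's all_telco and number attribute:

require 'set'

def self.orphaned_in_db
  listed = Set.new(self.all_telco.map(&:number))
  # one pass over the DB rows; Set#include? is a constant-time hash lookup
  self.all.reject { |record| listed.include?(record.number) }
end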

Get columns names with ActiveRecord

Is there a way to get the actual column names with ActiveRecord?
When I call find_by_sql or select_all with a join, if there are columns with the same name, the first one gets overridden:
select locations.*, s3_images.* from locations left join s3_images on s3_images.imageable_id = locations.id and s3_images.imageable_type = 'Location' limit 1
In the example above, I get the following:
#<Location id: 22, name: ...>
Where id is that of the last s3_image. select_rows is the only thing that worked as expected:
Model.connection.select_rows("SELECT id,name FROM users") => [["1","amy"],["2","bob"],["3","cam"]]
I need to get the field names for the rows above.
This post gets close to what I want but looks outdated (fetch_fields doesn't seem to exist anymore): How do you get the rows and the columns in the result of a query with ActiveRecord?
The ActiveRecord join method creates multiple objects. I'm trying to achieve the same result "includes" would return but with a left join.
I am attempting to return a whole lot of results (and sometimes whole tables), which is why includes does not suit my needs.
Active Record provides a #column_names method that returns an array of column names.
Usage example: User.column_names
two options
Model.column_names
or
Model.columns.map(&:name)
Example
Model named Rabbit with columns name, age, on_facebook
Rabbit.column_names
Rabbit.columns.map(&:name)
returns
["id", "name", "age", "on_facebook", "created_at", "updated_at"]
This is just the way Active Record's inspect method works: it only lists the columns from the model's table. The attributes are still there, though:
record.blah
will return the blah attribute, even if it is from another table. You can also use
record.attributes
to get a hash with all the attributes.
However, if you have multiple columns with the same name (e.g. both tables have an id column) then Active Record just mashes things together, ignoring the table name. You'll have to alias the column names to make them unique.
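For example (a sketch based on the query in the question; the alias name is made up here): alias the colliding id column and read the result with select_all, and the returned ActiveRecord::Result exposes the actual column names.

sql = "SELECT locations.*, s3_images.id AS s3_image_id " \
      "FROM locations LEFT JOIN s3_images " \
      "ON s3_images.imageable_id = locations.id " \
      "AND s3_images.imageable_type = 'Location'"
result = ActiveRecord::Base.connection.select_all(sql)
result.columns # the real column names, including "s3_image_id"
result.to_a    # rows as hashes keyed by those names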
Okay I have been wanting to do something that's more efficient for a while.
Please note that for very few results, includes works just fine. The code below works better when you have a lot of columns you'd like to join.
In order to make it easier to understand the code, I worked out an easy version first and expanded on it.
First method:
# takes a main array of ActiveRecord::Base objects
# converts it into a hash with the key being that object's id method call
# loops through the second array (arr)
# and calls lamb (a lambda { |itm, hash| ... }) for each item in it. Gets called with
# each itm in the second array and the main hash
# i.e.: You have Users who have multiple Pets
# You can call merge(User.all, Pet.all, lambda { |pet, hash| hash[pet.owner_id].pets << pet })
def merge(mainarray, arr, lamb)
  hash = {}
  mainarray.each do |i|
    hash[i.id] = i.dup
  end
  arr.each do |i|
    lamb.call(i, hash)
  end
  return hash.values
end
I then noticed that we can have "through" tables (nxm relationships)
merge_through! addresses this issue:
# this works for tables that have the equivalent of
# :through =>
# an example would be a location with keywords
# through locations_keywords
#
# the middletable should return as id an array of the left and right ids
# the left table is the main table
# the lambda fn should store in the lefthash the value from the righthash
#
# if an array is passed instead of a lefthash or a righthash, they'll be conveniently converted
def merge_through!(lefthash, righthash, middletable, lamb)
  if (lefthash.class == Array)
    lhash = {}
    lefthash.each do |i|
      lhash[i.id] = i.dup
    end
    lefthash = lhash
  end
  if (righthash.class == Array)
    rhash = {}
    righthash.each do |i|
      rhash[i.id] = i.dup
    end
    righthash = rhash
  end
  middletable.each do |i|
    lamb.call(lefthash, righthash, i.id[0], i.id[1])
  end
  return lefthash
end
This is how I call it:
lambmerge = lambda do |lhash, rhash, lid, rid|
  lhash[lid].keywords << rhash[rid]
end
Location.merge_through!(Location.all, Keyword.all, LocationsKeyword.all, lambmerge)
Now for the complete method (which makes use of merge_through)
# merges multiple arrays (or hashes) with the main array (or hash)
# each arr in the arrs is a hash, each must have
# a :value and a :proc
# the procs will be called on values and main hash
#
# :middletable will merge through the middle table if provided
# :value will contain the right table when :middletable is provided
#
def merge_multi!(mainarray, arrs)
  hash = {}
  if (mainarray.class == Hash)
    hash = mainarray
  elsif (mainarray.class == Array)
    mainarray.each do |i|
      hash[i.id] = i.dup
    end
  end
  arrs.each do |h|
    arr = h[:value]
    proc = h[:proc]
    if (h[:middletable])
      middletable = h[:middletable]
      merge_through!(hash, arr, middletable, proc)
    else
      arr.each do |i|
        proc.call(i, hash)
      end
    end
  end
  return hash.values
end
Here's how I use my code:
def merge_multi_test()
  merge_multi!(Location.all,
    [
      # each location has many s3_images (one to many)
      { :value => S3Image.all,
        :proc => lambda do |img, hash|
          if (img.imageable_type == 'Location')
            hash[img.imageable_id].s3_images << img
          end
        end
      },
      # each location has many LocationsKeywords. Keywords is the right table and LocationsKeyword is the middletable.
      # (many to many)
      { :value => Keyword.all,
        :middletable => LocationsKeyword.all,
        :proc => lambda do |lhash, rhash, lid, rid|
          lhash[lid].keywords << rhash[rid]
        end
      }
    ])
end
You can modify the code if you wish to lazy load attributes that are one-to-many (such as a City is to a Location). Basically, the code above won't work for that case because you would have to loop through the main hash and set the city from the second hash (there is no "city_id, location_id" table). You could reverse the City and Location to get all the locations into the city hash and then extract them back. I don't need that code yet, so I skipped it =)
