I have the following code snippet that works perfectly and as intended:
# Prepares the object design categories and connects them via bit mapping with the objects.design_category_flag
def prepare_bit_flag_positions
  # Updates the bit_flag_position and the corresponding data in the object table within one transaction
  ActiveRecord::Base.transaction do
    # Sets the bit flag for object design category
    ObjectDesignCategory.where('0 = (@rownum:=0)').update_all('bit_flag_position = 1 << (@rownum := 1 + @rownum)')
    # Resets the object design category flag
    Object.update_all(design_category_flag: 0)
    # Sets the new object design category bit flag
    object_group_relation = Object.joins(:object_design_categories).select('BIT_OR(bit_flag_position) AS flag, objects.id AS object_id').group(:id)
    join_str = "JOIN (#{object_group_relation.to_sql}) sub ON sub.object_id = objects.id"
    Object.joins(join_str).update_all('design_category_flag = sub.flag')
  end
end
But in my opinion it is quite difficult to read. So I tried to rewrite this code without raw SQL. What I created was this:
def prepare_bit_flag_positions
  # Updates the bit_flag_position and the corresponding data in the object table within one transaction
  ActiveRecord::Base.transaction do
    # Sets the bit flag for the object color group
    ObjectColorGroup.find_each.with_index do |group, index|
      group.update(bit_flag_position: 1 << index)
    end
    # Resets the object color group flag
    Object.update_all(color_group_flag: 0)
    # Sets the new object color group bit flag
    Object.find_each do |object|
      object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
    end
  end
end
This also works fine, but when I run a benchmark for about 2000+ records, the second option is about a factor of 65 slower than the first. So my question is:
Does anyone have an idea how to redesign this code so that it doesn't require raw SQL and is still fast?
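One thing worth checking when comparing the two versions: the raw SQL combines flags with BIT_OR, while the rewrite uses sum. The two agree only as long as every group contributes its bit exactly once per object. The SQL version also starts at 1 << 1, whereas the rewrite starts at 1 << 0. A minimal plain-Ruby sketch of the difference, with made-up category names standing in for real records:

```ruby
# Hypothetical stand-in data: the category names are made up for illustration.
categories = %w[modern classic rustic]

# Assign one distinct bit per category, as `1 << index` does in the rewrite.
bit_for = categories.each_with_index.map { |name, index| [name, 1 << index] }.to_h

# Combine bits with bitwise OR, which is what BIT_OR does in the SQL version.
def combined_flag(bits)
  bits.reduce(0) { |flag, bit| flag | bit }
end

flags = [bit_for["modern"], bit_for["rustic"]]
or_flag = combined_flag(flags)
sum_flag = flags.sum

# If a join ever yields the same category twice, OR and sum diverge.
with_duplicate = flags + [bit_for["modern"]]
or_with_duplicate = combined_flag(with_duplicate)
sum_with_duplicate = with_duplicate.sum
```

If duplicates can never occur in the join table, sum is fine; otherwise a bitwise OR is the safer way to combine the flags.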
I can see 3 sources of slowing:
1. N+1 problem
2. Instantiating objects
3. Calls to DB
This code has the N+1 Problem. I think this may be the major cause of the slowing.
Object.find_each do |object|
  object.update(color_group_flag: object.object_color_groups.sum(:bit_flag_position))
end
Change to
Object.includes(:object_color_groups).find_each do |object|
  ...
end
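You can also use the Object.update class method on this code (see below).
I don't think you can get around #2 without using raw SQL. But you will need many objects (10K or 100K or more) to see a big difference.
To update many records with one call, you can use the Object.update class method, which takes an array of ids and a matching array of attribute hashes. Note that ActiveRecord still loads and saves each record individually, so this mainly tidies the code rather than reducing the number of queries.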
ObjectColorGroup.find_each.with_index do |group, index|
  group.update(bit_flag_position: 1 << index)
end
to
color_groups = ObjectColorGroup.find_each.with_index.map do |group, index|
  [group.id, { bit_flag_position: 1 << index }]
end.to_h
ObjectColorGroup.update(color_groups.keys, color_groups.values)
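To see the shape of the two arguments that the update class method expects, here is a small plain-Ruby sketch. ColorGroup is a hypothetical Struct standing in for the ObjectColorGroup model, and the ids are made up:

```ruby
# ColorGroup is a hypothetical stand-in for the ObjectColorGroup model.
ColorGroup = Struct.new(:id, :bit_flag_position)
groups = [ColorGroup.new(7, nil), ColorGroup.new(9, nil), ColorGroup.new(12, nil)]

# Build { id => attributes } with one distinct bit per group.
color_groups = groups.each_with_index.map do |group, index|
  [group.id, { bit_flag_position: 1 << index }]
end.to_h

# These two arrays line up element by element: ids[i] gets attributes[i].
ids = color_groups.keys
attributes = color_groups.values
```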
The following is a single query, so no need to change.
Object.update_all(color_group_flag: 0)
Reference:
ActiveRecord#update class method API
ActiveRecord#update class method blog post
Rails Eager Loading
Here's the situation:
I have an Event model and I want to add prev / next buttons to a view to get the next event, but sorted by the event start datetime, not the ID/created_at.
The events aren't created in the order that they start, so I can't compare IDs or get the next highest ID or anything like that. E.g. Event ID 2 starts before Event ID 3. So Event.next(3) should return Event ID 2.
At first I was passing the start datetime as a param and getting the next one, but this failed when there were 2 events with the same start. The param start datetime doesn't include microseconds, so what would happen is something like this:
order("start > ?",current_start).first
would keep returning the same event over and over because current_start wouldn't include microseconds, so the current event would technically be > than current_start by 0.000000124 seconds or something like that.
The way I got it to work for everything was with a concern like this:
module PrevNext
  extend ActiveSupport::Concern
  module ClassMethods
    def next(id)
      find_by(id: chron_ids[current_index(id)+1])
    end
    def prev(id)
      find_by(id: chron_ids[current_index(id)-1])
    end
    def chron_ids
      @chron_ids ||= order("#{order_by_attr} ASC").ids
    end
    def current_index(id)
      chron_ids.find_index(id)
    end
    def order_by_attr
      @order_by_attr ||= 'created_at'
    end
  end
end
Model:
class Event < ActiveRecord::Base
  ...
  include PrevNext
  def self.order_by_attr
    @order_by_attr ||= "start_datetime"
  end
  ...
end
I know pulling all the IDs into an array is bad and dumb* but I don't know how to
Get a list of the records in the order I want
Jump to a specific record in that list (current event)
and then get the next record
...all in one ActiveRecord query. (Using Rails 4 w/ PostgreSQL)
*This table will likely never have more than 10k records, so it's not catastrophically bad and dumb.
The best I could manage was to pull out only the IDs in order and then memoize them.
Ideally, I'd like to do this by just passing the Event ID, rather than a start date param, since it's passed via GET param, so the less URL encoding and decoding the better.
There has to be a better way to do this. I posted it on Reddit as well, but the only suggested response didn't actually work.
Reddit Link
Any help or insight is appreciated. Thanks!
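One detail about the array-index approach in the concern: a negative index in Ruby wraps around to the end of the array, so prev on the first event would return the last one, and an unknown ID makes find_index return nil. A minimal plain-Ruby sketch of a bounds-checked lookup, where neighbor_id is a hypothetical helper name and the IDs are made up:

```ruby
# Hypothetical ids, already sorted chronologically.
chron_ids = [14, 3, 27, 8]

# Returns the id `step` positions away from `id`, or nil at either end
# of the list or when `id` is not in the list at all.
def neighbor_id(chron_ids, id, step)
  index = chron_ids.index(id)
  return nil if index.nil?
  target = index + step
  return nil if target.negative? || target >= chron_ids.size
  chron_ids[target]
end

next_of_first = neighbor_id(chron_ids, 14, 1)
prev_of_first = neighbor_id(chron_ids, 14, -1)
next_of_last = neighbor_id(chron_ids, 8, 1)
unknown = neighbor_id(chron_ids, 99, 1)
```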
You can get the next n records by using the SQL OFFSET keyword:
china = Country.order(population: :desc).first
india = Country.order(population: :desc).offset(1).take
# SELECT * FROM countries ORDER BY population DESC LIMIT 1 OFFSET 1
This is how pagination, for example, is often done:
@countries = Country.order(:population).limit(50)
@countries = @countries.offset( params[:page].to_i * 50 ) if params[:page]
Another way to do this would be to use query cursors. However, ActiveRecord does not support them, and building a generally reusable solution would be quite a task and may not be very useful in the end.
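For the tie problem described in the question (two events with the same start), one common option is to order by the start time and then by ID, and to compare on that pair. In PostgreSQL that could be expressed with a row comparison such as (start_datetime, id) > (?, ?), but the idea can be tried in plain Ruby first. This is only a sketch: Event here is a hypothetical Struct, not the real model, and the times are made up:

```ruby
# Event is a hypothetical stand-in for the model; only id and start matter here.
Event = Struct.new(:id, :start)

events = [
  Event.new(1, Time.utc(2015, 1, 1, 9)),
  Event.new(2, Time.utc(2015, 1, 1, 12)),
  Event.new(3, Time.utc(2015, 1, 1, 9)), # same start as event 1
  Event.new(4, Time.utc(2015, 1, 1, 8))
]

# Sort key: start time first, id as the tie-breaker.
def sort_key(event)
  [event.start, event.id]
end

# Next event: the smallest key strictly greater than the current one.
def next_event(events, current)
  events.select { |e| (sort_key(e) <=> sort_key(current)) > 0 }.min_by { |e| sort_key(e) }
end

# Previous event: the largest key strictly smaller than the current one.
def prev_event(events, current)
  events.select { |e| (sort_key(e) <=> sort_key(current)) < 0 }.max_by { |e| sort_key(e) }
end

after_one = next_event(events, events[0])
after_three = next_event(events, events[2])
before_three = prev_event(events, events[2])
```

Because the ID breaks ties, two events with identical start times always have a fixed order, so next never returns the current event again.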
I have some code that is chugging through a set of Rails Active Record models, and setting an attribute based on a related value from a 2D Array.
I am essentially setting a US State abbreviation code in a table of US States which was previously only storing the full names. A library of state names is being used to derive the abbreviations, and it contains a 2D Array with each sub-array having a full name and an abbreviation (i.e., [['New York', 'NY'], ['Pennsylvania', 'PA'], etc.]). I compare the state name from each record in the database to each full text name in this Array, then grab the corresponding sibling Array cell when there is a match.
This code works fine, and produces the correct results, but it's frumpy-looking and not easily understood without reading many lines:
# For the following code, StatesWithNames is an Active Record model, which is
# having a new column :code added to its table.
# States::USA represents a 2D Array as: [['StateName', 'NY']], and is used to
# populate the codes for StatesWithNames.
# A comparison is made between StatesWithNames.name and the text name found in
# States::USA, and if there is a match, the abbreviation from States::USA is
# used
if StatesWithNames.any?
  StatesWithNames.all.each do |named_state|
    if named_state.code.blank?
      States::USA.each do |s|
        if s[0] == named_state.name
          named_state.update_column(:code, s[1])
          break
        end
      end
    end
  end
end
What is the most Ruby-style way of expressing assignments with logic like this? I experimented with a few different procs / blocks, but arrived at even kludgier expressions, or incorrect results. Is there a simpler way to express this in fewer lines and/or if-end conditionals?
Yes, there are a few ifs and checks that are not needed.
Since this is Rails (even though the question's tags don't say so), you might want to use find_each, which is one of the most efficient ways to iterate over an AR collection:
StatesWithNames.find_each do |named_state|
  next unless named_state.code.blank?
  States::USA.each do |s|
    named_state.update_column(:code, s[1]) if s[0] == named_state.name
  end
end
Also be aware that update_column bypasses validations; if you wish to keep your objects valid, stick to update!.
One last thing: wrap it all in a transaction, so that if anything goes wrong along the way, all changes are rolled back.
StatesWithNames.transaction do
  StatesWithNames.find_each do |named_state|
    next unless named_state.code.blank?
    States::USA.each do |s|
      named_state.update!(code: s[1]) if s[0] == named_state.name
    end
  end
end
You might use a different data structure for this.
With your existing 2D array, you can call to_h on it to get a Hash that maps each full name to its abbreviation:
a = [['California', 'CA'], ['Oregon', 'OR']].to_h
=> { 'California' => 'CA', 'Oregon' => 'OR' }
Then in your code you can do
state_hash = States::USA.to_h
if StatesWithNames.any?
  StatesWithNames.all.each do |named_state|
    if named_state.code.blank?
      abbreviation = state_hash[named_state.name]
      if !abbreviation.nil?
        named_state.update_column(:code, abbreviation)
      end
    end
  end
end
The first thing you want to do is convert the lookup from an array of arrays to a hash.
state_hash = States::USA.to_h
if StatesWithNames.any?
  StatesWithNames.all.select{|state| state.code.blank?}.each do |named_state|
    named_state.update_column(:code, state_hash[named_state.name]) if state_hash[named_state.name]
  end
end
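The hash lookup used in the answers above can be tried in isolation, without a database. In this sketch NamedState is a hypothetical Struct standing in for the StatesWithNames records, and the usa array only mimics the shape of States::USA described in the question:

```ruby
# NamedState is a hypothetical stand-in for the StatesWithNames records.
NamedState = Struct.new(:name, :code)

# Same shape as the States::USA array described in the question.
usa = [["New York", "NY"], ["Pennsylvania", "PA"]]
state_hash = usa.to_h

records = [
  NamedState.new("New York", nil),
  NamedState.new("Pennsylvania", "PA"), # already has a code
  NamedState.new("Atlantis", nil)       # no match in the lookup
]

records.each do |record|
  # Skip records that already have a code (plain-Ruby version of blank?).
  next unless record.code.nil? || record.code.empty?
  code = state_hash[record.name]
  record.code = code if code
end

codes = records.map(&:code)
```

The lookup replaces the inner loop entirely: each record costs one hash access instead of a scan over every state.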
OK, so I have an app that allows users to pull App Store data, specifically top free, top paid, etc. The available attributes are quite limited, but users can filter by category and country, so this obviously leads to a lot of repeated queries. Normally this wouldn't be a problem, but I also use this data with a Google API that has a credits system. What I want to do is save these results in my database if the results are unique. I have this all set up, and my only hang-up is how to determine whether a query has been made before. My solution is a hash table that stores all queries that have been made before; if the lookup returns NULL (nil), I call the API to fetch the data and then create a new record.
The issue is that the App Store refreshes every day or so (I'm not sure of the exact schedule, but will look it up later). I would like this hash table to refresh or reset itself to all NULL at that interval.
What would be the most efficient or simple way to trigger this refresh? Additionally, I am kind of new to Rails, so where should I place this function? In the helper modules? The controller?
Thanks!
Edit:
OK, so here is my hash table helper module:
module MapsHelper
  # One bucket (an array of query strings) per hash slot
  QUERY_HISTORY = {}
  31.times do |i|
    QUERY_HISTORY.merge!(i => [])
  end

  # Returns true if queryString was already recorded; otherwise records it and returns false
  def queryTableLookup(asciiNum, queryString)
    arrayOfQueries = QUERY_HISTORY[asciiNum % 31]
    return true if arrayOfQueries.include?(queryString)
    arrayOfQueries.push(queryString)
    false
  end

  def queryHash(query)
    asciSum = 0
    query.each_char do |i|
      asciSum += i.sum
    end
    queryTableLookup(asciSum, query)
  end
end
Additionally, I am kind of new to Rails: can I interact with these functions using JavaScript, since I create the query string on the client side?
In my opinion, your best bet would be to use the Rails cache system. It provides a method of caching data, with an optional expires_in time.
From the docs:
http://guides.rubyonrails.org/caching_with_rails.html#low-level-caching
class MyModel < ActiveRecord::Base
  def self.get_api_data(key)
    Rails.cache.fetch("my_model/api_data:#{key}", expires_in: 12.hours) do
      SomeService::API.get_data(key)
    end
  end
end
In your hash (which I think could live in a class variable) you can store both the query results and the last fetch datetime:
Suppose the Foo class has a hash named cache as a class variable, and that the query variable is the current query that you want to check.
if Foo.cache[query].nil? || (DateTime.now - Foo.cache[query][:last_fetch]).to_i > 0
  results = your_method_to_fetch_data_for(query)
  Foo.cache[query] = {:results => results, :last_fetch => DateTime.now}
else
  results = Foo.cache[query][:results]
end
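The same idea can be wrapped in a small class so the expiry check lives in one place. This is a minimal sketch, not a replacement for Rails.cache: QueryCache is a hypothetical name, the one-day lifetime is an assumption about the App Store refresh schedule, and the clock is passed in only so that expiry can be demonstrated without waiting:

```ruby
# QueryCache is a hypothetical name; the clock is injectable so expiry can be tested.
class QueryCache
  def initialize(ttl_seconds, clock: -> { Time.now })
    @ttl_seconds = ttl_seconds
    @clock = clock
    @entries = {}
  end

  # Returns the cached value for key, or runs the block and stores its result
  # when the key is missing or at least ttl_seconds old.
  def fetch(key)
    entry = @entries[key]
    now = @clock.call
    if entry.nil? || now - entry[:fetched_at] >= @ttl_seconds
      entry = { value: yield, fetched_at: now }
      @entries[key] = entry
    end
    entry[:value]
  end
end

one_day = 24 * 60 * 60
now = Time.at(0)
cache = QueryCache.new(one_day, clock: -> { now })

calls = 0
first = cache.fetch("top_free/us") { calls += 1; "fresh data" }
second = cache.fetch("top_free/us") { calls += 1; "newer data" }
now += one_day # a day later the entry has expired
third = cache.fetch("top_free/us") { calls += 1; "newer data" }
```

Keep in mind that an in-memory hash like this is per process and is lost on restart, which is one reason the Rails cache store may be the better fit.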
Given this model:
class User < ActiveRecord::Base
  has_many :things
end
Then we can do this:
@user = User.find(123)
@user.things.find_each{ |t| print t.name }
@user.thing_ids.each{ |id| print id }
There are a large number of @user.things and I want to iterate through only their ids in batches, like with find_each. Is there a handy way to do this?
The goal is to:
not load the entire thing_ids array into memory at once
still only load arrays of thing_ids, and not instantiate a Thing for each id
Rails 5 introduced in_batches method, which yields a relation and uses pluck(primary_key) internally. And we can make use of the where_values_hash method of the relation in order to retrieve already-plucked ids:
@user.things.in_batches { |batch_rel| p batch_rel.where_values_hash['id'] }
Note that in_batches has order and limit restrictions similar to find_each.
This approach is a bit hacky since it depends on the internal implementation of in_batches and will fail if in_batches stops plucking ids in the future. A non-hacky method would be batch_rel.pluck(:id), but this runs the same pluck query twice.
You can try something like the code below: each_slice takes 4 elements at a time, and then you can loop over those 4. (Note that thing_ids still loads the full array of ids first.)
@user.thing_ids.each_slice(4) do |batch|
  batch.each do |id|
    puts id
  end
end
There is, unfortunately, no one-liner or helper that will allow you to do this, so instead:
limit = 1000
offset = 0
loop do
  # order(:id) keeps the pages stable between queries
  batch = @user.things.order(:id).limit(limit).offset(offset).pluck(:id)
  batch.each { |id| puts id }
  break if batch.count < limit
  offset += limit
end
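A variation on the loop above is to remember the last id seen instead of growing the offset, which is how find_each itself walks a table. The sketch below simulates the query with an array so it can run anywhere: ALL_IDS and fetch_ids_after are hypothetical stand-ins, and the sketch assumes positive integer ids. The comment shows roughly what the real query might look like:

```ruby
# ALL_IDS is a hypothetical stand-in for the ids stored in the things table.
ALL_IDS = [3, 8, 15, 16, 23, 42, 57].freeze

# Stand-in for the query; with ActiveRecord this would be roughly
#   things.where("id > ?", last_id).order(:id).limit(batch_size).pluck(:id)
def fetch_ids_after(last_id, batch_size)
  ALL_IDS.select { |id| id > last_id }.sort.first(batch_size)
end

# Yields arrays of ids, batch_size at a time, tracking the last id seen
# rather than an ever-growing OFFSET. Assumes ids are positive integers.
def each_id_batch(batch_size)
  last_id = 0
  loop do
    batch = fetch_ids_after(last_id, batch_size)
    break if batch.empty?
    yield batch
    break if batch.size < batch_size
    last_id = batch.last
  end
end

batches = []
each_id_batch(3) { |batch| batches << batch }
```

Only one batch of ids is held in memory at a time, and no Thing objects are instantiated.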
UPDATE Final EDIT:
I have updated my answer after reviewing your updated question (not sure why you would downvote after I backed up my answer with source code to prove it...but I don't hold grudges :)
Here is my solution, tested and working, so you can accept this as the answer if it pleases you.
Below, I have extended ActiveRecord::Relation, overriding the find_in_batches method to accept one additional option, :relation. When set to true, it will return the activerecord relation to your block, so you can then use your desired method 'pluck' to get only the ids of the target query.
#put this file in your lib directory:
#active_record_extension.rb
module ARAExtension
  extend ActiveSupport::Concern
  def find_in_batches(options = {})
    options.assert_valid_keys(:start, :batch_size, :relation)
    relation = self
    start = options[:start]
    batch_size = options[:batch_size] || 1000
    unless block_given?
      return to_enum(:find_in_batches, options) do
        total = start ? where(table[primary_key].gteq(start)).size : size
        (total - 1).div(batch_size) + 1
      end
    end
    if logger && (arel.orders.present? || arel.taken.present?)
      logger.warn("Scoped order and limit are ignored, it's forced to be batch order and batch size")
    end
    relation = relation.reorder(batch_order).limit(batch_size)
    records = start ? relation.where(table[primary_key].gteq(start)) : relation
    records = records.to_a unless options[:relation]
    while records.any?
      records_size = records.size
      primary_key_offset = records.last.id
      raise "Primary key not included in the custom select clause" unless primary_key_offset
      yield records
      break if records_size < batch_size
      records = relation.where(table[primary_key].gt(primary_key_offset))
      records = records.to_a unless options[:relation]
    end
  end
end
ActiveRecord::Relation.send(:include, ARAExtension)
here is the initializer
#put this file in config/initializers directory:
#extensions.rb
require "active_record_extension"
Originally, this method forced a conversion of the relation to an array of ActiveRecord objects and returned it to you. Now, I optionally allow you to return the query before the conversion to the array happens. Here is an example of how to use it:
@user.things.find_in_batches(:batch_size=>10, :relation=>true).each do |batch_query|
  # do any kind of further querying/filtering/mapping that you want
  # show that this is actually an activerecord relation, not an array of AR objects
  puts batch_query.to_sql
  # add more conditions to this query, this is just an example
  batch_query = batch_query.where(:color=>"blue")
  # pluck just the ids
  puts batch_query.pluck(:id)
end
Ultimately, if you don't like any of the answers given on an SO post, you can roll-your-own solution. Consider only downvoting when an answer is either way off topic or not helpful in any way. We are all just trying to help. Downvoting an answer that has source code to prove it will only deter others from trying to help you.
Previous EDIT
In response to your comment (because my comment would not fit):
Calling thing_ids internally uses pluck, and pluck internally uses select_all, which instantiates an ActiveRecord Result.
Previous 2nd EDIT:
This line of code within pluck returns an activerecord Result:
....
result = klass.connection.select_all(relation.arel, nil, bound_attributes)
...
I just stepped through the source code for you. Using select_all will save you some memory, but in the end, an activerecord Result was still created and mapped over even when you are using the pluck method.
I would use something like this:
@user.things.find_each(batch_size: 1000).map(&:id)
This will give you an array of the ids.
I have a categories model made with the fantastic awesome_nested_set.
I have successfully generated the drag and drop tree and successfully generated the complete hash of this tree using SERIALIZELIST plugin and sent it to the "array" method that I have added to my categories controller. (using jquery and nestedsortables)
The hash from my log looks like so ...
Processing CategoriesController#array
(for 127.0.0.1 at 2010-08-19 23:12:18)
[POST] Parameters:
{"ul"=>{"0"=>{"class"=>"",
"id"=>"category_1",
"children"=>{"0"=>{"class"=>"",
"id"=>"category_4",
"children"=>{"0"=>{"class"=>"",
"id"=>"category_3"}}}}},
"1"=>{"class"=>"", "id"=>"category_2",
"children"=>{"0"=>{"class"=>"",
"id"=>"category_5"},
"1"=>{"class"=>"",
"id"=>"category_6"}}}}}
I'm just having trouble with the sort function.
Awesome nested set does provide a few move functions but I can't seem to get my head around it.
I want to do something like this when the user hits save (btw it does an ajax request and passes the above data correctly)
def array
  newlist = params[:ul]
  newlist.each_with_index do |id, index, children|
    #insert code here for saving the re-ordered array
  end
  render :nothing => true
end
I hope this is enough information and hope someone can answer this question.
Cheers,
Matenia
----------- UPDATE AND PROGRESS -----------
Since posting this a few days ago, I mucked around with the logger.info in my dev environment to see what was going on behind the scenes.
I ended up writing 2 functions: one to go through the roots of the array, and the other to recursively move the children and children's children into place. But this ends up with too many database calls (there may be no other way to do it though).
the code looks like so ...
def array
  # fetch the current tree
  @allcategories = Category.all
  # assign the sorted tree to a variable
  newlist = params[:ul]
  # initialize the previous item
  previous = nil
  #loop through each item in the new list (passed via ajax)
  newlist.each_with_index do |array, index|
    # get the category id of the item being moved
    moved_item_id = array[1][:id].split(/category_/)
    # find the object that is being moved (in database)
    @current_category = Category.find_by_id(moved_item_id)
    # if this is the first item being moved, move it to the root.
    unless previous.nil?
      @previous_item = Category.find_by_id(previous)
      @current_category.move_to_right_of(@previous_item)
    else
      @current_category.move_to_root
    end
    # then, if this item has children we need to loop through them
    unless array[1][:children].blank?
      # unless there are no children in the array, send it to the recursive children function
      childstuff(array[1], @current_category)
    end
    # set previous to the last moved item, for the next round
    previous = moved_item_id
  end
  render :nothing => true
end
def childstuff(node, category)
  # find the category that has been passed into the function
  @selected_category = Category.find(category)
  for child in node[:children]
    child_id = child[1][:id].split(/category_/)
    child_category = Category.find_by_id(child_id)
    child_category.move_to_child_of(@selected_category)
    #if this child has children -- run recursion on this function
    unless child[1][:children].blank?
      childstuff(child[1], child_category)
    end
  end
end
I hope someone can shed some light on how to make this more efficient and how to reduce the number of database calls. I have thought about writing other functions, but they're all going to do the same thing.
For this particular project, I don't believe there would be more than 100 different categories. It's not the best way, but it works.
Cheers again,
Matenia
THE FINAL WORKAROUND
I had an issue with the above code where it wasn't saving the children properly.
Here is my latest attempt, which seems to work well.
def array
  # assign the sorted tree to a variable
  newlist = params[:ul]
  # initialize the previous item
  previous = nil
  #loop through each item in the new list (passed via ajax)
  newlist.each_with_index do |array, index|
    # get the category id of the item being moved
    moved_item_id = array[1][:id].split(/category_/)
    # find the object that is being moved (in database)
    @current_category = Category.find_by_id(moved_item_id)
    # if this is the first item being moved, move it to the root.
    unless previous.nil?
      @previous_item = Category.find_by_id(previous)
      @current_category.move_to_right_of(@previous_item)
    else
      @current_category.move_to_root
    end
    # then, if this item has children we need to loop through them
    unless array[1][:children].blank?
      # NOTE: unless there are no children in the array, send it to the recursive children function
      childstuff(array[1], @current_category)
    end
    # set previous to the last moved item, for the next round
    previous = moved_item_id
  end
  Category.rebuild!
  render :nothing => true
end
def childstuff(mynode, category)
  # logger.info "node = #{node} category = #{category}"
  #loop through its children
  for child in mynode[:children]
    # get the child id from each child passed into the node (the array)
    child_id = child[1][:id].split(/category_/)
    #find the matching category in the database
    child_category = Category.find_by_id(child_id)
    #move the child to the selected category
    child_category.move_to_child_of(category)
    # loop through the children if any
    unless child[1][:children].blank?
      # if there are children - run them through the same process
      childstuff(child[1], child_category)
    end
  end
end
There are still too many database calls, but I guess that's the price to pay for wanting this functionality, as it needs to re-record each item in the database.
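One way to reason about the number of calls is to first flatten the submitted tree into plain rows of id, parent id, and position, without touching the database. The sketch below does only that step, in plain Ruby, on a hash with the same shape as the logged parameters above. flatten_tree is a hypothetical helper name, and how the rows are then applied (for example, skipping nodes whose parent and position did not change) is left open:

```ruby
# Flattens the nested params structure into rows of [id, parent_id, position],
# so the target order is known before any record is moved.
def flatten_tree(nodes, parent_id = nil, rows = [])
  # Keys are "0", "1", ...; sort numerically to keep the submitted order.
  nodes.sort_by { |key, _node| key.to_i }.each_with_index do |(_key, node), position|
    id = node["id"][/\d+/].to_i # "category_4" -> 4
    rows << [id, parent_id, position]
    children = node["children"]
    flatten_tree(children, id, rows) if children && !children.empty?
  end
  rows
end

# Same shape as the logged parameters in the question.
params_ul = {
  "0" => { "class" => "", "id" => "category_1",
           "children" => { "0" => { "class" => "", "id" => "category_4",
                                    "children" => { "0" => { "class" => "", "id" => "category_3" } } } } },
  "1" => { "class" => "", "id" => "category_2",
           "children" => { "0" => { "class" => "", "id" => "category_5" },
                           "1" => { "class" => "", "id" => "category_6" } } }
}

rows = flatten_tree(params_ul)
```

With the rows in hand, all categories could be loaded in a single query keyed by id, rather than calling find_by_id once per node.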
Hope this helps someone else in need.
Feel free to msg me if anyone wants help with this.
AWESOME NESTED SET + JQUERY DRAG AND DROP + SERIALIZELIST PLUGIN ....
Cheers,
Matenia
see edited question above for final workaround ..I have posted the code on github .. although it may have a few bugs and needs refactoring BADLY!
JQUERY NESTED SORTABLES - DRAG AND DROP - AWESOME NESTED SET
UPDATE: Added rails 3 example to repo with slightly cleaner code
The same issue came up when upgrading a rails 2.3 app to 3.1. In my case, I only wanted to sort one depth (root or not). Here's what I ended up with:
# Fetch all IDs (they will be in order)
ids = params[:sort].collect { |param| param[/^page_(\d+)$/, 1] }
# Remove first item from array, moving it to left of first sibling
prev = Page.find(ids.shift)
prev.move_to_left_of(prev.siblings.first)
# Iterate over remaining IDs, moving to the right of previous item
ids.each do |id|
  current = Page.find(id)
  current.move_to_right_of(prev)
  prev = current
end