According to the Rails docs, using update_all does the following:
It skips validations
It does not update the updated_at field
It silently ignores the :limit and :order methods
I'm trying to go through my code base and remove instances of update_all, particularly because of the first point.
Is there a way to keep the convenience of update_all while still running validations? I understand that I can loop through each record and save it, but that's not only visually messier, it's also far less efficient, because it executes N SQL statements instead of 1:
# before
User.where(status: "active").update_all(status: "inactive")
# after
User.where(status: "active").each { |u| u.update(status: "inactive") }
Thanks!
Edit: I'm using Rails 4.2
Unfortunately, update_all is faster precisely because it doesn't instantiate an ActiveRecord object for each record and instead goes straight to the database. In your case, since you need validations and callbacks, you'll have to instantiate the objects, so your best bet is iterating in batches of 1000 and performing the update as originally shown. Such as:
User.where(status: "active").find_each { |u| u.update(status: "inactive") }
The find_each method loads only 1000 records at a time, which keeps memory use and garbage-collector pressure in check. If you're updating hundreds of thousands of rows, I'd consider going back to update_all, or moving the update into a background job, since it can easily cause a timeout when deployed.
Related
I have a situation where I have a products table with a JSON field, and I have to call a method to update that field. So there are two approaches:
Product.all.each do |pr|
  pr.update_column(:boolean_tree, pr.change_boolean_tree)
end
There is nothing wrong with this apart from performance. To improve performance we could run raw SQL, which would certainly work, but isn't very Rails-like.
I was wondering whether it is possible to use update_all in this kind of situation. Any ideas?
As far as I know, you won't be able to use update_all unless you are updating all the records with the same content. Note also that when there are a lot of Products in the database, Product.all loads every record into memory before the update even starts.
I would recommend using ActiveRecord's find_in_batches method (see the docs) to load the products in batches.
I'm working on an existing Rails 2 site with a large codebase that recently updated to Ruby 1.9.2 and the mysql2 gem. I've noticed that this setup allows for non-blocking database queries; you can do client.query(sql, :async => true) and then later call client.async_result, which blocks until the query completes.
It seems to me that we could get a performance boost by having all ActiveRecord queries that return a collection decline to block until a method is called on the collection. e.g.
@widgets = Widget.find(:all, :conditions => conditions) # sends the query
do_some_stuff_that_doesnt_require_widgets
@widgets.each do
  # if the query hasn't completed yet, block until it does, then populate
  # @widgets with the result and iterate through it
  ...
end
This could be done by monkey-patching Base::find and its related methods to create a new database client, send the query asynchronously, and then immediately return a Delegator or other proxy object that will, when any method is called on it, call client.async_result, instantiate the result using ActiveRecord, and delegate the method to that. ActiveRecord association proxy objects already work similarly to implement ORM.
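As a rough, plain-Ruby illustration of that proxy idea (no ActiveRecord involved; the AsyncResult class and the sleep standing in for the SQL round-trip are hypothetical):

```ruby
require "delegate"

# A Delegator that starts the "query" on a background thread and only
# blocks when the result is first used.
class AsyncResult < SimpleDelegator
  def initialize(&query)
    @thread = Thread.new(&query)
  end

  def __getobj__(*)
    @result ||= @thread.value # blocks here until the query finishes
    __setobj__(@result)
    @result
  end
end

widgets = AsyncResult.new { sleep 0.1; [1, 2, 3] } # stand-in for the DB call
# ...do other work here while the "query" runs...
doubled = widgets.map { |w| w * 2 } # first use: blocks, then delegates
```

In a real monkey-patch, initialize would kick off client.query(sql, :async => true) and __getobj__ would call client.async_result; the collisions with to_s and inspect mentioned below are exactly the kind of thing such a proxy runs into.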
I can't find anybody who's done this, though, and it doesn't seem to be an option in any version of Rails. I've tried implementing it myself and it works in console (as long as I append ; 1 to the line calling everything so that to_s doesn't get called on the result). But it seems to be colliding with all sorts of other magic and creating various problems.
So, is this a bad idea for some reason I haven't thought of? If not, why isn't it the way ActiveRecord already works? Is there a clean way to make it happen?
I suspect that the .async_result method isn't available for all database drivers; if not, it isn't something that could be merged into generic ActiveRecord calls.
A more portable way to improve performance when looping over a large recordset is to use find_each or find_in_batches. I think they work in Rails 2.3 as well as Rails 3.x. http://guides.rubyonrails.org/active_record_querying.html#retrieving-multiple-objects-in-batches
I am working on a project that has the following cucumber step:
Given /^no registered users$/ do
  User.delete_all
end
As a new RoR user this looks a little dangerous even though I'd be testing on our development database because our User table has actual data. What is the line of code doing?
Thanks!
delete_all comes from the ActiveRecord library, not from FactoryGirl.
The difference between the two is:
delete_all(conditions = nil) public
Deletes the records matching conditions without instantiating the records first, and hence not calling the destroy method nor invoking callbacks.
This is a single SQL DELETE statement that goes straight to the database, much more efficient than destroy_all.
Be careful with relations though, in particular :dependent rules defined on associations are not honored.
Returns the number of rows affected.
destroy_all(conditions = nil) public
Destroys the records matching conditions by instantiating each record and calling its destroy method.
Each object’s callbacks are executed (including :dependent association options and before_destroy/after_destroy Observer methods).
Returns the collection of objects that were destroyed; each will be frozen, to reflect that no changes should be made (since they can’t be persisted).
Note
Instantiation, callback execution, and deletion of each record can be time consuming when you're removing many records at once. It generates at least one SQL DELETE query per record. If you want to delete many rows quickly, without concern for their associations or callbacks, use delete_all instead.
delete_all is not from FactoryGirl; it is an ActiveRecord method, and it deletes the users from your database. If you are running this from Cucumber, it should run against your test database, not development.
A better alternative is destroy_all, since that version will also run any associated callbacks: for example, a before_destroy callback that removes a user's posts when the user is deleted.
Here's a link to more info about delete_all
delete_all will forcibly remove records from the corresponding table without running any Rails callbacks.
destroy_all will also remove the records, but calls the model callbacks as well.
Based on your example, it's probably deleting all users in order to allow the next Cucumber step to register new users. The documentation for ActiveRecord::Base#delete_all says, in part:
Deletes the records matching conditions without instantiating the
records first, and hence not calling the destroy method nor invoking
callbacks. This is a single SQL DELETE statement that goes straight to
the database, much more efficient than destroy_all.
There are probably better ways to write that test, but the intent is clearly to remove the user records as efficiently as possible.
As for it being dangerous, your tests should be running against the test database, not the development or production databases. Since it's possible to misconfigure your testing framework to use the wrong database, you could certainly add a step or conditional that tests if Rails.env.test? is true. That's a fairly small price to pay for peace of mind.
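For instance, the original step could be guarded like this (a sketch; the error message is arbitrary):

```ruby
Given /^no registered users$/ do
  # Refuse to run destructive cleanup outside the test environment.
  raise "delete_all is only allowed in the test environment" unless Rails.env.test?
  User.delete_all
end
```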
I have 2 records of the same model, and I want to keep some of the data on these records in sync.
I was going to use an after_save callback (or maybe an observer) to trigger updating the other record, but I am afraid this will cause an infinite loop of saves, because the other record's save will trigger the callback again.
I read here that you can bypass callbacks on save, but those approaches seem hackish and aren't consistent between Rails 2 and 3 (we are moving to Rails 3 in a couple of months).
Is there a better option?
You can create an attr_accessor to act as a guard flag:

attr_accessor :dont_run_callback

after_save :my_callback

def my_callback
  MyModel.find(1).update_attributes(..., :dont_run_callback => true) unless dont_run_callback
end
something like that
You can use the update_columns method when updating the second record based on updates to the first one (and vice versa); update_columns writes straight to the database and skips callbacks, so it won't re-trigger the sync.
In the Hibernate world, you can often have unit tests that appear to pass but there are actually bugs that don't show up because you're dealing with cached data. For example, you may save a parent with its children thinking it's cascading the save. If you re-query for the parent after the save and test the size of the child collection, it looks ok. But in reality Hibernate didn't save the children but cached the parent so you're looking at the un-saved children. One way to get around this is to clear the session cache between the save and the query so you know the data is coming straight from the database.
Is this an issue with ActiveRecord? If I save a model and then query it in the same test, is it possible that I'm not actually getting the data from the database but rather from the query cache? I haven't seen any sample tests that try to deal with this so I'm wondering if there's something that makes it a non-issue?
Yes. Depending on how you write your tests, Rails' caching can sometimes interfere. Rails is often smart enough to know when a cache needs to be cleared (when there is an obvious association between objects), but here is an example that would not behave as expected:
user.posts.should == []
Post.create(:user_id => user.id)
user.posts.should_not == [] # Fails, since the original result was cached
In general, if you are reading the same data twice in the same test, you should call .reload on it before the second read. Like so:
user.posts.should == []
Post.create(:user_id => user.id)
user.posts.reload
user.posts.should_not == []
In my personal experience it is better to consider a different way of writing the test than to rely on the method above. For example, here's a better way of writing the above that will not be affected by the caching:
lambda { Post.create(:user_id => user.id) }.should change(user.posts, :count).by(1)
I have never run into this as an issue with ActiveRecord. My understanding is that the cache applies only to reads, so saves are always performed.