Saving multiple objects in a single call in rails - ruby-on-rails

I have a method in rails that is doing something like this:
a = Foo.new("bar")
a.save
b = Foo.new("baz")
b.save
...
x = Foo.new("123", :parent_id => a.id)
x.save
...
z = Foo.new("zxy", :parent_id => b.id)
z.save
The problem is this takes longer and longer the more entities I add. I suspect this is because it has to hit the database for every record. Since they are nested, I know I can't save the children before the parents are saved, but I would like to save all of the parents at once, and then all of the children. It would be nice to do something like:
a = Foo.new("bar")
b = Foo.new("baz")
...
saveall(a,b,...)
x = Foo.new("123", :parent_id => a.id)
...
z = Foo.new("zxy", :parent_id => b.id)
saveall(x,...,z)
That would do it all in only two database hits. Is there an easy way to do this in rails, or am I stuck doing it one at a time?

Since you need to perform multiple inserts, the database will be hit multiple times. The delay in your case comes from each save running in its own DB transaction. You can reduce the latency by wrapping all of your operations in a single transaction.
class Foo < ActiveRecord::Base
belongs_to :parent, :class_name => "Foo"
has_many :children, :class_name => "Foo", :foreign_key=> "parent_id"
end
Your save method might look like this:
# build the parent and the children
a = Foo.new(:name => "bar")
a.children.build(:name => "123")
b = Foo.new(:name => "baz")
b.children.build(:name => "zxy")
#save parents and their children in one transaction
Foo.transaction do
a.save!
b.save!
end
The save call on the parent object saves the child objects.

You might try using Foo.create instead of Foo.new. Create "Creates an object (or multiple objects) and saves it to the database, if validations pass. The resulting object is returned whether the object was saved successfully to the database or not."
You can create multiple objects like this:
# Create an Array of new objects
parents = Foo.create([{ :first_name => 'Jamie' }, { :first_name => 'Jeremy' }])
Then, for each parent, you can also use create to add to its association:
parents.each do |parent|
parent.children.create(:child_name => 'abc')
end
I recommend reading both the ActiveRecord documentation and the Rails Guides on ActiveRecord query interface and ActiveRecord associations. The latter contains a guide of all the methods a class gains when you declare an association.

insert_all (Rails 6+)
Rails 6 introduced a new method insert_all, which inserts multiple records into the database in a single SQL INSERT statement.
Also, this method does not instantiate any models and does not call Active Record callbacks or validations.
So,
Foo.insert_all([
{ first_name: 'Jamie' },
{ first_name: 'Jeremy' }
])
it is significantly more efficient than
Foo.create([
{ first_name: 'Jamie' },
{ first_name: 'Jeremy' }
])
if all you want to do is to insert new records.
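Because insert_all bypasses callbacks, anything the model would normally fill in has to be present in the row hashes already. Here is a minimal sketch, assuming (this is not stated above) that the table has created_at and updated_at columns and that your Rails version does not populate them automatically on bulk inserts. with_timestamps is a hypothetical helper name, not part of Rails:

```ruby
# Hypothetical helper: add the timestamp columns that callbacks would
# normally set. The column names are assumed, not taken from the question.
# Values already present in a row take precedence over the defaults.
def with_timestamps(rows, now: Time.now)
  rows.map { |row| { created_at: now, updated_at: now }.merge(row) }
end

rows = with_timestamps([{ first_name: 'Jamie' }, { first_name: 'Jeremy' }])
# In a Rails 6+ app the result could then be passed on:
#   Foo.insert_all(rows)
```

Check what your version does before relying on this; if the timestamps are filled in for you, the helper is unnecessary.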

Here are two approaches from an answer found elsewhere, by Beerlington.
Those two are your best bets for performance.
I think your best bet performance-wise is going to be to use SQL, and bulk insert multiple rows per query. If you can build an INSERT statement that does something like:
INSERT INTO foos_bars (foo_id,bar_id) VALUES (1,1),(1,2),(1,3)....
You should be able to insert thousands of rows in a single query. I didn't try your mass_habtm method, but it seems like you could do something like:
bars = Bar.find_all_by_some_attribute(:a)
foo = Foo.create
values = bars.map {|bar| "(#{foo.id},#{bar.id})"}.join(",")
connection.execute("INSERT INTO foos_bars (foo_id, bar_id) VALUES #{values}")
Also, if you are searching Bar by "some_attribute", make sure you have that field indexed in your database.
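When building the VALUES list by hand as above, it is worth making sure only integers end up in the string. A small sketch of that idea in plain Ruby; bulk_insert_sql is a hypothetical helper, and it assumes the table and column names are trusted constants rather than user input:

```ruby
# Build one multi-row INSERT statement for integer columns.
# Integer(value.to_s, 10) raises ArgumentError for anything that is not a
# base-10 integer, so arbitrary strings cannot reach the SQL text.
# Table and column names are interpolated as-is and must be trusted.
def bulk_insert_sql(table, columns, rows)
  values = rows.map do |row|
    "(" + row.map { |value| Integer(value.to_s, 10) }.join(",") + ")"
  end
  "INSERT INTO #{table} (#{columns.join(', ')}) VALUES #{values.join(',')}"
end

sql = bulk_insert_sql("foos_bars", %w[foo_id bar_id], [[1, 1], [1, 2], [1, 3]])
# In a Rails app: ActiveRecord::Base.connection.execute(sql)
```

For columns that are not integers, the connection's own quoting should be used instead of this shortcut.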
OR
You still might have a look at activerecord-import. It's right that it doesn't work without a model, but you could create a Model just for the import.
FooBar.import [:foo_id, :bar_id], [[1,2], [1,3]]
Cheers

You can use the FastInserter gem: https://github.com/joinhandshake/fast_inserter
Inserting thousands of records is fast because the gem skips Active Record and issues a single raw SQL query.

You don't need a gem to hit the DB fast and only once!
Jackrg has worked it out for us:
https://gist.github.com/jackrg/76ade1724bd816292e4e

Related

Rails 4 - Is there a performance win between using self. vs Model call in after_create callbacks

When creating a model Deal, I use an after_create to create 10 prizes on the prize table.
Is there a performance difference, or anything performance-related (garbage collection, maybe), that would help me decide between A and B?
A
after_create :create_prizes
def create_prizes
300000.times do
prizes = self.prizes.create(:deal_id => self.id, :admin_user_id => self.admin_user_id)
end
end
B
after_create :create_prizes
def create_prizes
300000.times do
prizes = Prize.create(:deal_id => self.id, :admin_user_id => self.admin_user_id)
end
end
Note that when the Admin creates a deal, it will create a very large number of prizes (up to 300,000).
Thanks for any help,
Mathieu
Option B should be slightly faster, as AR doesn't need to traverse the relations to find the foreign key. However, inserting 300,000 records will be slow either way.
Consider generating a SQL INSERT statement or passing an array to create.
Prize.create([{deal_id: 1}, {deal_id: 2}])
https://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
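With 300,000 rows, one option is to send them in fixed-size batches rather than one at a time or all at once. A rough sketch, assuming a model that responds to insert_all (Rails 6+, so not available on the Rails 4 app in the question as-is). insert_in_batches and FakePrize are made-up names, and FakePrize only stands in for a real model so the example runs without Rails:

```ruby
# Hypothetical helper: hand rows to the model in fixed-size slices so that no
# single statement grows without bound. `model` only needs to respond to
# insert_all.
def insert_in_batches(model, rows, batch_size: 1_000)
  rows.each_slice(batch_size) { |batch| model.insert_all(batch) }
end

# Stand-in for an Active Record class; it just records each batch size.
class FakePrize
  def self.batch_sizes
    @batch_sizes ||= []
  end

  def self.insert_all(rows)
    batch_sizes << rows.size
  end
end

rows = Array.new(2_500) { { deal_id: 1, admin_user_id: 7 } }
insert_in_batches(FakePrize, rows)
p FakePrize.batch_sizes
```

The batch size of 1,000 is arbitrary; the right value depends on the database and the row width.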

How do I keep has_many :through relationships when serializing to JSON and back in Rails 4.0.3?

How do I convert to JSON and back and keep the relationships? It thinks they don't exist when I un-parcel the object!
irb(main):106:0* p = Post.last
=> #<Post ...
irb(main):107:0> p.tags
=> #<ActiveRecord::Associations::CollectionProxy [#<Tag id: 41, ...
irb(main):109:0* p.tags.count
=> 2 #### !!!!!!!!!!!!
irb(main):110:0> json = p.to_json
=> "{\"id\":113,\"title\":... }"
irb(main):111:0> p2 = Post.new( JSON.parse(json) )
=> #<Post id: 113, title: ...
irb(main):112:0> p2.tags
=> #<ActiveRecord::Associations::CollectionProxy []>
irb(main):113:0> p2.tags.count
=> 0 #### !!!!!!!!!!!!
Here is the model
class Post < ActiveRecord::Base
has_many :taggings, :dependent => :destroy
has_many :tags, :through => :taggings
What someone suggested, but doesn't work
irb(main):206:0* Post.new.from_json p.to_json(include: :tags)
ActiveRecord::AssociationTypeMismatch: Tag(#60747984) expected, got Hash(#15487524)
I simulated the exact same scenario like yours and found out:
Whenever a model (Post) has a has_many :through association, Rails seems to treat an instance built from a Hash, e.g. Post.new( JSON.parse(json) ) or Post.new(id: 113), differently from the persisted record, even though both point to the same record.
I ran the following commands in the sequence as given below:
p = Post.last
p.tags
p.tags.count
json = p.to_json
p2 = Post.new( JSON.parse(json) )
p2.tags
p2.tags.count ## Gives incorrect count
p3 = Post.find(JSON.parse(json)["id"]) ### See notes below
p3.tags
p3.tags.count ## Gives the correct count
Instead of creating a new instance of Post using Hash directly, I fetched the record from database using the id obtained from deserializing json. In this case, the instance p3 and instance p2 refer to the same Post but Rails is interpreting them differently.
Disclaimer: This is not, in any way, an ideal solution (and I would call it downright cheesy), but it's about the only thing I've been able to come up with for your scenario.
What Kirti Thorat said is correct; when you have a dependent object, Rails expects the association in the hash to be of that specific class (in your case, a Tag object). Hence the error you're getting: Tag expected...got Hash.
Here comes the cheesy part: One way to properly deserialize a complex object is to leverage the accepts_nested_attributes_for method. By using this method, you'll allow your Post class to properly deserialize the dependent Tag key-value pairs to proper Tag objects. Start with this:
class Post < ActiveRecord::Base
accepts_nested_attributes_for :tags
# rest of class
end
Since accepts_nested_attributes_for searches for a key with the word _attributes for the given association, you'll have to alter the JSON when it is rendered to accommodate this by overriding the as_json method in your Post class, like so:
def as_json(options={})
json_hash = super(options) # super already returns the serialized Hash
unless json_hash["tags"].nil?
json_hash["tags_attributes"] = json_hash["tags"] # Renaming the key
json_hash.delete("tags") # remove the now unnecessary "tags" key
end
json_hash # don't forget to return this at the end
end
Side note: There are lots of json building gems such as acts_as_api that will allow you to remove this as_json overriding business
So now your rendered JSON has all the Post attributes, plus an array of tag attribute key-value pairs under the key tags_attributes.
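The key renaming itself can be tried outside of Rails on a plain Hash. A minimal sketch; nest_association is a hypothetical helper name and the sample JSON is made up:

```ruby
require "json"

# Move an association's array from "tags" to "tags_attributes", the key
# that accepts_nested_attributes_for looks for. Returns a new Hash and
# leaves the input untouched.
def nest_association(hash, name)
  return hash unless hash.key?(name)

  renamed = hash.dup
  renamed["#{name}_attributes"] = renamed.delete(name)
  renamed
end

post_hash = JSON.parse('{"title":"Hello","tags":[{"name":"ruby"},{"name":"rails"}]}')
p nest_association(post_hash, "tags")
```

A Hash without the association key is returned unchanged, which mirrors the nil check in the as_json override.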
Technically speaking, if you were to deserialize this rendered JSON in the manner suggested by Kirti, it would work and you would get back a properly populated active record object. However, unfortunately, the presence of the id attributes in both the parent Post object, and the dependent tag objects means that active record will fire off at least one SQL query. It will do a quick lookup for the tags to determine if anything needs to be added or deleted, as per the specifications of the has_many relationship (specifically, the collection=objects part).
Since you said you'd like to avoid hitting the database, the only solution I've been able to find is to render to JSON in the same way leesungchul suggested, but specifically excluding the id fields:
p_json = p.to_json(except: [:id], include: {tags: {except: :id}})
If you then do:
p2 = Post.new(JSON.parse(p_json))
You should get back a fully rendered Post object without any DB calls.
This, of course, assumes you don't need those id fields. In the event you do...frankly I'm not certain of a better solution other than to rename the id fields in the as_json method.
Also note: With this method, because of the lack of id fields, you won't be able to use p2.tags.count; it will return zero. You'll have to use .length instead.
You can try
p2.as_json( :include => :tags )
When you call
p2.tags
you get the correct tags, but p2 is not saved in the database yet. This seems to be the reason for
p2.tags.count
giving a 0 all the time.
If you actually do something like:
p2.id = Post.maximum(:id) + 1
p2.tags #Edit: This needs to be done to fetch the tags mapped to p from the database.
p2.save
p2.tags.count
You get the correct count

Cleaning up controllers to speed up application

So in my app I have notifications and different record counts that are used in the overall layout, and are therefore needed on every page.
Currently in my application_controller I have a lot of things like such:
@status_al = Status.find_by_name("Alive")
@status_de = Status.find_by_name("Dead")
@status_sus = Status.find_by_name("Suspended")
@status_hid = Status.find_by_name("Hidden")
@status_arc = Status.find_by_name("Archived")
@balloon_active = Post.where(:user_id => current_user.id, :status_id => @status_al.id )
@balloon_dependent = Post.where(:user_id => current_user.id, :status_id => @status_de.id )
@balloon_upcoming = Post.where(:user_id => current_user.id, :status_id => @status_sus.id )
@balloon_deferred = Post.where(:user_id => current_user.id, :status_id => @status_hid.id )
@balloon_complete = Post.where(:user_id => current_user.id, :status_id => @status_arc.id )
..
That's really just a small piece; I have at least double this with similar calls. The issue is I need these numbers on pretty much every page, but I feel like I'm hitting the DB way too many times here.
Any ideas for a better implementation?
Scopes
First off, you should move many of these into scopes, which will allow you to use them in far more flexible ways, such as chaining queries using ActiveRecord. See http://edgerails.info/articles/what-s-new-in-edge-rails/2010/02/23/the-skinny-on-scopes-formerly-named-scope/index.html.
Indexes
Second, if you're doing all these queries anyway, make sure you index your database to, for example, find Status quickly by name. A sample migration to accomplish the first index:
add_index :statuses, :name # or whatever your Status table is called
Session
If the data you need here is not critical, i.e. you don't rely on it for further calculations or database updates, you could consider storing some of it in the user's session. If you do so, you can simply read whatever you need from the session in the future instead of hitting your db on every page load.
If this data is critical and/or it must be updated to the second, then avoid this option.
Counter Caching
If you need certain record counts on a regular basis, consider setting up a counter_cache. Basically, in your models, you do the following:
Parent.rb
has_many :children
Child.rb
belongs_to :parent, :counter_cache => true
Ensure your parent table has a field called children_count and Rails will update this field for you on every child's creation/deletion. If you use counter caching, you will avoid hitting the database to get the counts.
Note: Using counter_caching will result in a slightly longer create and destroy action, but if you are using these counts often, it's usually worth going with counter_cache.
You should only need 1 database query for this, something like:
@posts = Post.where(:user_id => current_user.id).includes(:status)
Then use Enumerable#group_by to collect the posts into the different categories:
posts_by_status = @posts.group_by { |post| post.status.name }
which will give you a hash:
{'Alive' => [...], 'Dead' => [...]}
etc.
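Since the layout only needs the numbers, the grouped hash can be reduced to counts. A small sketch in plain Ruby, with Struct stand-ins for the Post and Status models (the fields and the counts_by_status name are illustrative, not from the question):

```ruby
# Stand-ins for the Post and Status models; fields are illustrative only.
Status = Struct.new(:name)
Post   = Struct.new(:title, :status)

# Count posts per status name in memory, after a single query has loaded them.
def counts_by_status(posts)
  posts.group_by { |post| post.status.name }.transform_values(&:size)
end

alive = Status.new("Alive")
dead  = Status.new("Dead")
counts = counts_by_status([Post.new("a", alive), Post.new("b", dead), Post.new("c", alive)])

# Hash#fetch with a default covers statuses that currently have no posts.
active_count   = counts.fetch("Alive", 0)
archived_count = counts.fetch("Archived", 0)
```

If the posts themselves are never displayed, letting the database do the counting, e.g. Post.where(:user_id => current_user.id).group(:status_id).count, avoids loading the records at all.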

How to coerce type of ActiveRecord attribute returned by :select phrase on joined table?

Having trouble with AR 2.3.5, e.g.:
users = User.all( :select => "u.id, c.user_id", :from => "users u, connections c",
:conditions => ... )
Returns, e.g.:
=> [#<User id: 1000>]
>> users.first.attributes
=> {"id"=>1000, "user_id"=>"1000"}
Note that AR returns the id of the model searched as numeric but the selected user_id of the joined model as a String, although both are int(11) in the database schema.
How could I better form this type of query to select columns of tables backing multiple models and retrieving their natural type rather than String ? Seems like AR is punting on this somewhere. How could I coerce the returned types at AR load time and not have to tack .to_i (etc.) onto every post-hoc access?
It's unfortunately not going to happen very easily. All of the data from the DB connection comes to Rails as strings; the conversion of types happens in each of the dynamic attribute methods that Rails creates at runtime. It knows which attributes to convert to which type from the table's column-type metadata, which it retrieves when the app starts. Each model only has column metadata for its own columns, which is why its own columns end up with the correct type. There is no easy way to auto-convert to the correct types.
You could on the other hand, create a simple conversion method that would take a Hash and automatically convert the attributes.
Something like this:
users = User.all(:select => "c1, comments.c2", ...)
users = convert_columns(users, 'c2' => :integer, 'other_column' => :date)
def convert_columns(records, columns = {})
records.each do |rec|
columns.each do |col, type|
rec[col] = case type
when :integer then rec[col].to_i
when :date then ........
....
end
end
end
end
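The same idea can be tried on plain attribute hashes, such as the result of users.first.attributes. This is only a sketch under assumptions: the converter table, the convert_attributes name, the joined_on column, and the choice of Date.parse for dates are all mine, not part of the answer above.

```ruby
require "date"

# Hypothetical converters keyed by a type symbol; extend as needed.
CONVERTERS = {
  integer: ->(value) { Integer(value.to_s, 10) },
  date:    ->(value) { Date.parse(value.to_s) }
}.freeze

# Return copies of the attribute hashes with the listed columns converted.
# nil values are left alone so that SQL NULLs survive the conversion.
def convert_attributes(rows, columns)
  rows.map do |attrs|
    columns.each_with_object(attrs.dup) do |(col, type), out|
      out[col] = CONVERTERS.fetch(type).call(out[col]) unless out[col].nil?
    end
  end
end

rows = [{ "id" => 1000, "user_id" => "1000", "joined_on" => "2010-02-23" }]
converted = convert_attributes(rows, "user_id" => :integer, "joined_on" => :date)
```

An unknown type symbol raises KeyError from CONVERTERS.fetch, which seems preferable to silently leaving a String in place.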
Why are you using :from => "users" inside a User.method ?
The following will do an inner join (which is what you are doing anyways)
users = User.all(:include => :connections, :select => "users.id, connections.user_id", :conditions => {...})
This is going to be a very heavy query for the database.
A faster query would use an outer join, though.
This will also return the keys as integers rather than strings.
A much faster alternative is:
Connection.all(:include => :user, :conditions => {...}).collect {|e| [e.user_id, e.id] }
This gives you an array of arrays with the ids. If you are only going to select the "id, user_id" columns, the result does not necessarily need to be AR objects; a plain array can be faster.
I hope I am not missing some point here. Let me know if I am.
If you want a quick solution, try using an after_find callback and preset the correct attribute types there:
class User < ActiveRecord::Base
after_find :preset_types
private
def preset_types
# callback methods take no arguments; the loaded record is self
self[:user_id] = self[:user_id].to_i
end
end

Validates_uniqueness_of does not work when doing a large Transaction

I have a validates_uniqueness_of :field inside my ActiveRecord model. When I do a single create/update it works nicely, but I have to do some large batch creation from CSV files inside a transaction.
When I am in the transaction, validates_uniqueness_of does not detect the error and the model is saved!
Could it be that the non-unique values are created during the transaction?
The validation checks run before the transaction has written anything, so at that point none of the values are in the table yet and each one looks unique.
Edit: Create an index with the unique property turned on for your field. The transaction will then fail, which prevents non-unique elements from being added.
To do so, add something like this to your migration file:
add_index("tablename", "fieldname", { :name => "fieldname_index", :unique => true })
Edit 2: A transaction like this will raise an error along the lines of "ActiveRecord::StatementInvalid: Mysql::Error: Duplicate entry '123' for key 1: <sql statement here>".
Table.transaction do
i1 = Table.new
i1.fieldname = "123"
i1.save
i2 = Table.new
i2.fieldname = "123"
i2.save
end
validates_uniqueness_of is subject to race conditions, and you still need to have the appropriate unique constraints on your database. You are describing this situation. The link provides a few solutions.
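For a CSV batch, duplicates inside the file itself can also be caught in plain Ruby before anything is written. This complements the unique index rather than replacing it, since it cannot see rows already in the table or rows written concurrently. A sketch with made-up names (split_duplicates, and rows represented as hashes):

```ruby
# Split rows into those whose key value is seen for the first time and those
# that repeat an earlier value. Rows are plain hashes; field names are
# illustrative.
def split_duplicates(rows, key)
  seen = {}
  rows.partition do |row|
    first_time = !seen.key?(row[key])
    seen[row[key]] = true
    first_time
  end
end

rows = [{ fieldname: "123" }, { fieldname: "456" }, { fieldname: "123" }]
unique_rows, duplicate_rows = split_duplicates(rows, :fieldname)
```

The duplicate rows can then be reported back to whoever supplied the file instead of surfacing as a database error halfway through the import.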
