Batch insertion in rails 3 - ruby-on-rails

I want to do a batch insert of a few thousand records into the database (Postgres in my case) from within my Rails app.
What would be the "Rails way" of doing it?
Something that is fast, and also the correct way of doing it.
I know I can build the SQL query by string concatenation of the attributes, but I want a better approach.

ActiveRecord's .create method supports bulk creation. The method emulates the feature if the DB doesn't support it, and uses the underlying DB engine if it does.
Just pass an array of attribute hashes.
# Create an Array of new objects
User.create([{ :first_name => 'Jamie' }, { :first_name => 'Jeremy' }])
A block is supported too, and it's the common way to set shared attributes.
# Creating an Array of new objects using a block, where the block is executed for each object:
User.create([{ :first_name => 'Jamie' }, { :first_name => 'Jeremy' }]) do |u|
  u.is_admin = false
end

I finally reached a solution after the two answers from #Simone Carletti and #Sumit Munot.
Until the Postgres driver supports bulk insertion through ActiveRecord's .create method, I would go with the activerecord-import gem. It does a bulk insert, and in a single INSERT statement at that.
books = []
10.times do |i|
  books << Book.new(:name => "book #{i}")
end
Book.import books
In Postgres it leads to a single INSERT statement.
Once the Postgres driver supports bulk insertion in a single INSERT statement through ActiveRecord's .create method, #Simone Carletti's solution will make more sense :)

You can write a method in your Rails model that runs your insert queries.
In Rails you can run it using
rails runner MyModelName.my_method_name
This is the best approach I have used in my project.
Update:
I use the following in my project, but it is not safe against SQL injection.
If you are not using user input in this query, it may work for you:
user_string = " ('a#ao.in','a'), ('b#ao.in','b')"
User.connection.insert("INSERT INTO users (email, name) VALUES"+user_string)
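If the values ever do come from user input, a minimal sketch (assuming the same users table and columns) that escapes each value through the adapter's quote method would be:
# Build the VALUES list with adapter-level quoting so each string is escaped,
# unlike the raw interpolated version above.
rows = [['a#ao.in', 'a'], ['b#ao.in', 'b']]
conn = User.connection
values = rows.map do |email, name|
  "(#{conn.quote(email)}, #{conn.quote(name)})"
end.join(', ')
conn.insert("INSERT INTO users (email, name) VALUES #{values}")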
For multiple records:
new_records = [
  {:column => 'value', :column2 => 'value'},
  {:column => 'value', :column2 => 'value'}
]
MyModel.create(new_records)

You can do it the fast way or the Rails way ;) The best way in my experience to import bulk data into Postgres is via CSV. What takes several minutes the Rails way takes several seconds using Postgres' native CSV import capability.
http://www.postgresql.org/docs/9.2/static/sql-copy.html
It even fires database triggers and respects database constraints.
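As a rough sketch of driving COPY from Rails (assuming a users.csv whose columns match the column list; copy_data and put_copy_data are part of the pg gem's API):
# Stream a CSV file into Postgres through the pg driver's COPY support.
sql = "COPY users (email, name) FROM STDIN WITH (FORMAT csv, HEADER true)"
raw = User.connection.raw_connection # the underlying PG::Connection
raw.copy_data(sql) do
  File.foreach('users.csv') { |line| raw.put_copy_data(line) }
end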
Edit (after your comment):
Gotcha. In that case you have correctly described your two options. I have been in the same situation before: I implemented it using the Rails 1000-save! strategy because it was the simplest thing that worked, and then optimized it to the 'append a huge query string' strategy because it performed an order of magnitude better.
Of course, premature optimization is the root of all evil, so perhaps do it the simple, slow Rails way first, and know that building a big query string is a perfectly legit optimization technique at the expense of maintainability. I feel your real question is 'is there a Railsy way that doesn't involve thousands of queries?', and unfortunately the answer to that is no.
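For reference, a minimal sketch of that simple strategy, with all the saves wrapped in one transaction so you at least avoid a commit per row (records is assumed to be an array of attribute hashes):
# One transaction, many INSERTs: slow, but the simplest thing that works.
Book.transaction do
  records.each { |attrs| Book.create!(attrs) }
end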

Related

Ruby on Rails #find_each and #each: is this bad practice?

I know that find_each has been designed to consume less memory than each.
I found some code that other people wrote long ago, and I think it's wrong.
Consider this code:
users = User.where(:active => false) # What does this line actually do? Nothing?
users.find_each do |user|
  # update or do something..
  user.update(:do_something => "yes")
end
In this case, it will store all user objects in the users variable, so we have already used up the full amount of memory. There is no point in using find_each after that.
Am I correct?
So in other words, if you want to use find_each, you always need to call it on an ActiveRecord::Relation object, like this:
User.where(:active => false).find_each do |user|
  # do something...
end
What do you think, guys?
Update
In the users = User.where(:active => false) line,
some developers insist that Rails never executes the query unless we do something with that variable.
What if we have a class with an initialize method that contains a query?
class Test
  def initialize
    @users = User.where(:active => true)
  end

  def do_something
    @users.find_each do |user|
      # do something really..
    end
  end
end
If we call Test.new, what would happen? Nothing will happen?
users = User.where(:active => false) doesn't run a query against the database and it doesn't return an array with all inactive users. Instead, where returns an ActiveRecord::Relation. Such a relation basically describes a database query that hasn't run yet. The defined query is only run against the database when the actual records are needed. This happens for example when you run one of the following methods on that relation: find, to_a, count, each, and many others.
That means the change you made isn't a huge improvement, because it doesn't change when and how the database is queried.
But IMHO your code is still slightly better, because when you don't plan to reuse the relation, why assign it to a variable in the first place?
users = User.where(:active => false)
users.find_each do |user|

User.where(:active => false).find_each do |user|

Those do the same thing.
The only difference is the first one stores the ActiveRecord::Relation object in users before calling #find_each on it.
This isn't a Rails thing, it applies to all of Ruby. It's method chaining common to most object-oriented languages.
array = Call.some_method
array.each{ |item| do_something(item) }

Call.some_method.each{ |item| do_something(item) }
Again, same thing. The only difference is in the first the intermediate array will persist, whereas in the second the array will be built and then eventually deallocated.
If we call Test.new, what would happen? Nothing will happen?
Exactly. Rails will make an ActiveRecord::Relation and it will defer actually contacting the database until you actually do a query.
This lets you chain queries together.
@inactive_users = User.where(active: false).order(name: :asc)
Later you can do the query:
# Inactive users whose favorite color is green, ordered by name.
@inactive_users.where(favorite_color: :green).find_each do |user|
  ...
end
No query is made until find_each is called.
In general, pass around relations rather than arrays of records. Relations are more flexible and if it's never used there's no cost.
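A small sketch of that idea (recently_created is a made-up helper; no query runs until count):
def recently_created(relation)
  relation.where('created_at > ?', 3.days.ago) # still just a relation
end

recently_created(User.where(:active => false)).count # the query runs here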
find_each is special in that it works in batches to avoid consuming too much memory on large tables.
A common mistake is to write this:
User.where(:active => false).each do |user|
Or worse:
User.all.each do |user|
Calling each on an ActiveRecord::Relation will pull all the results into memory before iterating. This is bad for large tables.
find_each will load the results in batches of 1000 to avoid using too much memory. It hides this batching from you.
There are other methods which work in batches, see ActiveRecord::Batches.
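For instance, a quick sketch of the two most common ones (batch_size is an option both accept):
# find_each yields one record at a time, loaded in batches behind the scenes.
User.where(:active => false).find_each(:batch_size => 500) do |user|
  user.update(:do_something => "yes")
end

# find_in_batches yields each batch as an array instead.
User.where(:active => false).find_in_batches(:batch_size => 500) do |group|
  group.each { |user| user.update(:do_something => "yes") }
end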
For more see the Rails Style Guide and use rubocop-rails to scan your code for issues and make suggestions and corrections.

Ruby: Hash: use one record attribute as key and another as value

Let's say I have a User with attributes name and badge_number
For a JavaScript autocomplete field I want the user to be able to start typing a name and get a select list.
I'm using Materialize which offers the JS needed, I just need to provide it the data in this format:
data: { "Sarah Person": 13241, "Billiam Gregory": 54665, "Stephan Stevenston": 98332 }
This won't do:
User.select(:name, :badge_number) => { name: "Sarah Person", badge_number: 13241, ... }
And this feels repetitive, icky and redundant (and repetitive):
user_list = User.select(:name, :badge_number)
hsh = {}
user_list.each do |user|
  hsh[user.name] = user.badge_number
end
hsh
...though it does give me my intended result, performance will suck over time.
Any better ways than this weird, slimy loop?
This will give the desired output:
User.pluck(:name, :badge_number).to_h
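With the sample data from the question, that produces:
User.pluck(:name, :badge_number).to_h
# => { "Sarah Person" => 13241, "Billiam Gregory" => 54665, "Stephan Stevenston" => 98332 }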
Edit
Though the code above is a one-liner, it still loops internally. Offloading such loops to the database may improve performance when dealing with very many rows, but there is no database-agnostic way to achieve this in ActiveRecord. The following answer achieves it in Postgres.
If your RDBMS is Postgresql, you can use Postgresql function json_build_object for this specific case.
User.select("json_build_object(name, badge_number) as json_col")
.map(&:json_col)
The whole JSON can be built using Postgres-supplied functions too.
User.select("array_to_json(array_agg(json_build_object(name, badge_number))) as json_col")
.limit(1)[0]
.json_col

Is there something like an `ILIKE` method in Rails 4?

I used to do this with an array condition inside the where method:
Article.where('title ILIKE ?','%today%')
This worked in Postgres but ILIKE is not present in MySQL and other DBMS.
What I need is to be able to perform case-insensitive queries using code like
Article.ilike(title:'%today%',author:'%john%')
Even if there's no builtin method to perform case-insensitive queries, you can use the Arel library and the matches method, as in:
Article.where(Article.arel_table[:title].matches('%today%'))
This is DB-agnostic and SQL-injection proof.
I've written an ilike method in my common scopes file that lets you call it with a list of attributes and values:
module CommonScopes
  extend ActiveSupport::Concern

  module ClassMethods
    def ilike( options={} )
      raise ArgumentError unless options.is_a?(Hash) && options.any?
      if options.one?
        where(arel_table[options.keys.first].matches(options.values.first))
      else
        key, value = options.shift
        ilike( {key=>value} ).merge( ilike( options ) )
      end
    end
  end
end
You can place this inside app/models/concerns/common_scopes.rb and include where you need it.
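Usage then matches the syntax from the question:
Article.ilike(:title => '%today%', :author => '%john%')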
No, there isn't. You need to write driver-specific SQL to achieve this.
ActiveRecord's goal is to make database access fast and easy for 90% of use cases, not to make your models completely database-agnostic. Switching your entire database backend from one system to another is not something it optimizes for.
You might consider looking at another gem like DataMapper, which provides a Ruby syntax for wrapping operators such as like (but which may or may not provide an equivalent to ilike):
# If the value of a pair is an Array, we do an IN-clause for you.
Person.all(:name.like => 'S%', :id => [ 1, 2, 3, 4, 5 ])
Rails doesn't have built-in case-insensitive search; it depends on the DB. For MySQL you can use the LOWER function:
YourModel.where('lower(column_name) = ?', str.downcase)

Speeding up XML to MySQL with Nokogiri in Rails

I'm writing large amounts of data from XML feeds to my MySQL database in my Rails 3 app using Nokogiri. Everything is working fine but it's slower than I would like.
Is there any way to speed up the process? This is a simplified version of the script I'm using:
url = "http://example.com/urltoxml"
doc = Nokogiri::XML(open(url))
doc.xpath("//item").each do |record|
guid = record.xpath("id").inner_text
price = record.xpath("price").inner_text
shipping = record.xpath("shipping").inner_text
data = Table.new(
:guid => guid,
:price => price,
:shipping => shipping
)
if price != ""
data.save
end
end
Thanks in advance
I guess your problem does not come from parsing the XML, but from inserting the records one by one into the DB, which is very costly.
Unfortunately, AFAIK Rails does not provide a native way to mass-insert records. There once was a gem that did it, but I can't put my hands back on it.
"Mass inserting data in Rails without killing your performance", though, provides helpful insights on how to do it manually.
If you go this way, don't forget to process your nodes in batches if you don't want to end up with a single 999-bazillion-row INSERT statement.
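As a sketch of that approach using the activerecord-import gem mentioned in the first question above (the batch size of 500 is an arbitrary choice):
require 'open-uri'
require 'nokogiri'

url = "http://example.com/urltoxml"
doc = Nokogiri::XML(open(url))
rows = doc.xpath("//item").map do |record|
  Table.new(
    :guid => record.xpath("id").inner_text,
    :price => record.xpath("price").inner_text,
    :shipping => record.xpath("shipping").inner_text
  )
end

# Import in chunks so no single INSERT statement grows unbounded.
rows.each_slice(500) { |batch| Table.import batch }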

Do "like" queries with ActiveRecord in Rails 2.x and 3.x?

I'm doing queries like this in Rails 3.x
Speaker.where("name like '%yson%'")
but I'd love to avoid DB-specific code. What's the right way to do this?
If there's a way to do this in Rails 2.x too, that would help too.
In Rails 3 or greater
Speaker.where("name LIKE ?", "%yson%")
In Rails 2
Speaker.all(:conditions => ["name LIKE ?", "%yson%"])
Avoid directly interpolating strings, because the values won't be escaped and you will be vulnerable to SQL injection attacks.
You can use .matches for it (t below is an Arel table):
> t[:name].matches('%lore').to_sql
=> "\"products\".\"name\" LIKE '%lore'"
Actual usage in a query would be:
Speaker.where(Speaker.arel_table[:name].matches('%lore'))
Use a search engine like solr or sphinx to create indexes for the columns you would be performing like queries on. Like queries always result in a full table scan when you look at the explain plan so you really should almost never use them in a production site.
Not by default in Rails, since there are so many DB options (MySQL, Postgresql, MongoDB, CouchDB...), but you can check out gems like MetaWhere, where you can do things like:
Article.where(:title.matches => 'Hello%', :created_at.gt => 3.days.ago)
=> SELECT "articles".* FROM "articles" WHERE ("articles"."title" LIKE 'Hello%')
AND ("articles"."created_at" > '2010-04-12 18:39:32.592087')
In general, though, you'll probably have to have some DB-specific code, or refactor your code (i.e. redefine the .matches operator on symbols in MetaWhere) to work with a different database. Hopefully you won't be changing your database that often, but if you are, you should have a centralized location where you define these operators for re-use. Keep in mind that an operator or function defined in one database might not be available in another, in which case having this generalized operation is moot, since you won't be able to perform the search anyway.
