How to import CSV into database on Rails whilst skipping callbacks - ruby-on-rails

I am currently trying to import over 40 CSVs exported from SQLite3 into an Oracle DB, but I seem to have issues while importing some of the CSVs into the corresponding tables.
The line
class_name.create!(row.to_hash)
produces errors on some classes because the callbacks are also triggered when the .create! method is called:
def import_csv_into_db
  Dir.foreach(Rails.root.join('db', 'csv_export')) do |filename|
    next if filename == '.' or filename == '..' or filename == 'extract_db_into_csv.sh' or filename == 'import_csv.rb'
    filename_renamed = File.basename(filename, File.extname(filename)).chomp('s').titleize.gsub(/\s+/, "")
    CSV.foreach(Rails.root.join('db', 'csv_export', filename), headers: true) do |row|
      class_name = Object.const_get(filename_renamed)
      puts class_name
      class_name.create!(row.to_hash)
      puts "Insert on table #{class_name}s complete with: #{row.to_hash}"
    end
  end
end
The issue at hand is that my CSV import function is in seeds.rb, so whenever I run bundle exec rake db:seed the CSVs are imported.
How exactly can I avoid the callbacks being triggered when class_name.create!(row.to_hash) is called within the function in seeds.rb?
In my customer.rb I have callbacks such as:
after_create :add_default_user or after_create :add_build_config
I'd like to manipulate my function within the seeds.rb to skip the callbacks when the function tries importing a CSV file like customers.csv (which would logically call Customer.create!(row.to_hash)).

There are lower-level methods which will not run callbacks. For example, instead of create! you can call insert!. Instead of destroy you can call delete.
Side note: use insert_all! to bulk-insert multiple rows at once. Your import will be much faster, and it does not run validations. Though I would recommend the more flexible activerecord-import gem instead.
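For example, here is a minimal sketch of the question's loop rewritten around insert_all!, assuming Rails 6+ (where insert!/insert_all! were introduced) and the same db/csv_export layout; timestamps are filled in by hand because insert_all! bypasses the usual record hooks:
require 'csv'

def import_csv_into_db
  Dir.glob(Rails.root.join('db', 'csv_export', '*.csv').to_s) do |path|
    model = File.basename(path, '.csv').chomp('s').titleize.gsub(/\s+/, '').constantize
    now = Time.current
    rows = CSV.read(path, headers: true).map do |row|
      # insert_all! skips callbacks and validations, so created_at/updated_at
      # are not set automatically.
      row.to_hash.merge('created_at' => now, 'updated_at' => now)
    end
    model.insert_all!(rows) if rows.any?
  end
end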
However, skipping callbacks might cause problems if they are necessary for the integrity of the data. If you delete instead of destroy associated data may not be deleted, or you may get errors because of referential integrity. Be sure to add on delete actions on your foreign keys to avoid this. Then the database itself will take care of it.
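A hypothetical migration along those lines (the table names are illustrative, and add_foreign_key with on_delete needs Rails 4.2+):
class AddCustomerForeignKeyToUsers < ActiveRecord::Migration[6.0]
  def change
    # If a customer row is deleted directly, skipping destroy callbacks,
    # the database removes its dependent users as well.
    add_foreign_key :users, :customers, on_delete: :cascade
  end
end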
Consider whether your db:seed is doing too much. If importing this CSV is a hindrance to seeding the database, consider whether it should be a separate rake task instead.
Consider whether your callbacks can be rewritten to be idempotent, that is, safe to run multiple times (see the sketch at the end of this answer). For example, after_create :add_default_user should recognize that there already is a default user and not try to re-add it.
Finally, consider whether callbacks which are run every time a model is created are the correct place to do this work.
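For instance, a hypothetical idempotent add_default_user, assuming Customer has a users association with a role column; find_or_create_by! makes a second run a no-op:
class Customer < ActiveRecord::Base
  after_create :add_default_user

  private

  def add_default_user
    # Creates the default user only if one does not already exist.
    users.find_or_create_by!(role: 'default')
  end
end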

Related

Disabling callbacks in data migration with ActiveRecord - Ruby/Rails 5

I am importing CSV files into my database through a function which looks like this:
def import_csv_into_db
  CSV.foreach(Rails.root.join('db', 'csv_export', filename), headers: true) do |row|
    class_name = filename_renamed.constantize
    next if row.to_hash['deleted_at'] != nil
    Customer.before_create.reject! { |callback| callback.method.to_s == 'unique' }
    Customer.create!(row.to_hash)
  end
end
The file I am currently testing with is customers.csv.
I would like to skip the methods, or better said, the callbacks that are triggered when I create objects to insert data into the corresponding table, e.g. Customer.create!(row.to_hash).
This is why I tried to implement the following method:
Customer.before_create.reject! {|callback| callback.method.to_s == 'unique' }
so that I can skip callbacks from the customer.rb model
But I get the error:
ArgumentError: wrong number of arguments calling `method` (0 for 1)
I understand the error, but what I don't understand is how to correctly implement the Customer.before_create line mentioned above.
My method is implemented in seeds.rb and is triggered by running rake db:seed, and yes, it works fine with other CSVs; just the ones with callbacks are causing trouble.
I found the .before_create.reject method here
The error comes from the fact that callback (in your block) is an object, and its method method takes a name and returns the corresponding Method object, as in 'a string'.method(:upcase) or SomeClass.method(:new). Calling callback.method with no arguments is what raises the ArgumentError.
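A more direct alternative to mutating the callback chain is ActiveSupport's skip_callback/set_callback pair. A sketch, assuming the model defines after_create :add_default_user as in the question above; restore the callback afterwards so the rest of the app behaves normally:
Customer.skip_callback(:create, :after, :add_default_user)
begin
  Customer.create!(row.to_hash)
ensure
  # Re-enable the callback once the import is done.
  Customer.set_callback(:create, :after, :add_default_user)
end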
You can use insert! instead of create!, which skips everything, but then you have to add all the needed columns (including timestamps):
attributes = row.to_hash
attributes['created_at'] = attributes['updated_at'] = DateTime.now.utc
class_name.insert!(attributes)

How to avoid duplicates from saving in database parsed from external JSON file with sidekiq in Rails

I have a small to-do list in a .json file that I'm reading, parsing, and saving into a Rails app with Sidekiq. Every time I refresh the browser, the worker executes and duplicates the entries in the database. How do I maintain a unique database that is synchronized with the .json file, avoiding duplicate entries being saved to my database and shown in my browser?
Here's the worker:
class TodoWorker
  include Sidekiq::Worker

  def perform
    json_text = File.read('todo_json.json')
    json = JSON.parse(json_text, :headers => true)
    json.each do |todo|
      t = TodoList.create(name: todo["name"], done: todo["done"])
      t.save
    end
  end
end
And the controller:
class TodoListsController < ApplicationController
  def index
    @todo_lists = TodoList.all
    TodoWorker.perform_async
  end
end
Thanks
This is a terrible solution btw, you have a huge race condition in your read/store code, and you're not going to be able to use a large part of what Rails is good at. If you want a simple DB why not just use sqlite?
That being said, you need some way of recognizing duplicates, in most DBs this is done with a primary key that is sent to the browser along with the rest of the data, and then back again with any changes. That primary key is used to ensure that existing data is updated, rather than duplicated.
You will need the same thing in your JSON file, and then you can change your create method to be something more like ActiveRecord's find_or_create_by
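A minimal sketch of the worker using find_or_create_by, assuming the name column identifies a to-do uniquely (a unique index on name would also guard against the race condition mentioned above):
class TodoWorker
  include Sidekiq::Worker

  def perform
    json = JSON.parse(File.read('todo_json.json'))
    json.each do |todo|
      # Only creates a row when no record with this name exists yet.
      TodoList.find_or_create_by(name: todo['name']) do |t|
        t.done = todo['done']
      end
    end
  end
end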

How do I ensure correctness when using find_in_batches?

Currently my application has stat needs, so I set up a background job using rufus-scheduler that runs at 3:00 to batch-process records into a CacheStat table. It's just like any normal application's weekly/monthly stat needs.
I found out that find_each (say, using User.find_each to iterate over all users) invokes find_in_batches, so I checked out the Rails source code:
while records.any?
  records_size = records.size
  primary_key_offset = records.last.id
  yield records
  break if records_size < batch_size
  if primary_key_offset
    records = relation.where(table[primary_key].gt(primary_key_offset)).to_a
  else
    raise "Primary key not included in the custom select clause"
  end
end
The implementation works by comparing the primary key. My concern is concurrency: while I am processing a batch, what if some records get inserted in between? Does anybody have this kind of problem?
Though on reflection, this implementation may not be a problem, because new records will always have a larger PK and will be found later, at the end.
So is this how this kind of need should be implemented? If I want to implement batch stat processing myself (without Rails), do I need to ensure there is an integer primary key and compare on that field (better not to use other kinds of fields)?
(I am thinking about this because I am in the middle of switching from MySQL to Mongo, so maybe later I will need to implement this kind of functionality myself.)
If I understand correctly, you can ensure correctness here by enforcing transactional isolation, e.g.
User.transaction do
  User.find_each do |user|
    user # placeholder: process each user here
  end
end
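Note that a bare transaction only helps if the database's isolation level actually hides concurrent inserts; on Rails 4+ you can request one explicitly. A sketch, assuming a database that supports REPEATABLE READ and a hypothetical process_stats method:
User.transaction(isolation: :repeatable_read) do
  User.find_each do |user|
    # Rows inserted by other connections mid-iteration stay invisible here.
    process_stats(user)
  end
end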

Using and Editing Class Variables in Ruby?

So I've done a couple of days worth of research on the matter, and the general consensus is that there isn't one. So I was hoping for an answer more specific to my situation...
I'm using Rails to import a file into a database. Everything is working regarding the import, but I'm wanting to give the database itself an attribute, not just every entry. I'm creating a hash of the file, and I figured it'd be easiest to just assign it to the database (or the class).
I've created a class called Issue (and thus an 'issues' table) with each entry having a couple of attributes. I was wanting to figure out a way to add a class variable (at least, that's what I think is the best option) to Issue to simply store the hash. I've written a rake task to import the file, iff the new file is different from the previously imported file (read: if the hashes are different).
desc "Parses a CSV file into the 'issues' database"
task :issues, [:file] => :environment do |t, args|
md5 = Digest::MD5.hexdigest(args[:file])
puts "1: Issue.md5 = #{Issue.md5}"
if md5 != Issue.md5
Issue.destroy_all()
#import new csv file
CSV.foreach(args[:file]) do |row|
issue = {
#various attributes to be columns...
}
Issue.create(issue)
end #end foreach loop
Issue.md5 = md5
puts "2: Issue.md5 = #{Issue.md5}"
end #end if statement
end #end task
And my model is as follows:
class Issue < ActiveRecord::Base
  attr_accessible :md5

  @@md5 = 5

  def self.md5
    @@md5
  end

  def self.md5=(newmd5)
    @@md5 = newmd5
  end

  attr_accessible # various database-entry attributes
end
I've tried various ways to write my model, but it all comes down to this: whatever I set @@md5 to in my model becomes a permanent change, almost like a constant. If I change this value here and refresh my database, the change is noted immediately. If I go into rails console and do:
Issue.md5 # => 5
Issue.md5 = 123 # => 123
Issue.md5 # => 123
But this change isn't committed to anything. As soon as I exit the console, things return to "5" again. It's almost like I need a .save method for my class.
Also, in the rake file, you see I have two print statements, printing out Issue.md5 before and after the parse. The first prints out "5" and the second prints out the new, correct hash. So Ruby is recognizing the fact that I'm changing this variable, it's just never saved anywhere.
Ruby 1.9.3, Rails 3.2.6, SQLite3 3.6.20.
tl;dr I need a way to create a class variable, and be able to access it, modify it, and re-store it.
Fixes please? Thanks!
There are a couple of solutions here. Essentially, you need to persist that one variable: Postgres provides a key/value store in the database (hstore), which would be ideal, but you're using SQLite, so that isn't an option for you. Instead, you'll probably need to use either Redis or memcached to persist this information alongside your database.
Either one allows you to persist values into a schema-less datastore and query them again later. Redis has the advantage of being saved to disk, so if the server craps out on you you can get the value of md5 again when it restarts. Data saved into memcached is never persisted, so if the memcached instance goes away, when it comes back md5 will be 5 once again.
Both redis and memcached enjoy a lot of support in the Ruby community. It will complicate your stack slightly installing one, but I think it's the best solution available to you. That said, if you just can't use either one, you could also write the value of md5 to a temporary file on your server and access it again later. The issue there is that the value then won't be shared among all your server processes.
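For example, a minimal sketch with the redis gem, assuming a Redis server running locally; the key name is arbitrary:
require 'redis'

redis = Redis.new

# After a successful import, remember the file hash...
redis.set('issues:md5', md5)

# ...and on the next run, read it back (nil if it was never set).
stored_md5 = redis.get('issues:md5')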

How to write Rake task to import data to Rails app?

Goal: Using a CRON task (or other scheduled event) to update database with nightly export of data from an existing system.
All data is created/updated/deleted in an existing system. The website does not directly integrate with this system, so the Rails app simply needs to reflect the updates that appear in the data export.
I have a .txt file of ~5,000 products that looks like this:
"1234":"product name":"attr 1":"attr 2":"ABC Manufacturing":"2222"
"A134":"another product":"attr 1":"attr 2":"Foobar World":"2447"
...
All values are strings enclosed in double quotes (") that are separated by colons (:)
Fields are:
id: unique id; alphanumeric
name: product name; any character
attribute columns: strings; any character (e.g., size, weight, color, dimension)
vendor_name: string; any character
vendor_id: unique vendor id; numeric
Vendor information is not normalized in the current system.
What are best practices here? Is it okay to delete the products and vendors tables and rewrite with the new data on every cycle? Or is it better to only add new rows and update existing ones?
Notes:
This data will be used to generate Orders that will persist through nightly database imports. OrderItems will need to be connected to the product ids that are specified in the data file, so we can't rely on an auto-incrementing primary key to be the same for each import; the unique alphanumeric id will need to be used to join products to order_items.
Ideally, I'd like the importer to normalize the Vendor data
I cannot use vanilla SQL statements, so I imagine I'll need to write a rake task in order to use Product.create(...) and Vendor.create(...) style syntax.
This will be implemented on EngineYard
I wouldn't delete the products and vendors tables on every cycle. Is this a Rails app? If so, there are some really nice ActiveRecord helpers that would come in handy for you.
If you have a Product active record model, you can do:
p = Product.find_or_initialize_by_identifier(<id you get from file>)
p.name = <name from file>
p.size = <size from file>
# etc...
p.save!
The find_or_initialize will lookup the product in the database by the id you specify, and if it can't find it, it will create a new one. The really handy thing about doing it this way, is that ActiveRecord will only save to the database if any of the data has changed, and it will automatically update any timestamp fields you have in the table (updated_at) accordingly. One more thing, since you would be looking up records by the identifier (id from the file), I would make sure to add an index on that field in the database.
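A hypothetical migration for that index; unique: true also lets the database reject duplicate identifiers outright:
class AddIdentifierIndexToProducts < ActiveRecord::Migration
  def change
    add_index :products, :identifier, unique: true
  end
end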
To make a rake task to accomplish this, I would add a rake file to the lib/tasks directory of your rails app. We'll call it data.rake.
Inside data.rake, it would look something like this:
namespace :data do
  desc "import data from files to database"
  task :import => :environment do
    file = File.open(<file to import>)
    file.each do |line|
      attrs = line.split(":")
      p = Product.find_or_initialize_by_identifier(attrs[0])
      p.name = attrs[1]
      # etc...
      p.save!
    end
  end
end
Then to call the rake task, use "rake data:import" from the command line.
Since Products don't really change that often, the best way I would see is to update only the records that change.
Get all the deltas
Mass update using a single SQL statement
If you have your normalization code in the models, you could use Product.create and Vendor.create; otherwise it would just be overkill. Also, look into inserting multiple records in a single SQL transaction; it's much faster (see the sketch at the end of this answer).
Create an importer rake task that is cronned
Parse the file line by line using FasterCSV or via vanilla Ruby like:
file.each do |line|
  products_array = line.split(":")
end
Split each line on the ":" and push it into a hash
Use a find_or_initialize to populate your db such as:
Product.find_or_initialize_by_name_and_vendor_id("foo", 111)
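As a sketch of the single-transaction idea mentioned above (the file name is illustrative, and the naive split assumes no colons inside the quoted values):
Product.transaction do
  File.foreach('products.txt') do |line|
    # Strip the surrounding double quotes from each colon-separated field.
    attrs = line.strip.split(':').map { |field| field.delete('"') }
    product = Product.find_or_initialize_by_identifier(attrs[0])
    product.name = attrs[1]
    product.save!
  end
end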
