Inserting 200k records while parsing CSV locks up system - ruby-on-rails

I'm trying to insert 200k records into three different tables. When I parse the CSV file and try to insert these records, Ruby locks up my entire system. Here is my code:
def upload_country
  @upload = Upload.new(:upload => params[:upload])
  if @upload.save
    csv = CSV.parse(csv_text, :headers => true)
    csv.each_with_index do |row, index|
      unless row["name"].blank? or row["country_code"].blank? or row["destination"].blank? or row["code"].blank?
        @country = Country.new(:name => row["name"].gsub(/\s+/, " ").strip, :country_code => row["country_code"].gsub(/\s+/, " ").strip, :user_id => current_user.id, :subscriber_id => get_subscriber_id)
        @country.save
        if row["country_code"] == "1"
          p = @country.country_code.to_s + @destination.name + row["code"].gsub(/\s+/, " ").strip
        else
          p = @country.country_code.to_s + row["code"].gsub(/\s+/, " ").strip
        end
        @code = DestinationCode.create(:code => p, :country_destination_id => 1, :user_id => current_user.id)
      end
    end
    @upload.destroy
    @countries = Country.find_all_by_subscriber_id(get_subscriber_id)
    render :partial => '/mycarriers/carrier_country', :layout => false
  end
end

If you have only one Rails instance running, a long-running request means no other users can access your application during that time.
I would recommend using the delayed_job gem to do the long-running processing in the background. On the controller side, enqueue the job and respond with 202 (Accepted) to the browser. On the client side, periodically poll the server to check whether the job has finished, then update the UI accordingly.
Take slideshare.net as an example: when the user finishes uploading, Slideshare redirects to a new page and periodically updates the UI while it converts the presentation file.
An alternative is to run a rake task in the background; check out this episode from RailsCasts.
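A minimal sketch of the controller/job split described above, assuming a hypothetical CountryCsvImportJob and that delayed_job is already installed; the client would then poll a status endpoint (not shown) until the import finishes:

# app/jobs/country_csv_import_job.rb (hypothetical job class)
class CountryCsvImportJob < Struct.new(:upload_id, :user_id)
  def perform
    upload = Upload.find(upload_id)
    # ... parse the CSV and insert Country / DestinationCode rows here ...
    upload.destroy
  end
end

# In the controller: enqueue and return immediately instead of parsing inline.
def upload_country
  @upload = Upload.new(:upload => params[:upload])
  if @upload.save
    Delayed::Job.enqueue(CountryCsvImportJob.new(@upload.id, current_user.id))
    head :accepted # 202 -- the browser polls for completion
  else
    head :unprocessable_entity
  end
end

The Struct-based job is the classic delayed_job pattern; anything that responds to perform can be enqueued with Delayed::Job.enqueue.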

Why don't you use mass insert here? There is a gem that can help you in this quest. I personally used it in my application to insert 500K records, with Resque for the background processing.
Hope it helps.
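The answer doesn't name the gem; activerecord-import is one commonly used choice for this kind of bulk insert. A minimal sketch, reusing the column names from the question (the batch size of 5,000 is an arbitrary choice):

require 'activerecord-import' # gem 'activerecord-import'

countries = []
csv.each do |row|
  next if row["name"].blank? || row["country_code"].blank?
  countries << Country.new(
    :name          => row["name"].gsub(/\s+/, " ").strip,
    :country_code  => row["country_code"].gsub(/\s+/, " ").strip,
    :user_id       => current_user.id,
    :subscriber_id => get_subscriber_id
  )
end

# One multi-row INSERT per batch instead of one INSERT per record.
countries.each_slice(5_000) { |batch| Country.import(batch) }

Each Country.import call issues a single multi-row INSERT, which is dramatically cheaper than 200k individual saves.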

Related

Download a CSV file from FTP with Ruby on Rails and Update Existing Records

I'm trying to download a CSV file from an FTP server, and if a record already exists I want to update it rather than create a duplicate. To give a little more context: I'm trying to upload a group of orders from an FTP folder into my Rails app. There is a new file every hour. Sometimes the orders in a given drop contain duplicates from the previous drop (to prevent one from slipping through the cracks), and occasionally an order has been updated by the customer (change in quantity, change in address, etc.) in the next drop. So my question is: if an order is purely a duplicate with no changes, how can I skip it, and if a record has changed, how can I update it?
Ruby on Rails 5.1.4 - Ruby 2.4.1
Thank you!
The code below is from my model:
class Geek < ApplicationRecord
  require 'csv'

  def self.download_walmart_orders(out)
    out ||= "#{Rails.root}/test_orders.csv"
    CSV.foreach(out, :headers => true,
                     :converters => :all,
                     :header_converters => lambda { |h| h.downcase.gsub(' ', '_') }
    ) do |row|
      geek = Geek.where(customer_order_id: row.to_h["customer_order_id"],
                        customer_name:     row.to_h["customer_name"],
                        item_sku:          row.to_h["item_sku"],
                        quantity_to_ship:  row.to_h["quantity_to_ship"],
                        total_items_price: row.to_h["total_items_price"]).first_or_create
      puts geek
    end
  end
end
I am assuming that customer_order_id is unique.
You could try something like this:
def self.update_or_create(attributes)
  obj = first || new
  obj.assign_attributes(attributes)
  obj.save
end

Geek.where(customer_order_id: row.to_h["customer_order_id"]).update_or_create(
  customer_name:     row.to_h["customer_name"],
  item_sku:          row.to_h["item_sku"],
  quantity_to_ship:  row.to_h["quantity_to_ship"],
  total_items_price: row.to_h["total_items_price"])
^^^ Thank you, Michael, for the direction above. I ended up using this code and it worked perfectly (for a slightly different project but the exact same use case). My finalized model code is below:
class Wheel < ApplicationRecord
  require 'csv'

  def self.update_or_create(attributes)
    obj = first || new
    obj.assign_attributes(attributes)
    obj.save!
  end

  def self.import(out)
    out ||= "#{Rails.root}/public/300-RRW Daily Inv Report.csv"
    CSV.foreach(out, :headers => true,
                     :converters => :all,
                     :header_converters => lambda { |h| h.downcase.gsub(' ', '_') }
    ) do |row|
      Wheel.where(item: row.to_h["item"]).update_or_create(
        item_desc:      row.to_h["item_desc"],
        total_quantity: row.to_h["total_quantity"])
    end
  end
end
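Since both versions above rely on a single column being unique (customer_order_id for Geek, item for Wheel), it may also be worth enforcing that at the database level. A minimal sketch of such a migration for the Geek model (the migration class name is just an illustration; Rails 5.1 syntax to match the version mentioned above):

class AddUniqueIndexToGeeksCustomerOrderId < ActiveRecord::Migration[5.1]
  def change
    # Prevents duplicate orders even if two imports run at the same time.
    add_index :geeks, :customer_order_id, unique: true
  end
end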

Speeding up Ruby code to make faster/more API calls

I have the following code:
list_entities = [{:phone => '0000000000', :name => 'Test', :"#i:type" => '1'},
                 {:phone => '1111111111', :name => 'Demo', :"#i:type" => '1'}]

list_entities.each do |list_entity|
  phone_contact = PhoneContact.create(list_entity.except(:"#i:type"))
  add_record_response = api.add_record_to_list(phone_contact, "API Test")
  if add_record_response[:add_record_to_list_response][:return][:list_records_inserted] != '0'
    phone_contact.update(:loaded_at => Time.now)
  end
end
This code takes an array of hashes and creates a new phone_contact for each one. It then makes an API call (add_record_response) to do something with that phone_contact. If the API call is successful, it updates the loaded_at attribute for that specific phone_contact, and then the loop continues.
I am allowed something like 7,200 API calls per hour with this service; however, I'm only able to make about one API call every 4 seconds right now.
Any thoughts on how I could speed this code block up to make API calls faster?
I would suggest using a thread pool. You can define a unit of work and the number of threads you want to process the work on; this way you can get around the bottleneck of waiting for the server to respond to each request. Maybe try something like this (disclaimer: adapted from http://burgestrand.se/code/ruby-thread-pool/):
require 'thread'

class Pool
  def initialize(size)
    @size = size
    @jobs = Queue.new
    @pool = Array.new(@size) do |i|
      Thread.new do
        Thread.current[:id] = i
        catch(:exit) do
          loop do
            job, args = @jobs.pop
            job.call(*args)
          end
        end
      end
    end
  end

  def schedule(*args, &block)
    @jobs << [block, args]
  end

  def shutdown
    @size.times do
      schedule { throw :exit }
    end
    @pool.map(&:join)
  end
end

p = Pool.new(4)

list_entities.each do |list_entity|
  p.schedule do
    phone_contact = PhoneContact.create(list_entity.except(:"#i:type"))
    add_record_response = api.add_record_to_list(phone_contact, "API Test")
    if add_record_response[:add_record_to_list_response][:return][:list_records_inserted] != '0'
      phone_contact.update(:loaded_at => Time.now)
    end
    puts "Job finished by thread #{Thread.current[:id]}"
  end
end

at_exit { p.shutdown }
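One sizing note (my own back-of-the-envelope, not from the answer): 7,200 calls per hour is about 2 calls per second, so if each call takes roughly 4 seconds, around 8 worker threads would keep you close to the limit without exceeding it; adjust the Pool.new(...) size accordingly.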

ruby on rails: Rails.cache.write returning false for array of specific object fields

I'm trying to save a large array of ids and permalinks for approved pages in memcache so that I don't have to hit the database multiple times for the same data.
My code:
if Rails.cache.exist?("ids_and_permalinks_array")
  data = Rails.cache.fetch("ids_and_permalinks_array")
else
  data = []
  Page.approved.select('id, permalink').find_each { |f| data << f }
  Rails.cache.write("ids_and_permalinks_array", data, :expires_in => 12.hours)
end
approved on Page is a simple where(:approved => true), and data is a large array of object subsets like #<Page id: 1, permalink: "page-permalink-1"> (the array is around 50,000 objects).
The Rails.cache.write line returns false when run from the console, with the development.log showing: Cache write: ids_and_permalinks_array ({:expires_in=>43200 seconds})
So, the log says it's writing to the cache, but Rails.cache.fetch("ids_and_permalinks_array") returns nil.
Suggestions? Anything obvious I'm doing wrong?
EDIT
I've also tried this, and still don't get the value written to cache:
Rails.cache.fetch("ids_and_permalinks_array", :expires_in => 12.hours, :race_condition_ttl => 10.minutes) do
  Page.approved.select('id, permalink').find_each { |p| data << p }
end
** UPDATE 2 **
I added Rails.logger.info("\n#{Rails.cache.exist?("ids_and_permalinks_array")}\n") to the beginning and end of the method this code is in. Each time, it logs false at the beginning of the method call and true at the end... So it's working, but is it only setting the value for that thread / instance of the method call?
When you use Rails.cache.fetch, the given block needs to return the value you want cached. In:
Rails.cache.fetch("ids_and_permalinks_array", :expires_in => 12.hours, :race_condition_ttl => 10.minutes) do
  Page.approved.select('id, permalink').find_each { |p| data << p }
end
the result of the block is nil, because that is the return value of find_each.
Try something like this instead:
Rails.cache.fetch("ids_and_permalinks_array", :expires_in => 12.hours, :race_condition_ttl => 10.minutes) do
  [].tap do |data|
    Page.approved.select('id, permalink').find_each { |p| data << p }
  end
end
Using tap ensures that the resulting array is the return value of the fetch block.
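One more thing worth checking in the original problem (my own observation, not from the answer): memcached rejects values larger than its per-item limit, which defaults to 1 MB, so a 50,000-element array of Page objects may be too big for a single cache entry even once the block returns the right value. A lighter payload, caching only the raw id/permalink pairs, sidesteps that:

Rails.cache.fetch("ids_and_permalinks_array", :expires_in => 12.hours) do
  [].tap do |data|
    # Store plain [id, permalink] pairs instead of full Page objects.
    Page.approved.select('id, permalink').find_each { |p| data << [p.id, p.permalink] }
  end
end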

Performance: minimize database hitting

I am using Ruby on Rails 3.0.7 and I am trying to minimize hits to the database. To do that, I retrieve all Article objects related to a User from the database and then search within those retrieved objects.
What I do is:
stored_objects = Article.where(:user_id => <id>) # => ActiveRecord::Relation

<some_iterative_function_1>.each { |...|
  stored_object = stored_objects.where(:status => 'published').limit(1)
  ...
  # perform operation on the current 'stored_object' considered
}

<some_iterative_function_2>.each { |...|
  stored_object = stored_objects.where(:visibility => 'public').limit(1)
  ...
  # perform operation on the current 'stored_object' considered
}

<some_iterative_function_n>.each { |...|
  ...
}
Will the stored_object = stored_objects.where(:status => 'published') code really avoid hitting the database? (I ask because my log file seems to show a database query for each iteration.) If not, how can I minimize database hits?
P.S.: In a few words, what I would like to do is work on the ActiveRecord::Relation as if it were an in-memory array, but the where method called on it seems to hit the database.
Rails has functionality to grab records from the database in chunks and then iterate over them without having to hit the database again.
See "Retrieving Multiple Objects in Batches" for more information about find_each and find_in_batches.
Once you start iterating over stored_objects (if that's what you're doing), they'll be loaded from the database. If you want to load only the user's published articles, you could do this:
stored_objects = Article.where(:user_id => id, :status => 'published')
If you instead want to load published and unpublished articles and do something different with the published ones, you could do this:
stored_objects = Article.where(:user_id => id)
stored_objects.find_all { |a| a.status == 'published' }.each do |a|
  # ... do something with a published article
end
Or perhaps:
Article.where(:user_id => id).each do |article|
  case article.status
  when 'published'
    # ... do something with a published article
  else
    # ... do something with an article that's not published
  end
end
Each of these examples performs only one database query. Choosing which one depends on which data you really want to work with.
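If these filters recur throughout the code, one possible refinement (my suggestion, not from the answer; Rails 3.0 scope syntax) is to express them as scopes so each call site still issues a single query:

class Article < ActiveRecord::Base
  scope :published,        where(:status => 'published')
  scope :publicly_visible, where(:visibility => 'public')
end

# Still one query:
Article.where(:user_id => id).published.each do |article|
  # ... do something with a published article
end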

Problem in saving fields to database from csv using fastercsv

I am trying to save my CSV data into the items table, which is associated with the Item model.
This is what my csv have:
'name';'number';'sub_category_id';'category_id';'quantity';'sku'; 'description';'cost_price';'selling_price'
'Uploaded Item Number 1';'54';'KRT';'WN';'67';'WNKRT0054';'Some Description here!!';'780';'890'
'Uploaded Item Number 2';'74';'KRT';'WN';'98;'WNKRT0074';'Some Description here!!';'8660';'9790'
The first row shows the fields of the items table.
Here I am using FasterCSV to process the CSV and Paperclip to handle the upload.
I am able to read the file contents and fill in the fields; here is the processing code:
def proc_csv
  @import = Import.find(params[:id])
  @lines = parse_csv_file(@import.csv.path)
  @lines.shift
  @lines.each do |line|
    unless line.nil?
      line_split = line.split(";")
      unless ((line_split[0].nil?) or (line_split[1].nil?) or (line_split[2].nil?) or (line_split[3].nil?) or (line_split[4].nil?) or (line_split[5].nil?))
        # I used puts to see what's going on.
        puts "*"*50+"line_split[0]: #{line_split[0]}"+"*"*50
        puts "*"*50+"line_split[1]: #{line_split[1]}"+"*"*50
        puts "*"*50+"line_split[2]: #{line_split[2]}"+"*"*50
        puts "*"*50+"line_split[3]: #{line_split[3]}"+"*"*50
        puts "*"*50+"line_split[4]: #{line_split[4]}"+"*"*50
        puts "*"*50+"line_split[5]: #{line_split[5]}"+"*"*50
        puts "*"*50+"line_split[6]: #{line_split[6]}"+"*"*50
        puts "*"*50+"line_split[7]: #{line_split[7]}"+"*"*50
        puts "*"*50+"line_split[8]: #{line_split[8]}"+"*"*50
        @item = [:name => line_split[0], :number => line_split[1], :sub_category_id => line_split[2], :category_id => line_split[3], :quantity => line_split[4], :sku => line_split[5], :description => line_split[6], :cost_price => line_split[7], :selling_price => line_split[8]]
        puts "#"*100+"@item is: #{@item.inspect}"+"#"*100
      end
    end
  end
  redirect_to import_path(@import)
end
but the problem is that when it processes the file and I check @item in the console, it looks like this:
#####################################################################################################item is: [{:quantity=>"\000'\0006\0007\000'\000", :name=>"\000'\000U\000p\000l\000o\000a\000d\000e\000d\000 \000I\000t\000e\000m\000 \000N\000u\000m\000b\000e\000r\000 \0001\000'\000", :sku=>"\000'\000W\000N\000K\000R\000T\0000\0000\0005\0004\000'\000", :cost_price=>"\000'\0007\0008\0000\000'\000", :number=>"\000'\0005\0004\000'\000", :selling_price=>"\000'\0008\0009\0000\000'\000", :sub_category_id=>"\000'\000K\000R\000T\000'\000", :description=>"\000'\000S\000o\000m\000e\000 \000D\000e\000s\000c\000r\000i\000p\000t\000i\000o\000n\000 \000h\000e\000r\000e\000!\000!\000'\000", :category_id=>"\000'\000W\000N\000'\000"}]####################################################################################################
#####################################################################################################item is: [{:quantity=>"\000'\0009\0008\000", :name=>"\000'\000U\000p\000l\000o\000a\000d\000e\000d\000 \000I\000t\000e\000m\000 \000N\000u\000m\000b\000e\000r\000 \0002\000'\000", :sku=>"\000'\000W\000N\000K\000R\000T\0000\0000\0007\0004\000'\000", :cost_price=>"\000'\0008\0006\0006\0000\000'\000", :number=>"\000'\0007\0004\000'\000", :selling_price=>"\000'\0009\0007\0009\0000\000'\000", :sub_category_id=>"\000'\000K\000R\000T\000'\000", :description=>"\000'\000S\000o\000m\000e\000 \000D\000e\000s\000c\000r\000i\000p\000t\000i\000o\000n\000 \000h\000e\000r\000e\000!\000!\000'\000", :category_id=>"\000'\000W\000N\000'\000"}]####################################################################################################
Can anyone kindly tell me why I'm getting this kind of string instead of the plain string I entered in my CSV file? Because of this it isn't being saved into the items table either; I have tried all possible formats but nothing seems to work. I want :name => "Uploaded Item Number 1" instead of :name=>"\000'\000U\000p\000l\000o\000a\000d\000e\000d\000 \000I\000t\000e\000m\000 \000N\000u\000m\000b\000e\000r\000 \0001\000'\000". Any help will be appreciated. Thanks in advance :)
After banging my head against the wall and getting frustrated with this issue, I figured out that my CSV file was not in the proper format (it was UTF-16LE); when I made another CSV file in UTF-8 encoding, I got the breakthrough. Though I'm left with one more issue: the string comes through like :name => "'Uploaded Item number 1'", so when it is saved the database column contains 'Uploaded Item number 1'. Any idea how I can get :name => "Uploaded Item number 1" instead?
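Two hedged options for the leftover quotes (my suggestions, not from the thread): either strip them after splitting, staying close to the proc_csv code above, or let the CSV parser do it by declaring ' as the quote character (assuming the quoting in the file is consistent; csv_text below stands for the file contents already converted to UTF-8):

# Option 1: keep the split-based parsing and strip the surrounding single quotes.
line_split = line.split(";").map { |field| field.strip.gsub(/\A'|'\z/, "") }

# Option 2: let the CSV library handle it (FasterCSV accepts the same options).
require 'csv'
rows = CSV.parse(csv_text, :col_sep => ';', :quote_char => "'", :headers => true)
rows.each { |row| puts row['name'] }  # => "Uploaded Item Number 1"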
