Huge rake task not completing on heroku - ruby-on-rails

I have a rake task in my Rails application that populates my database from a CSV file.
However, it has to insert around 40k rows, and on Heroku it stops before finishing.
How can I make it run all the way through without stopping?
Here's the rake task (edited):
viacsv.each do |vrow|
  viadate = vrow[10].split(' ')[0]
  viatime = vrow[10].split(' ')[1]
  vianame = vrow[8]
  viaurl = vrow[9]
  csv.each do |row|
    Event.find_or_create_by(time: row[9], date: row[10], eventurl1: row[8], eventname: row[7])
    link = viacsv.find_by(['eventname LIKE ?', "%#{vianame}%"])
    byebug
  end
end
Thanks
Sam

Related

Rails cached value lost/nil despite expires_in 24.hours

I am using ruby 2.3.3 and Rails 4.2.8 with Puma (1 worker, 5 threads) and on my admin (i.e. not critical) page I want to show some stats (integer values) from my database. Some requests take quite a long time to perform so I decided to cache these values and use a rake task to re-write them every day.
Admin#index controller
require 'timeout'
begin
  Timeout.timeout(8) do
    @products_a = Rails.cache.fetch('products_a', :expires_in => 24.hours) { Product.where(heavy_condition_a).size }
    @products_b = Rails.cache.fetch('products_b', :expires_in => 24.hours) { Product.where(heavy_condition_b).size }
    @products_c = Rails.cache.fetch('products_c', :expires_in => 24.hours) { Product.where(heavy_condition_c).size }
    @products_d = Rails.cache.fetch('products_d', :expires_in => 24.hours) { Product.where(heavy_condition_d).size }
  end
rescue Timeout::Error
  @products_a = 999
  @products_b = 999
  @products_c = 999
  @products_d = 999
end
Admin#index view
<li>Products A: <%= @products_a %></li>
<li>Products B: <%= @products_b %></li>
<li>Products C: <%= @products_c %></li>
<li>Products D: <%= @products_d %></li>
Rake task
task :set_admin_index_stats => :environment do
  Rails.cache.write('products_a', Product.where(heavy_condition_a).size, :expires_in => 24.hours)
  Rails.cache.write('products_b', Product.where(heavy_condition_b).size, :expires_in => 24.hours)
  Rails.cache.write('products_c', Product.where(heavy_condition_c).size, :expires_in => 24.hours)
  Rails.cache.write('products_d', Product.where(heavy_condition_d).size, :expires_in => 24.hours)
end
I am using this in production and use Memcachier (on Heroku) as a cache store. I also use it for page caching on the website and it works fine there. I have:
production.rb
config.cache_store = :dalli_store
The problem I am experiencing is that the cached values disappear almost instantly, and quite intermittently, from the cache. In the console I have tried:
I write one value with Rails.cache.write (e.g. products_a) and check it a minute later; it is still there. Although crude, I can see the "Set cmds" counter increment by one in the Memcachier admin tool.
However, when I add the next value (e.g. products_b), the first one disappears (becomes nil)! Sometimes, if I add all 4 values, 2 seem to stick. These are not always the same values. It is like whack-a-mole!
If I run the rake task to write the values and then try to read them, typically only two values are left and the others are lost.
I have seen a similar question related to this where the reason explained was the use of a multithread server. The cached value was saved in one thread and could not be reached in another, the solution was to use a memcache, which I do.
It is not only the console. If I just reload admin#index view to store the values or run the rake task, I experience the same problem. The values do not stick.
My suspicion is that I am either not using the Rails.cache-commands properly or that these commands do not in fact use Memcachier. I have not been able to determine whether my values are in fact stored in Memcachier but when I use my first command in the console I do get the following:
Rails.cache.read('products_a')
Dalli::Server#connect mc1.dev.eu.ec2.memcachier.com:11211
Dalli/SASL authenticating as abc123
Dalli/SASL authenticating as abc123
Dalli/SASL: abc123
Dalli/SASL: abc123
=> 0
but I do not get that for subsequent writes (which I assume is a matter of readability in the console and not proof that Memcachier is not being used).
What am I missing here? Why won't the values stick in my cache?
Heroku DevCenter suggests a slightly different cache config and gives some advice about threaded Rails app servers like Puma, using the connection_pool gem:
config.cache_store = :mem_cache_store,
  (ENV["MEMCACHIER_SERVERS"] || "").split(","),
  { :username => ENV["MEMCACHIER_USERNAME"],
    :password => ENV["MEMCACHIER_PASSWORD"],
    :failover => true,
    :socket_timeout => 1.5,
    :socket_failure_delay => 0.2,
    :down_retry_delay => 60,
    :pool_size => 5
  }

SQL execution time in Rake tasks

I have various rake tasks inside my Rails app. One simple example is shown below.
desc "Simple rake task"
task :test_rake do |task|
  first_sql_query = FirstModel.find(10)
  SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
  final_query = SecondModel.find(900)
end
The rake task above makes three database calls, which take, say, x, y and z seconds respectively.
Is there any way to find out the total time spent on DB operations (x+y+z seconds) for a given rake task?
Use Benchmark:
require 'benchmark'
task :test_rake do |task|
  # Wall-clock time for the whole block, in seconds
  time = Benchmark.realtime {
    first_sql_query = FirstModel.find(10)
    SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
    final_query = SecondModel.find(900)
  }
  puts time
end
To get separate benchmarks:
puts Benchmark.measure { FirstModel.find(10) }
puts Benchmark.measure { SecondModel.create(:name => 'Test 101', :email => 'abc@def.co') }
puts Benchmark.measure { final_query = SecondModel.find(900) }
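If you want the sum x+y+z rather than one number for the whole task, one option is to accumulate the time of each call yourself. This is only a minimal sketch: DbTimer is a hypothetical helper (not part of Rails or Benchmark), and the sleep calls are stand-ins for the real model calls so that the snippet runs on its own. It measures wall-clock time around each block, so it includes Ruby overhead and not just time spent in the database.

```ruby
require 'benchmark'

# Hypothetical helper (not part of Rails): accumulates wall-clock time
# across every block passed to #measure.
class DbTimer
  attr_reader :total

  def initialize
    @total = 0.0
  end

  # Runs the block, adds its elapsed time to the total,
  # and returns the block's own return value.
  def measure
    result = nil
    @total += Benchmark.realtime { result = yield }
    result
  end
end

timer = DbTimer.new

# The sleeps are stand-ins for FirstModel.find(10),
# SecondModel.create(...) and SecondModel.find(900).
first = timer.measure { sleep 0.01; :first_record }
timer.measure { sleep 0.01 }
timer.measure { sleep 0.01 }

puts format('Total time in measured blocks: %.4fs', timer.total)
```

Inside a real task you would wrap each model call in timer.measure { ... } and print timer.total at the end.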
Query timing is included in the log file:
❯❯❯ rake db:migrate:status
database: ml_development
Status Migration ID Migration Name
--------------------------------------------------
up 20180612055823 ********** NO FILE **********
then:
❯❯❯ cat log/development.log
[DEBUG] (0.4ms) SELECT "schema_migrations"."version" FROM "schema_migrations" ORDER BY "schema_migrations"."version" ASC
@pattu Rails internally uses Benchmark to measure execution time; see say_with_time.
You can add the same functions to a module and include it in the rake task to measure the execution time of SQL queries.
These two functions are needed:
def say(message, subitem = false)
  puts "#{subitem ? " ->" : "--"} #{message}"
end
def say_with_time(message = "")
  say(message)
  result = nil
  time = Benchmark.measure { result = yield }
  say "%.4fs" % time.real, :subitem
  say("#{result} rows", :subitem) if result.is_a?(Integer)
  result
end
Use this in your rake task as follows:
desc "Simple rake task"
task :test_rake do |task|
  say_with_time do
    first_sql_query = FirstModel.find(10)
    SecondModel.create(:name => 'Test 101', :email => 'abc@def.co')
    final_query = SecondModel.find(900)
  end
end

Rails console vs a rake task: returning File.size is not consistent

I'm having a strange issue where when I check the File.size of a particular file in Rails console, it returns the correct size. However when I run the same code in a rake task, it returns 0. Here is the code in question (I've tidied it up a bit to help with readability):
def sum_close
  daily_closed_tickets = Fst.sum_retrieve_closed_tickets
  daily_closed_tickets.each do |ticket|
    CSV.open("FILE_NAME_HERE", "w+", {force_quotes: false}) do |csv|
      if (FileCopyReceipt.exists?(path: "#{ticket.attributes['TroubleTicketNumber']}_sum.txt"))
        csv << ["GENERATE CSV WITH ATTRIBUTES HERE"]
        files = Dir.glob("/var/www/html/harmonize/public/close/CLOSED_#{ticket.attributes['TroubleTicketNumber']}_sum.txt")
        files.each do |f|
          Rails.logger.info "File size (should return non-0): #{File.size(f)}" #returns 0, but not in Rails Console
          Rails.logger.info "File size true or false, should be true: #{File.size(f) != 0}" #returns false, should return true
          Rails.logger.info "Rails Environment: #{Rails.env}" #returns production
          if(!FileCopyReceipt.exists?(path: f) && (File.size(f) != 0))
            Rails.logger.info("SUM CLOSE, GOOD => FileUtils.cp_r occurred and FileCopyReceipt object created")
          else
            Rails.logger.info("SUM CLOSE, WARNING: => no data transfer occurred")
          end
        end
      else
        Rails.logger.info("SUM CLOSE => DID NOT make it into initial if ClosedDate.present? if block")
      end
    end
  end
end
close_tickets.rake
task :close_tickets => :environment do
  tickets = FstController.new
  tickets.sum_close
  tickets.dais_close
end
It is beyond me why File.size comes back as 0 when this is run as a rake task. I thought it might be an environment issue, but that does not seem to be the case.
Any insight on the matter is appreciated.
The CSV.open block, and the fact that everything was wrapped inside it, was causing the issue. So I made the CSV generation its own snippet instead of wrapping everything in it.
daily_closed_tickets.each do |ticket|
  CSV.open("generate csv here.txt", "w+") do |csv|
    #enter ticket.attributes here for the csv
  end
  #continue on with the rest of the code and File.size() works properly
end
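The thread does not say exactly why the wrapping block mattered, so the following is only one plausible mechanism, and it assumes the file being measured is the same file that CSV.open has opened. Opening a file in "w+" mode truncates it immediately, so File.size reports 0 for as long as nothing new has been written and flushed. The file name below is made up for illustration.

```ruby
require 'csv'
require 'tmpdir'

sizes = {}

Dir.mktmpdir do |dir|
  # Hypothetical file name, standing in for the real ticket file.
  path = File.join(dir, 'ticket_sum.txt')
  File.write(path, "existing,data\n")
  sizes[:before] = File.size(path)

  CSV.open(path, 'w+') do |csv|
    # "w+" truncated the file when it was opened, and nothing has
    # been written yet, so the size on disk is 0 at this point.
    sizes[:inside] = File.size(path)
    csv << %w[new row]
  end

  # The block has closed the file, so the new row is on disk now.
  sizes[:after] = File.size(path)
end

puts sizes.inspect
```

If that is what was happening, closing the CSV.open block before calling File.size, as in the snippet above, would avoid it.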

SQLite3::BusyException: database is locked: INSERT INTO

When I run this piece of code with a task, it works
task :importGss => :environment do
  Gss.delete_all
  file = Rails.root + "app/assets/CSVs/gss.csv"
  csv_text = File.read(file)
  puts csv_text.size
  csv = CSV.parse(csv_text, :col_sep => ';', :headers => true)
  csv.each do |row|
    Gss.create!(row.to_hash)
  end
end
When I run it through the MVC stack, I get the following message:
ActiveRecord::StatementInvalid (SQLite3::BusyException: database is locked:
I have put the above code in a function in the Gss model.
The import is launched from the browser with a GET that is routed to the controller, which calls the model's import function.
When the import is finished, the complete list of records should be returned to the view.
The CSV file has 4k rows.
The import takes time, and it seems that after precisely 60 seconds the GET is resent.
Can someone explain how to avoid this resend, which crashes the import?
Wrapping the import in a transaction commits all the inserts together, rather than one at a time. This should drastically reduce the time taken to import that many rows.
task :importGss => :environment do
  Gss.delete_all
  file = Rails.root + "app/assets/CSVs/gss.csv"
  csv_text = File.read(file)
  puts csv_text.size
  csv = CSV.parse(csv_text, :col_sep => ';', :headers => true)
  ActiveRecord::Base.transaction do
    csv.each do |row|
      Gss.create!(row.to_hash)
    end
  end
end
Read more on transactions here: http://api.rubyonrails.org/classes/ActiveRecord/Transactions/ClassMethods.html
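To see what each Gss.create!(row.to_hash) call actually receives, and how the rows could be grouped if you preferred several smaller transactions over one large one, here is a small sketch that runs without Rails. The sample data, the column names and the batch size are all made up for illustration; the batching itself is an optional variation, not something the answer above requires.

```ruby
require 'csv'

# Made-up sample data in the same shape as the import:
# semicolon-separated, with a header row.
csv_text = "code;label\nA1;Alpha\nB2;Beta\nC3;Gamma\n"
csv = CSV.parse(csv_text, col_sep: ';', headers: true)

# Each row becomes a Hash keyed by the header names. This is the
# value that would be handed to Gss.create! in the real task.
rows = csv.map(&:to_hash)

# Group the rows into batches (size 2 here only to keep the demo small).
# In the real task, each batch could be wrapped in its own
# ActiveRecord::Base.transaction block.
batches = rows.each_slice(2).to_a

batches.each_with_index do |batch, i|
  puts "batch #{i}: #{batch.length} row(s)"
end
```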

How to initialize a method in a rake file?

Apologies for the probably noobie question:
I have a rake task that is designed to take data from a site and save it as a RateData object.
rs.each do |market,url|
  doc = Nokogiri::HTML(open(url))
  doc.xpath("//table/tr").each do |item|
    provider = "rs"
    market = market
    rate = item.xpath('td[1]').text.gsub!(/[^0-9\.]/, '')
    volume = item.xpath('td[2]').text.gsub(/[^k0-9\.]/, '')
    volume = volume.gsub(/\.(?=.k)/, '')
    volume = volume.gsub(/k/, '00')
    volume = volume.to_f
    rate = rate.to_f
    RateData.create(:provider => provider, :market => market, :rate => rate, :volume => volume, :bid_ask => 1)
  end
end
The RateData.create method is in the rate_data_controller and is accessible when I call it in the rails console. How can I make it available in this rake task?
Many thanks!
You need to make the Rails environment a prerequisite of the task, so that your models are loaded before it runs:
task :your_task, [] => :environment do
or, with arguments:
task :your_task, [:foo] => :environment do |task, args|
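To illustrate the ordering, here is a sketch that runs outside Rails. The :environment task defined here is only a stand-in for the one Rails provides (the real one loads your application and its models), and LOG and the 'bar' argument are made up for the demo.

```ruby
require 'rake'
extend Rake::DSL

LOG = []

# Stand-in for the :environment task that Rails defines.
# In a real app you would not define this yourself.
task :environment do
  LOG << 'environment loaded'
end

# :environment is listed as a prerequisite, so rake runs it first.
task :your_task, [:foo] => :environment do |_task, args|
  LOG << "your_task foo=#{args[:foo]}"
end

# Roughly what `rake your_task[bar]` does from the command line.
Rake::Task[:your_task].invoke('bar')

puts LOG
```

In the real rake file, RateData.create becomes available inside the task body because the model classes are loaded by the time the body runs.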
