Rails - running threads after method has exited - ruby-on-rails

When the client changes his profile picture it hits the update method, which responds with update.js.erb. This is a fast and straightforward process. However, behind the scenes on the server, a bunch of files (10 of them) are generated from the profile picture, and these need to be uploaded to an Amazon bucket from the server. This is a lengthy process and I don't want to make the client wait until it is finished. Moreover, the file uploads often fail with a RequestTimeoutException because they take longer than 15 seconds.
All this raises many questions:
How do you generate and upload the 10 files after the update method has exited? Threads are killed after the main method has finished.
How do you catch an exception inside a thread? The following code does not catch the timeout exceptions.
threads = []
threads << Thread.new {
  begin
    # upload file 1 ....
  rescue Rack::Timeout::RequestTimeoutException => e
    # try to upload again ....
  else
  ensure
  end
}
threads << Thread.new {
  begin
    # upload file 2 ....
  rescue Rack::Timeout::RequestTimeoutException => e
    # try to upload again ....
  else
  ensure
  end
}
threads.each { |thr|
  thr.join
}
What's the best way to try to upload a file again if it timed out?
What is the best solution to this problem?

You need to use a background-job library such as delayed_job or the whenever gem for this kind of task, but I would suggest Sidekiq.
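For example, a rough Sidekiq sketch (the worker and method names below are made up for illustration; Sidekiq's built-in retries also cover the timeout case):

# app/workers/profile_assets_worker.rb -- illustrative sketch, not the exact app code
class ProfileAssetsWorker
  include Sidekiq::Worker
  sidekiq_options retry: 5   # Sidekiq re-enqueues the job if perform raises

  def perform(user_id)
    user = User.find(user_id)
    # generate the 10 derived files from the profile picture and upload
    # each one to the S3 bucket here; raising on a failed upload lets
    # Sidekiq retry the whole job later
  end
end

The update action then only enqueues the job with ProfileAssetsWorker.perform_async(@user.id) and renders update.js.erb immediately, so the client never waits on the uploads.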

I also faced the same problem in a project and came across a solution using AWS Lambda. If you are on Rails, you can use the carrierwave gem or the Rails 5 Active Storage module to upload the image to S3; if not, use the AWS SDK for Ruby to upload the files. You can bind S3 events (file created/modified) to a Lambda function, so whenever a file is created the function is invoked and your work is done. In the Lambda function you write the logic to create the files and upload them back to S3. You can write Lambda code in Ruby, Node, or Python.
This strategy may help you.
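If you go that route, a Ruby Lambda handler receives the S3 event roughly like this (a bare sketch; the actual file generation and the bucket layout are assumptions):

# handler.rb -- minimal sketch for the AWS Lambda Ruby runtime
require 'aws-sdk-s3'

def handler(event:, context:)
  event['Records'].each do |record|
    bucket = record['s3']['bucket']['name']
    key    = record['s3']['object']['key']
    # download the uploaded picture with Aws::S3::Client, generate the
    # derived files, and upload them back to the bucket here
  end
end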

Related

How do I mock S3Object.read do |chunk| using rspec

I'm new to ruby and rspec.
I have a module that interacts with S3 in ruby.
In my code I:
create a new S3 instance: s3 = AWS::S3.new()
Get my bucket: @s3bucket = s3.buckets[@bucket]
Retrieve my S3Object: object = @s3bucket.objects[key]
Finally, I save the object to a local file:
File.open(local_filename, 'wb') do |s3file|
  object.read do |chunk|
    return completed if stop?
    s3file.write(chunk)
  end
end
My code works well, but I'm having problems unit testing it;
specifically, I'm having trouble mocking the object.read do |chunk| part.
No matter what I try, the chunk turns out empty.
Can someone help?
Thanks.
Try this:
s3_object = class_double("S3Object")
allow(s3_object).to receive(:read).and_return(<whatever you want>)
If you need to store API responses in your tests without making multiple calls, check out https://github.com/vcr/vcr
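One caveat: because read is called with a block in the code above, a stub that only returns a value leaves chunk empty; the stub has to yield to the block. A minimal sketch of that (the double name and chunk strings are invented for illustration):

fake_object = double('S3Object')
# and_yield invokes the block passed to #read, once per chained and_yield
allow(fake_object).to receive(:read).and_yield('first chunk').and_yield('second chunk')

written = ''
fake_object.read { |chunk| written << chunk }
# written == 'first chunksecond chunk'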
If you want to mock this:
create a new S3 instance: s3 = AWS::S3.new()
you do
allow(AWS::S3).to receive(:new).and_return(<the return value of the method>)
(note that :new is a class method, so it is stubbed on the class itself rather than with allow_any_instance_of). You can use VCR as suggested previously, but you will run into issues if you work on a team and run your tests at the same time as another team member when you have both deleted the same cassette.
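Putting the two answers together, the whole AWS::S3.new / buckets / objects chain can be stubbed with plain doubles so the spec never touches the network (every name below is invented for illustration):

fake_object = double('S3Object')
allow(fake_object).to receive(:read).and_yield('some bytes')

# plain hashes stand in for the buckets/objects collections, so [] just works
fake_bucket = double('Bucket', objects: { 'some/key' => fake_object })
fake_s3     = double('S3', buckets: { 'my-bucket' => fake_bucket })

allow(AWS::S3).to receive(:new).and_return(fake_s3)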

Problems with file.path with csv import via sidekiq on heroku

I am using a background job in order to import user data from a CSV file into my database. At first I did this the "hard" way by simply calling a method on my User model and passing in the file path transmitted via a form file_field:
User.import_csv(params[:file].path)
This worked well locally and in production (Heroku).
Now that huge CSV files are involved, I understood that I need a job to perform the import in the background. I am familiar with Redis and Sidekiq, so the job was built quickly.
CsvImportJob.perform_async(URI.parse(params[:file].path))
and in my worker:
def perform(file_path)
  User.import_csv(file_path)
end
Well, that also works perfectly locally, but as soon as I hit this in production, I see the following error in my log:
» 10 Aug 2015 13:56:26.596 2015-08-10 11:56:25.987726+00:00 app worker.1 - - 3 TID-oqvt6v1d4 ERROR: Actor crashed!
» 10 Aug 2015 13:56:26.596 2015-08-10 11:56:25.987728+00:00 app worker.1 - - Errno::ENOENT: No such file or directory # rb_sysopen - /tmp/RackMultipart20150810-6-14u804c.csv
» 10 Aug 2015 13:56:26.596 2015-08-10 11:56:25.987730+00:00 app worker.1 - - /app/vendor/ruby-2.2.2/lib/ruby/2.2.0/csv.rb:1256:in `initialize'
The path in the error is the file_path variable that was passed in.
Somehow Heroku is not able to find the file when I pass the path to a Sidekiq job. When I do this without Sidekiq, it works.
I don't really know how to tackle this issue, so any help is appreciated.
I had the same experience; you can look at a similar project of mine at https://github.com/coderaven/datatable-exercise/tree/parallel_processing
(basically, focus on the object_record.rb model and the jobs import_csv_job.rb and process_csv_job.rb).
The error: Errno::ENOENT: No such file or directory # rb_sysopen
If this works on Heroku without Sidekiq, then the path you are passing (in your example, one under /tmp/) is valid at the time the request is handled.
So here are two probable problems and their solutions:
1.) You saved the file to a path that is unknown or inaccessible to Heroku by the time the job runs. When you handle the CSV import without Sidekiq, the uploaded file is held temporarily until you finish processing it; with a job scheduler like Sidekiq, however, the path you hand over must point to an existing file that the worker can still access.
Solution: save the file to storage somewhere. Heroku has an ephemeral filesystem, so you cannot keep files around from the running web app; to work around this, use an Amazon S3-like service (I used Google Drive) to store the file, then give its path to your Sidekiq worker so it can access and process it later.
2.) If the paths are correct and the files are saved and reachable, then in my experience the issue is that you are using File.open instead of open-uri's open method. File.open does not accept remote files; you need to require open-uri in your worker and then use open to handle remote files.
For example:
require 'open-uri'

class ProcessCsvJob < ActiveJob::Base
  queue_as :default

  def perform(csv_path)
    csv_file = open(csv_path, 'rb:UTF-8')
    SmarterCSV.process(csv_file) do |array|
      # ... code here for processing ...
    end
  end
end
I'm fully aware this question is almost a year old, so if you have already solved it or this answer worked, it can also serve as documentation for those who run into the same problem.
You can't pass a file object to the perform method.
The fix is to massage the data beforehand and pass in the parameters you need directly.
Something like...
def import_csv(file)
  CSV.foreach(file.path, headers: true) do |row|
    new_user = { email: row[0], password: row[1] }
    CsvImportJob.perform_async(new_user)
  end
end
Note: you'd call CsvImportJob.perform_later for Sidekiq with ActiveJob and Rails 5.
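For completeness, the worker on the receiving end of that perform_async call could look something like this (a sketch; the attribute keys mirror the hash built above, and Sidekiq's JSON serialization turns the symbol keys into strings):

class CsvImportJob
  include Sidekiq::Worker

  # receives the per-row hash built in import_csv
  def perform(user_attrs)
    User.create!(email: user_attrs['email'], password: user_attrs['password'])
  end
end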
You get the error because in production/staging the web process and Sidekiq run on different servers.
My solution: upload the CSV to Google Cloud Storage first.
class Services::Downloader
  require 'fog'

  StorageCredentials = YAML.load_file("#{::Rails.root}/config/g.yml")[Rails.env]

  def self.download(file_name, local_path)
    storage = Fog::Storage.new(
      provider: "Google",
      google_storage_access_key_id: StorageCredentials['key_id'],
      google_storage_secret_access_key: StorageCredentials['access_key'])
    storage.get_bucket(StorageCredentials['bucket'])
    f = File.open(local_path)
    storage.put_object(StorageCredentials['bucket'], file_name, f)
    storage.get_object_https_url(StorageCredentials['bucket'], file_name, Time.now.to_f + 24.hours)
  end
end
The User model:
class User < ApplicationRecord
  require 'csv'
  require 'open-uri'

  def self.import_data(file)
    load_file = open(file)
    data = CSV.read(load_file, { encoding: "UTF-8", headers: true, header_converters: :symbol, converters: :all })
    ...
The worker:
class ImportWorker
  include Sidekiq::Worker
  sidekiq_options queue: 'workers', retry: 0

  def perform(filename)
    User.import_data(filename)
  end
end
And the code to start the worker:
path = Services::Downloader.download(zip.name, zip.path)
ImportWorker.perform_async(path)

echoprint-codegen runs indefinitely with delayed_job

I'm attempting to run echoprint-codegen in a background process for analysing audio files as they're uploaded to a web service.
The desired functionality exists with a simple system call to the tmp file that gets uploaded via paperclip:
result = `echoprint-codegen #{path} 0 20` # works!
Unfortunately, this is not the case when the delayed workers fire off a new job; the echoprint-codegen process appears to hang indefinitely.
Per the echoprint README, I've double checked that ffmpeg is also within the path (Paperclip.options[:command_path] is pointing to the correct path).
I've also attempted to encapsulate the echoprint-codegen command line in a Paperclip.run() call, but that also results in a hanging process.
Any pointers?
I obtained the desired functionality by placing the echoprint-codegen system call in a Ruby Thread:
thread = Thread.new { Thread.current[:result] = `echoprint-codegen #{path} 0 20` }
thread.join
result = thread[:result]

how can we tell delayed_job when a delayed task fails so it will auto-retry?

Our app is hosted on Heroku and we use delayed_job when sending info to a remote system (via a GET to a URL with some URL params).
The remote system usually returns a success code, but if it's really busy it returns a try-again code.
Suppose our method is
def send_info
  the_url = "http://mydomain.com/dosomething?arg=#{self.someval}"
  the_result = open(the_url).read
  successflag = get_success_flag_from(the_result)
end
and so somewhere in our code we do
@widget.delay.send_info
and that all works fine.
Except it does not automatically handle the case where the remote said to try back later.
Is there any way for the send_info method (which is what delayed job will execute) to "tell" delayed_job "retry me again"? Do we need to throw some custom exception or something?
Raising any kind of exception ought to cause delayed_job to requeue it (subject to only-trying-so-many-times); if you don't especially need a custom exception you can just raise a RuntimeError.
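A minimal sketch of that approach, reusing the names from the question (the exception class name is invented):

class RemoteBusyError < StandardError; end

def send_info
  the_url = "http://mydomain.com/dosomething?arg=#{self.someval}"
  the_result = open(the_url).read
  successflag = get_success_flag_from(the_result)
  # raising makes delayed_job keep the job and retry it later,
  # up to Delayed::Worker.max_attempts times
  raise RemoteBusyError, "remote returned try-again" unless successflag
end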

Saving an ActiveRecord non-transactionally

My application accepts file uploads, with some metadata being stored in the DB, and the file itself on the file system. I am trying to make the metadata visible in the application before the file upload and post-processing are finished, but because saves are transactional, I have had no success. I have tried the callbacks and calling create_or_update() instead of save(), all to no avail. Is there a way to do this without re-writing the guts of ActiveRecord::Base? I've even attempted naming the method make() instead of save(), but perplexingly that had no effect.
The code below "works" fine, but the database is not modified until everything else is finished.
def save(upload)
  uploadFile = upload['datafile']
  originalName = uploadFile.original_filename
  self.fileType = File.extname(originalName)
  create_or_update()
  # write the file
  File.open(self.filePath, "wb") { |f| f.write(uploadFile.read) }
  begin
    musicFile = TagLib::File.new(self.filePath())
    self.id3Title = musicFile.title
    self.id3Artist = musicFile.artist
    self.id3Length = musicFile.length
  rescue TagLib::BadFile => exc
    logger.error("Failed to id track: \n #{exc}")
  end
  if (self.fileType == '.mp3')
    convertToOGG();
  end
  create_or_update()
end
Any ideas would be quite welcome, thanks.
Have you considered processing the file upload as a background task? Save the metadata as normal and then perform the upload and post-processing using Delayed Job or similar. This Railscast has the details.
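A rough sketch of that split, with hypothetical model and method names (the record is saved and the raw file written up front, and only the slow work is deferred to delayed_job):

class Track < ActiveRecord::Base
  def self.create_from_upload(upload)
    file = upload['datafile']
    track = create!(fileType: File.extname(file.original_filename))
    File.open(track.filePath, "wb") { |f| f.write(file.read) }
    track.delay.post_process   # delayed_job runs this outside the request
    track                      # the metadata row is already visible
  end

  def post_process
    # the TagLib reading and OGG conversion from the question go here
    save!
  end
end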
You're getting the meta-data from the file, right? So is the problem that the conversion to OGG is taking too long, and you want the data to appear before the conversion?
If so, John above has the right idea -- you're going to need to accept the file upload, and schedule a conversion to occur sometime in the future.
The main reason why is that your rails thread will process the OGG conversion and can't respond to any other web-requests until it's complete. Blast!
Some servers compensate for this by having multiple rails threads, but I recommend a background queue (use BJ if you host yourself, or Heroku's background jobs if you host there).
