I have an API that uses a service object, in which I use Ruby threads to reduce the API's response time. I've tried to share the context with the following example. It worked fine with Rails 4 and Ruby 2.2.1.
Now we have upgraded to Rails 5.2.3 and Ruby 2.6.5, and since then the service has stopped working. When I call the service from the console, it works fine. But when it is called through the API, the service becomes unresponsive as soon as it reaches CurrencyConverter.new. Any idea what the issue could be?
class ParallelTest
  def initialize
    puts "Initialized"
  end
  def perform
    # Our sample set of currencies
    currencies = ['ARS','AUD','CAD','CNY','DEM','EUR','GBP','HKD','ILS','INR','USD','XAG','XAU']
    # Create an array to keep track of threads
    threads = []
    currencies.each do |currency|
      # Keep track of the threads as you spawn them
      threads << Thread.new do
        puts currency
        CurrencyConverter.new(currency).print
      end
    end
    # Join on the threads to allow them to finish
    threads.each do |thread|
      thread.join
    end
    { success: true }
  end
end
class CurrencyConverter
  def initialize(params)
    @curr = params
  end
  def print
    puts @curr
  end
end
If I remove the CurrencyConverter.new(currency) call, everything works fine. CurrencyConverter is one of my service objects.
Found the Issue
Thanks to @anothermh for these links:
https://guides.rubyonrails.org/threading_and_code_execution.html#wrapping-application-code
https://guides.rubyonrails.org/threading_and_code_execution.html#load-interlock
As the guide explains, when one thread is performing an autoload by evaluating the class definition from the appropriate file, it is important that no other thread encounters a reference to the partially defined constant.
Only one thread may load or unload at a time, and to do either, it must wait until no other threads are running application code. If a thread is waiting to perform a load, it doesn't prevent other threads from loading (in fact, they'll cooperate, and each perform their queued load in turn, before all resuming running together).
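To make the deadlock concrete, here is a toy model in plain Ruby. `ToyInterlock` and `child_finished?` are hypothetical names, and the lock is a simplified stand-in for the idea the guide describes, not ActiveSupport's actual code. The parent thread plays the request: it holds a shared "running" slot (as a request thread does when Rails' autoloading interlock is active). The child needs the exclusive slot to autoload a constant such as CurrencyConverter. A plain join never returns, while giving up the shared slot during the join lets the load go ahead.

```ruby
require "monitor"

# Toy share/exclusive lock, loosely modelled on the idea behind Rails'
# load interlock. It is NOT ActiveSupport's implementation.
class ToyInterlock
  def initialize
    @monitor = Monitor.new
    @changed = @monitor.new_cond
    @running = 0 # threads currently "running application code"
  end

  # Shared mode: any number of threads may run application code together.
  def running
    @monitor.synchronize { @running += 1 }
    yield
  ensure
    @monitor.synchronize do
      @running -= 1
      @changed.broadcast
    end
  end

  # Exclusive mode: a load waits until no thread is running application code.
  def loading
    @monitor.synchronize do
      @changed.wait_until { @running.zero? }
      yield
    end
  end

  # Give up this thread's shared slot while it is only waiting (e.g. in join).
  def permit_concurrent_loads
    @monitor.synchronize do
      @running -= 1
      @changed.broadcast
    end
    yield
  ensure
    @monitor.synchronize { @running += 1 }
  end
end

# The parent plays the request thread; the child needs to "autoload".
def child_finished?(permit:)
  lock = ToyInterlock.new
  loaded = false
  lock.running do
    child = Thread.new { lock.loading { loaded = true } }
    if permit
      lock.permit_concurrent_loads { child.join }
    else
      child.join(0.5) # a plain join would hang forever; give up after 0.5 s
      child.kill.join # clean up the stuck child thread
    end
  end
  loaded
end

puts child_finished?(permit: false) # false: the child never gets the load slot
puts child_finished?(permit: true)  # true
```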
This can be resolved by permitting concurrent loads.
https://guides.rubyonrails.org/threading_and_code_execution.html#permit-concurrent-loads
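threads = []
Rails.application.executor.wrap do
  currencies.each do |currency|
    threads << Thread.new do
      # Each spawned thread runs application code inside its own executor
      Rails.application.executor.wrap do
        CurrencyConverter.new(currency)
        puts currency
      end
    end
  end
  # Join outside the loop, releasing the load lock while waiting so the
  # spawned threads can autoload CurrencyConverter
  ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
    threads.each(&:join)
  end
end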
Thank you everybody for your time; I appreciate it.
Don't reinvent the wheel; use Sidekiq instead. 😉
From the project's page:
Simple, efficient background processing for Ruby.
Sidekiq uses threads to handle many jobs at the same time in the same process. It does not require Rails but will integrate tightly with Rails to make background processing dead simple.
With 400+ contributors and 10k+ stars on GitHub, they have built a solid, production-ready parallel job execution system that is easy to set up.
Have a look at their Getting Started guide to see for yourself.
Related
I need to consume SQS events in my Rails application. I've written a Sidekiq job that does long polling, like this:
class SqsConsumerWorker
  include Sidekiq::Worker
  def perform
    ...
    poller = Aws::SQS::QueuePoller.new(<queue_url>, client: <sqs_instance>)
    poller.poll(wait_time_seconds: 20, max_number_of_messages: 10, visibility_timeout: 180) do |messages|
      messages.each do |message|
        puts message.inspect
      end
    end
  end
end
The first problem was when to start this job. Currently I've moved the invocation into a Rails initializer, where I've overridden Sidekiq's config.on(:startup) block to call this job. This starts the job on every deployment. (I have also written some logic in this initializer to check that the number of workers doesn't exceed some limit, etc.)
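For reference, a startup hook like the one described above might look roughly like this. It assumes the usual config/initializers/sidekiq.rb location and leaves out the worker-limit check, whose details aren't given:

```ruby
# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  # Runs once each time a Sidekiq server process boots (e.g. after a deploy)
  config.on(:startup) do
    SqsConsumerWorker.perform_async
  end
end
```

Since the startup hook fires in every Sidekiq server process, running several processes enqueues one consumer job per process, and each consumer keeps one Sidekiq thread busy for as long as it polls.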
I want to understand whether there is a better way to solve this problem. I've seen the shoryuken gem, which abstracts these things away, but I need more control over the consumer, so I thought of writing my own implementation. I also need to understand how to scale the number of consumers up and down with this approach.
I have a newsletter that I send to my customers (~10k emails) every morning, and sometimes this Sidekiq job takes so much CPU/memory that the website (a Rails app) stops responding and suffers outages.
When I look at the Sidekiq dashboard, I see there is some problem with the newsletter (probably an invalid email address, with Sidekiq repeatedly trying to send it again?) and it's stuck.
How do I prevent this behavior and stop Sidekiq from repeating the task (which I believe is what causes the outage)?
Here's my code:
rake task:
namespace :mailer do
  desc "Carrier blast - morning"
  task :newsletter_morning => [:environment] do
    NewslettertJob.perform_later
  end
end
job definition:
class NewslettertJob < ApplicationJob
  def perform
    ...
    NewsletterMailer.morning_blast(data).deliver_now
  end
end
and NewsletterMailer:
class NewsletterMailer < ApplicationMailer
  def morning_blast(data)
    ...
    customers.each do |customer|
      yield customer, nil; next if customer.email.blank?
      begin
        Retryable.retryable(tries: 1, sleep: 30, on: [Net::OpenTimeout, Net::SMTPAuthenticationError, Net::SMTPServerBusy]) do
          send_email(customer.email).deliver
        end
        send_email(customer.email).deliver
      rescue Net::SMTPSyntaxError => e
        error_msg = "Newsletter sending failed on #{Time.now} with: #{e.message}. e.inspect: #{e.inspect}"
        logger.warn error_msg
        yield customer, nil
        next
      end
    end
  end
end
What I want is for the newsletter to go out every morning, and if Rails/Sidekiq runs into a problem, for the job to simply shut itself down, so that the newsletter doesn't affect the main website (its server).
Thank you in advance for any advice. I've been stuck on this issue for a while now.
If your machine only has one core, Sidekiq and Puma will fight for CPU. Lower Sidekiq's concurrency so it uses less CPU, get a machine with multiple cores, or move Sidekiq to a different machine.
If a Sidekiq process is using 100% of a core, lower the concurrency setting. The default in Sidekiq 6.0 is 10, which is a good default but if you are just delivering emails you could probably bump that to 20. You can run multiple Sidekiq processes if you wish to utilize multiple cores to process jobs faster.
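As a rough illustration of what the concurrency setting controls, here is a plain-Ruby worker pool. `run_pool` is a hypothetical helper, not Sidekiq's code: a fixed number of threads pull jobs from a queue, so with `concurrency: 3` no more than three jobs ever run at the same time. That bound is why lowering the setting reduces how hard the job process competes with the web server for CPU.

```ruby
# Toy fixed-size worker pool: `concurrency` threads pull jobs from a queue,
# so at most that many jobs run at once.
def run_pool(jobs, concurrency:)
  queue = Queue.new
  jobs.each { |job| queue << job }
  concurrency.times { queue << :stop } # one stop marker per worker

  active = 0
  peak = 0
  results = []
  lock = Mutex.new

  workers = Array.new(concurrency) do
    Thread.new do
      while (job = queue.pop) != :stop
        lock.synchronize { active += 1; peak = [peak, active].max }
        value = job.call
        lock.synchronize { active -= 1; results << value }
      end
    end
  end
  workers.each(&:join)
  [results, peak]
end

jobs = Array.new(12) { |i| -> { sleep 0.05; i * i } }
results, peak = run_pool(jobs, concurrency: 3)
puts results.sort.inspect # prints [0, 1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121]
puts peak                 # at most 3
```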
Ideally, I think you should separate your background-task servers from your web servers; that way, background processing won't affect the web server's performance. I work for a very high-traffic, high-load company, and we use this kind of architecture here.
There are explanations on how to stop retries in this answer: Disable automatic retry with ActiveJob, used with Sidekiq
Another thing: your e-mail sending is done synchronously (.deliver). This turns your task into one huge monolithic process covering many customers, with a large impact on memory. Instead, you could use deliver_later, so each customer gets their own small worker. This will also help reduce CPU and memory usage. You could even create a worker that sends the e-mail for a single customer, and use your monolithic job merely to dispatch those.
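A minimal plain-Ruby sketch of that fan-out, assuming hypothetical `Customer`, `deliver_newsletter`, `dispatch`, and `drain` stand-ins for the real model, mailer, and jobs: the dispatcher only enqueues, and each customer's delivery is its own unit of work, so one bad address fails on its own instead of stalling or repeating the whole batch. The `Queue` and the `drain` loop stand in for Sidekiq.

```ruby
Customer = Struct.new(:email)

# Hypothetical per-customer job: delivers one email. A bad address makes
# only this job fail.
def deliver_newsletter(customer, sent)
  raise ArgumentError, "invalid address: #{customer.email}" unless customer.email.include?("@")
  sent << customer.email
end

# Hypothetical dispatcher: enqueues one small job per customer (roughly the
# role deliver_later plays through Active Job) and does no sending itself.
def dispatch(customers, queue)
  customers.each do |customer|
    queue << customer unless customer.email.to_s.strip.empty?
  end
end

# Stand-in for background workers draining the queue, one job at a time.
def drain(queue)
  sent = []
  failed = []
  until queue.empty?
    customer = queue.pop
    begin
      deliver_newsletter(customer, sent)
    rescue ArgumentError => e
      failed << e.message # only this customer's job fails; the rest still go out
    end
  end
  [sent, failed]
end

queue = Queue.new
dispatch([Customer.new("a@example.com"), Customer.new(""),
          Customer.new("broken"), Customer.new("c@example.com")], queue)
sent, failed = drain(queue)
puts sent.inspect   # ["a@example.com", "c@example.com"]
puts failed.inspect # ["invalid address: broken"]
```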
class NewslettertJob < ApplicationJob
  def perform
    ...
    customers.each do |customer|
      NewsletterMailer.morning_blast(customer, data).deliver_later if customer.email.present?
    end
  end
end
However, I think the silver bullet is separating your Sidekiq server from your web server, with one server dedicated to background tasks. On your web server, you don't even start the Sidekiq instances.
I have a Ruby on Rails web application deployed on Heroku.
This web app fetches job feeds as XML from a set of given URLs, processes them, and combines them into a single XML file. It worked pretty well for a while. However, as the number of URLs and job ads has grown, it no longer works at all: the process sometimes takes up to 45 seconds, since there are over 35K job vacancies, and Heroku times out after 30 seconds, so I get an H12 timeout error. This error led me to read about worker dynos and background processing.
I figured out that I should apply the approach below:
[Image: Scalable approach on Heroku]
Now I am using Redis and Sidekiq on my project. And I am able to create a background worker to do all the dirty work. But here is my question.
Instead of doing this call in the controller class:
def apply
  send_data Aggregator.new(providers: providers).call,
            type: 'text/xml; charset=UTF-8;',
            disposition: 'attachment; filename=indeed_apply_yes.xml'
end
I am making this perform_async call:
def apply
  ReportWorker.perform_async(Time.now)
  redirect_to health_path #and returns status 200 ok
end
I implemented the class below: ReportWorker calls the Aggregator service. data_xml is the value that I need to show somewhere, or have downloaded automatically, when it's ready.
class ReportWorker
  include Sidekiq::Worker
  sidekiq_options retry: false
  data_xml = nil
  def perform(start_date)
    url_one = 'https://www.examplea.com/abc/download-xml'
    url_two = 'https://www.exampleb.com/efg/download-xml'
    cursor = 'stop'
    providers = [url_one, url_two, cursor]
    puts "SIDEKIQ WORKER GENERATING THE XML-DATA AT #{start_date}"
    data_xml = Aggregator.new(providers: providers).call
    puts "SIDEKIQ WORKER GENERATED THE XML-DATA AT #{Time.now}"
  end
end
I know it's not recommended to make send_data/send_file accessible outside of controller classes. Any suggestions on how to do this?
Thanks in advance!!
Can you set up a database for your application? Then store a record for each completed job there. You could also save the entire file in the database, but I recommend cloud storage (like Amazon S3).
After that, you can show the current status of queued jobs on a page for the user, with a "Download" button once the job is done.
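A rough sketch of that flow in plain Ruby, assuming a hypothetical `ReportStore` class standing in for a database table, with a placeholder string instead of the real Aggregator output: the controller creates a record and returns immediately, the worker fills in the status and result, and the status page reads it back. In the real app the Sidekiq worker could receive the record id as its argument, and `send_data` would stay in a controller action that loads the finished record.

```ruby
require "securerandom"

# Hypothetical stand-in for a `reports` table: the worker writes progress
# and output, the controller only reads.
class ReportStore
  Report = Struct.new(:id, :status, :xml)

  def initialize
    @reports = {}
    @lock = Mutex.new
  end

  def create
    id = SecureRandom.uuid
    @lock.synchronize { @reports[id] = Report.new(id, "queued", nil) }
    id
  end

  def update(id, **attrs)
    @lock.synchronize do
      report = @reports.fetch(id)
      attrs.each { |key, value| report[key] = value }
    end
  end

  def find(id)
    @lock.synchronize { @reports.fetch(id).dup }
  end
end

store = ReportStore.new

# Controller action: create the record, enqueue the work, return the id at once.
report_id = store.create

# Worker: does the slow aggregation, then saves the result.
worker = Thread.new do
  store.update(report_id, status: "working")
  xml = "<jobs></jobs>" # placeholder for Aggregator.new(providers: providers).call
  store.update(report_id, status: "done", xml: xml)
end
worker.join

# Status page: offer the download only once the report is done.
report = store.find(report_id)
puts report.status # done
puts report.xml    # <jobs></jobs>
```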
How can I detect if a particular request is still active?
For example I have this request uuid:
# my_controller.rb
def my_action
  request.uuid # -> ABC1233
end
From another request, how can I know if the request with uuid ABC1233 is still working?
For the curious:
Following Beanstalk's directives, I am running cron jobs via URL requests.
I don't want to start the next iteration if the previous one is still running. I can't just rely on a start/end flag updated by the request, because the request sometimes dies before it finishes.
With normal cron tasks, I managed this properly using the PID of the process.
But I don't think I can use the PID anymore, because web server processes can be reused across different requests.
I don't think Rails (or, more precisely, Rack) supports this, since (to the best of my knowledge) each Rails request doesn't know about any other request. You could try to get access to all running threads (and even processes), but such an implementation (if it is even possible) seems ugly to me.
How about implementing it yourself?
class ApplicationController < ActionController::Base
  before_action :register_request
  after_action :unregister_request
  private
  # redis-rb: SET needs a key and a value
  def register_request
    $redis.set(request.uuid, Time.now.to_i)
  end
  # redis-rb: DEL removes the key
  def unregister_request
    $redis.del(request.uuid)
  end
end
You'll still need to figure out what to do with exceptions, since after filters are skipped when the action raises (perhaps move this whole thing into a middleware: in its before phase it writes the uuid to Redis, and in its after phase it removes the key). I'm sure there are plenty of other ways to achieve this, and obviously you can substitute Redis with your favorite persistence store.
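A sketch of the middleware idea in plain Ruby. `RequestRegistry` and `RequestTracker` are hypothetical names, the in-memory hash stands in for Redis (a deployment with several server processes would need a shared store), and reading the id from `env["action_dispatch.request_id"]` assumes the middleware runs after Rails' `ActionDispatch::RequestId`. Removing the id in `ensure` means it is cleared even when the app raises:

```ruby
# Hypothetical in-process registry; several Puma/Passenger processes would
# need Redis (or another shared store) instead.
class RequestRegistry
  def initialize
    @ids = {}
    @lock = Mutex.new
  end

  def add(id)
    @lock.synchronize { @ids[id] = Time.now }
  end

  def remove(id)
    @lock.synchronize { @ids.delete(id) }
  end

  def active?(id)
    @lock.synchronize { @ids.key?(id) }
  end
end

# Rack middleware: registers the request id before the app runs and removes
# it in `ensure`, so the id is cleared even when the app raises.
class RequestTracker
  def initialize(app, registry)
    @app = app
    @registry = registry
  end

  def call(env)
    id = env["action_dispatch.request_id"]
    @registry.add(id)
    @app.call(env)
  ensure
    @registry.remove(id)
  end
end

registry = RequestRegistry.new
seen_while_running = nil
app = ->(env) {
  seen_while_running = registry.active?(env["action_dispatch.request_id"])
  [200, {}, ["ok"]]
}
RequestTracker.new(app, registry).call("action_dispatch.request_id" => "ABC1233")
puts seen_while_running          # true
puts registry.active?("ABC1233") # false

begin
  RequestTracker.new(->(_env) { raise "boom" }, registry).call("action_dispatch.request_id" => "XYZ")
rescue RuntimeError
  # the app's error still propagates; the id has already been removed
end
puts registry.active?("XYZ")     # false
```

In a Rails app, something like `config.middleware.insert_after ActionDispatch::RequestId, RequestTracker, registry` would mount it. A process that is killed outright never reaches `ensure`, so with Redis you would still want an expiry on the key.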
In the end, I went back to my previous PID-based approach.
I implemented something like this:
# The Main Process
module MyProcess
  def self.run_forked
    pid = Process.fork do
      run
    end
    Process.wait(pid)
  end
  def self.run
    RedisClient.set Process.pid # store the PID
    # ... my long process code is here
  end
  def self.still_alive?(pid)
    Process.kill(0, Integer(pid)) # signal 0 only checks that the process exists
    true
  rescue Errno::ESRCH
    false
  end
end
# In one thread I can do
MyProcess.run_forked
# In another thread I can do
pid = RedisClient.get
MyProcess.still_alive?(pid) # -> true if the process is still running
I can call this code from a Rails request: even if the request's process is reused, the child process is not, so I can monitor the child's PID to see whether that Ruby process is still running.
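A self-contained version of the liveness check, using only Ruby's core `Process` API (Unix only, since it relies on fork). The child just sleeps in place of the long job, and its PID stays in a local variable instead of Redis. A child that has exited but not yet been reaped by `Process.wait` is a zombie, and signal 0 can still succeed for it, so the check is only reliable once the parent waits on the child, as `run_forked` does.

```ruby
# Signal 0 is never delivered; Process.kill only checks that the PID exists
# (and that we may signal it), raising Errno::ESRCH if it doesn't.
def still_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
end

# The child stands in for the long-running job; exit! skips any at_exit
# handlers inherited from the parent.
pid = Process.fork { sleep 0.5; exit!(0) }
alive_while_running = still_alive?(pid)

Process.wait(pid) # reap the child so it no longer lingers as a zombie
alive_after_reaping = still_alive?(pid)

puts alive_while_running # true
puts alive_after_reaping # false
```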
I have a Rails app in which I would like to use the delayed_job gem to send texts/emails in background processes at certain times of day. This is how the relevant parts of my app are set up right now:
class SomeClass
  after_create :send_reminder
  def when_to_run
    self.date_time - 1.hour
  end
  def send_reminder
    MessageHandler.new().send_message
  end
  handle_asynchronously :send_reminder, run_at: Proc.new { |i| i.when_to_run }, queue: "Messages"
end
The MessageHandler class is a separate class I've defined which actually houses the methods for sending texts (with Twilio) and emails (with Mailgun).
After starting delayed_job (bin/delayed_job start) and creating an instance of SomeClass, the delayed_job log reads as follows:
Job SomeClass#send_reminder_without_delay (id=686) RUNNING
I'm not sure why it is running send_reminder_without_delay, and it does this every time. I've tried using MessageHandler.new().delay.send_message in the send_reminder method instead of MessageHandler.new().send_message, but that hasn't gotten me anywhere either.
I've searched high and low for answers and keep coming up short - any help would be much appreciated!
I think your problem is in run_at: Proc.new { |i| i.when_to_run }.
I'm not entirely sure what the date_time method does, but have you checked whether it actually returns a time that is more than 1 hour in the future?
Otherwise run_at will receive a time in the past, and the method will run immediately.
If date_time returns something like the current time, so that when_to_run effectively becomes this:
def when_to_run
  Time.now - 1.hour
end
Then the job will run as soon as you start delayed_job.
Check whether your run_at time is in the future.
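To make the run_at point concrete, here is a plain-Ruby sketch. There is no Rails here, so `HOUR` replaces ActiveSupport's `1.hour`, and `due_now?` is a hypothetical helper modelling the check a delayed_job worker effectively makes: a job is picked up once its run_at is at or before the current time. An appointment 30 minutes away yields a run_at 30 minutes in the past, so the reminder fires right away.

```ruby
HOUR = 60 * 60 # seconds; stands in for ActiveSupport's 1.hour

# Mirrors the question's when_to_run: one hour before the appointment.
def when_to_run(date_time)
  date_time - HOUR
end

# A job is eligible to run once its run_at is at or before "now".
def due_now?(run_at, now)
  run_at <= now
end

now = Time.now
soon = now + 30 * 60   # appointment in 30 minutes
later = now + 3 * HOUR # appointment in 3 hours

puts due_now?(when_to_run(soon), now)  # true: run_at is already 30 minutes in the past
puts due_now?(when_to_run(later), now) # false: due in 2 hours
```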
We were using Rails 4.2 and had to set the queue adapter explicitly, using this line in our config:
config.active_job.queue_adapter = :delayed_job