I have a Rails app that sends multiple requests, both sequentially and in parallel, to a third-party API and does calculations in the backend.
I would like to know how long each of my API requests and calculations takes. Is there a performance testing gem I should use?
Note: my app uses Sidekiq to process backend jobs.
http://guides.rubyonrails.org/performance_testing.html might get you started. Check out section 3 for details on wrapping methods in "benchmark", which outputs some useful stats to the log.
As a quick example:
def process
  Benchmark.bm do |x|
    x.report("Processing Task") do
      process_task(task_options)
    end
  end
end
would output something like:
                      user     system      total        real
Processing Task   8.206000   1.092000   9.298000 ( 14.609000)
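If you want one number per request rather than a printed table, a rough sketch along these lines may help. It uses only the standard library; `fake_api_call` and the labels are hypothetical stand-ins for your real requests and calculation, and `sleep` merely simulates network latency.

```ruby
require "benchmark"

# Hypothetical stand-in for a third-party API request; sleep simulates latency.
def fake_api_call(seconds)
  sleep(seconds)
  :ok
end

timings = {}

# Sequential requests: each one is timed on its own (wall-clock seconds).
timings["request_a"] = Benchmark.realtime { fake_api_call(0.05) }
timings["request_b"] = Benchmark.realtime { fake_api_call(0.05) }

# Parallel requests: time each thread's work and the group as a whole.
timings["parallel_total"] = Benchmark.realtime do
  threads = %w[request_c request_d].map do |label|
    Thread.new { [label, Benchmark.realtime { fake_api_call(0.05) }] }
  end
  # Thread#value joins the thread and returns the block's result.
  threads.each do |thread|
    label, seconds = thread.value
    timings[label] = seconds
  end
end

# The calculation step is timed the same way.
timings["calculation"] = Benchmark.realtime { (1..10_000).sum { |n| n * n } }

timings.each { |label, seconds| puts format("%-15s %.3fs", label, seconds) }
```

Inside a Sidekiq job you would probably send these numbers to your logger or metrics system instead of `puts`.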
Related
I need to consume SQS events with my rails application. I've written a Sidekiq job which does a long polling like this:
class SqsConsumerWorker
  include Sidekiq::Worker

  def perform
    ...
    poller = Aws::SQS::QueuePoller.new(<queue_url>, client: <sqs_instance>)
    poller.poll(wait_time_seconds: 20, max_number_of_messages: 10, visibility_timeout: 180) do |messages|
      messages.each do |message|
        puts message.inspect
      end
    end
  end
end
The first problem was when to start this job. Currently I've moved the invocation to a Rails initializer, where I've overridden the Sidekiq config.on(:startup) block to call this job. This lets me start the job on every deployment. (I have also written some logic in this initializer to check that the number of workers does not exceed some limit, etc.)
I wanted to understand whether there is a better way to solve this problem. I've seen the gem shoryuken, which abstracts these things away, but I need more control over the consumer and thought of having my own implementation. I also need to understand how to scale the number of consumers up and down with this approach.
I have a newsletter that I send out to my customers (~10k emails) every morning, and sometimes this Sidekiq job takes so much CPU/memory that the website (Rails app) stops responding and suffers blackouts.
When I look at the Sidekiq dashboard, I see there is some problem with the newsletter (probably an invalid email address, with Sidekiq repeatedly trying to send it again?) and it's stuck.
How do I prevent this behavior and stop the Sidekiq task from being retried (which I believe is the cause of the blackouts)?
Here's my code:
rake task:
namespace :mailer do
  desc "Carrier blast - morning"
  task :newsletter_morning => [:environment] do
    NewslettertJob.perform_later
  end
end
job definition:
class NewslettertJob < ApplicationJob
  def perform
    ...
    NewsletterMailer.morning_blast(data).deliver_now
  end
end
and NewsletterMailer:
class NewsletterMailer < ApplicationMailer
  def morning_blast(data)
    ...
    customers.each do |customer|
      yield customer, nil; next if customer.email.blank?
      begin
        Retryable.retryable( tries: 1, sleep: 30, on: [Net::OpenTimeout, Net::SMTPAuthenticationError, Net::SMTPServerBusy]) do
          send_email(customer.email).deliver
        end
        send_email(customer.email).deliver
      rescue Net::SMTPSyntaxError => e
        error_msg = "Newsletter sending failed on #{Time.now} with: #{e.message}. e.inspect: #{e.inspect}"
        logger.warn error_msg
        yield customer, nil
        next
      end
    end
  end
end
What I want to achieve is that the newsletter is sent out every morning and, if Rails/Sidekiq runs into a problem, the job simply shuts itself down, so the newsletter does not affect the "life" of the main website (its server).
Thank you in advance for any advice. I have been stuck on this issue for a while now.
If your machine only has one core, Sidekiq and puma will fight for CPU. Lower Sidekiq's concurrency so it uses less CPU, or get a machine with multiple cores, or move Sidekiq to a different machine.
If a Sidekiq process is using 100% of a core, lower the concurrency setting. The default in Sidekiq 6.0 is 10, which is a good default but if you are just delivering emails you could probably bump that to 20. You can run multiple Sidekiq processes if you wish to utilize multiple cores to process jobs faster.
I think ideally, you should separate your background task servers from your web servers, so that background processing won't affect the performance of the web server. I work for a very high-traffic, high-load company, and we use an architecture of this sort.
There are explanations on how to stop retries in this answer: Disable automatic retry with ActiveJob, used with Sidekiq
Another thing: your e-mail sending is done synchronously (.deliver). This makes your task one huge monolithic process covering many customers, with a big impact on memory. Instead, you could use deliver_later, so each customer gets its own little worker. This will also help alleviate CPU and memory usage. You could even create a worker for sending e-mails per customer, and use your monolithic job merely to dispatch those.
class NewslettertJob < ApplicationJob
  def perform
    ...
    customers.each do |customer|
      NewsletterMailer.morning_blast(customer, data).deliver_later if customer.email.present?
    end
  end
end
However, I think the silver bullet is separating your Sidekiq server from your web server, with one server dedicated to background tasks. On your web server, you don't even start the Sidekiq processes.
I have an API that uses a service in which I use Ruby threads to reduce the API's response time. I have tried to share the context using the following example. It was working fine with Rails 4 and Ruby 2.2.1.
Now we have upgraded to Rails 5.2.3 and Ruby 2.6.5, after which the service has stopped working. I can call the service from the console and it works fine, but with an API call the service becomes unresponsive once it reaches CurrencyConverter.new. Any idea what the issue could be?
class ParallelTest
  def initialize
    puts "Initialized"
  end

  def perform
    # Our sample set of currencies
    currencies = ['ARS','AUD','CAD','CNY','DEM','EUR','GBP','HKD','ILS','INR','USD','XAG','XAU']
    # Create an array to keep track of threads
    threads = []
    currencies.each do |currency|
      # Keep track of the child processes as you spawn them
      threads << Thread.new do
        puts currency
        CurrencyConverter.new(currency).print
      end
    end
    # Join on the child processes to allow them to finish
    threads.each do |thread|
      thread.join
    end
    { success: true }
  end
end
class CurrencyConverter
  def initialize(params)
    @curr = params
  end

  def print
    puts @curr
  end
end
If I remove the CurrencyConverter.new(currency), then everything works fine. CurrencyConverter is a service object that I have.
Found the Issue
Thanks to @anothermh for this link
https://guides.rubyonrails.org/threading_and_code_execution.html#wrapping-application-code
https://guides.rubyonrails.org/threading_and_code_execution.html#load-interlock
As per the guide, when one thread is performing an autoload by evaluating the class definition from the appropriate file, it is important no other thread encounters a reference to the partially-defined constant.
Only one thread may load or unload at a time, and to do either, it must wait until no other threads are running application code. If a thread is waiting to perform a load, it doesn't prevent other threads from loading (in fact, they'll cooperate, and each perform their queued load in turn, before all resuming running together).
This can be resolved by permitting concurrent loads.
https://guides.rubyonrails.org/threading_and_code_execution.html#permit-concurrent-loads
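Rails.application.executor.wrap do
  threads = currencies.map do |currency|
    Thread.new do
      # Each thread wraps its own application code in the executor
      Rails.application.executor.wrap do
        CurrencyConverter.new(currency)
        puts currency
      end
    end
  end
  # Wait for all threads without holding the load interlock
  ActiveSupport::Dependencies.interlock.permit_concurrent_loads do
    threads.each(&:join)
  end
end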
Thank you everybody for your time, I appreciate it.
Don't re-invent the wheel and use Sidekiq instead. 😉
From the project's page:
Simple, efficient background processing for Ruby.
Sidekiq uses threads to handle many jobs at the same time in the same process. It does not require Rails but will integrate tightly with Rails to make background processing dead simple.
With 400+ contributors and 10k+ stars on GitHub, they have built a solid parallel job execution process that is production-ready and easy to set up.
Have a look at their Getting Started guide to see for yourself.
I want to run an infinite loop on a separate thread that starts as soon as the app initializes (in an initializer). Here's what it might look like:
# in config/initializers/item_loop.rb
Thread.new do
  loop do
    Item.find_each do |item|
      # Get price from third-party api and update record.
      item.update_price!
      # Need to wait a little between requests to avoid getting throttled.
      sleep 5
    end
  end
end
I tend to accomplish this by running batch updates in recurring background jobs. But this doesn't make sense since I don't really need parallelization, downtime, or queueing, I just want to update one item at a time in a single thread, forever.
Yet there are multiple things that concern me:
Leaked Connections: Should I open up a new connection_pool for the thread? Should I use a gem like safely to avoid crashing the thread?
Thread Safety: Should I be worried about race conditions? Should I make use of Mutex and synchronize? Does using ActiveRecord::Base.transaction impact thread safety?
Deadlock: Should I use Rails.application.executor.wrap?
Concurrent Ruby/Sleep Intervals: Should I use TimerTask from concurrent-ruby gem instead of sleep or something other than Thread.new?
Information on any of these subjects is appreciated.
Usually, to perform a job in a background (non-web-server) process, a background worker manager is used. Rails has a specific interface for such managers called ActiveJob. There are several implementations of a background worker manager: Sidekiq, DelayedJob, Resque, etc. Sidekiq is preferred. Returning to the actual problem: you can create a schedule to run UpdatePriceJob at a fixed interval using the sidekiq-scheduler gem. Another nice extension, for throttling Sidekiq workers, is sidekiq-throttler.
Some code snippets:
# app/workers/update_price_worker.rb
# Actual Worker class
class UpdatePriceWorker
  include Sidekiq::Worker
  sidekiq_options throttle: { threshold: 720, period: 1.hour }

  def perform(item_id)
    Item.find(item_id).update_price!
  end
end
# app/workers/update_price_master_worker.rb
# Master worker that loops over items
class UpdatePriceMasterWorker
  include Sidekiq::Worker

  def perform
    Item.find_each { |item| UpdatePriceWorker.perform_async item.id }
  end
end
# config/sidekiq.yml
:schedule:
  update_price:
    cron: '0 */4 * * *' # Runs once every 4 hours - depends on how many Items there are
    class: UpdatePriceMasterWorker
The idea of this setup: we run the master worker every 4 hours (this depends on how long it takes to update all items). The master worker creates one job per item to update that item's price. UpdatePriceWorker is throttled to a maximum of 720 requests per hour.
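As a back-of-the-envelope check on those numbers: 720 jobs per hour works out to one request every 5 seconds, the same pacing as the `sleep 5` in the question, and the cron interval has to be long enough to drain the whole table at that rate. The helper names below are illustrative only and not part of any gem.

```ruby
THRESHOLD_PER_HOUR = 720
SECONDS_PER_HOUR = 3600

# Average spacing between jobs that the throttle allows.
def seconds_between_jobs(threshold_per_hour)
  SECONDS_PER_HOUR / threshold_per_hour.to_f
end

# Hours needed to work through `item_count` items at the throttled rate.
def hours_to_drain(item_count, threshold_per_hour)
  item_count / threshold_per_hour.to_f
end

puts seconds_between_jobs(THRESHOLD_PER_HOUR)            # 5.0
puts hours_to_drain(2_880, THRESHOLD_PER_HOUR)           # 4.0
puts hours_to_drain(10_000, THRESHOLD_PER_HOUR).round(1) # 13.9
```

So a 4-hour schedule only keeps up while there are at most 2,880 items; with more than that, a new run would be enqueued before the previous one has finished.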
I use rails runner together with a process supervisor (the god gem or k8s) in a similar case of ours.
rails runner runs in a separate process, so we do not have to worry about connection leaks or thread safety.
The god gem or k8s handles concurrency and monitors job failures. Running one process with a specific sleep time keeps you within the third-party API's throttling limits (running N processes with N API keys could speed things up).
I think deadlock could happen in any concurrent setup.
I do not think this loop + sleep approach is a design flaw, because:
cron always starts jobs on schedule, so long-running jobs can end up running simultaneously, and we would need to add logic to avoid job overlap. A plain loop + sleep keeps maximum throughput without any job overlap.
ActiveJob is good for one-shot long-running tasks, but it is not a good fit for a daemon.
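A bare-bones sketch of the loop + sleep shape, written against plain Ruby so it can be tried outside Rails. `run_update_loop`, the `items` array, and the `update` lambda are hypothetical stand-ins for `Item.find_each` and `item.update_price!`; the `max_passes` argument exists only so the example terminates.

```ruby
# Processes items one at a time, pausing between requests, and keeps going
# when a single item fails so that one bad record cannot kill the daemon.
def run_update_loop(items, update:, pause: 5, max_passes: nil, log: $stderr)
  passes = 0
  loop do
    items.each do |item|
      begin
        update.call(item)
      rescue StandardError => e
        log.puts "update failed for #{item.inspect}: #{e.class}: #{e.message}"
      end
      sleep pause # stay under the third-party rate limit
    end
    passes += 1
    break if max_passes && passes >= max_passes
  end
  passes
end

updated = []
update = lambda do |item|
  raise ArgumentError, "no price available" if item == :broken
  updated << item
end

run_update_loop(%i[apple broken pear], update: update, pause: 0, max_passes: 2)
p updated # => [:apple, :pear, :apple, :pear]
```

Started through rails runner under a supervisor, the same loop would run without `max_passes`, and the supervisor would restart it if the process itself dies.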
I'm writing a rake task that would be called every minute (possibly every 30 seconds in the future) by Whenever, and it contacts a polling API endpoint (per user in our database). Obviously, this is not efficient run as a single thread, but is it possible to multithread? If not, is there a good event-based HTTP library that would be able to get the job done?
I'm writing a rake task that would be called every minute (possibly every 30 seconds in the future) by Whenever
Beware of Rails startup times. It might be better to use a background worker model such as Resque or Sidekiq. Resque has https://github.com/bvandenbos/resque-scheduler, which should be able to do what you need. I can't speak for Sidekiq, but I'm sure it has something similar available (Sidekiq is much newer than Resque).
Obviously, this is not efficient run as a single thread, but is it possible to multithread? If not, is there a good event-based HTTP library that would be able to get the job done?
I'd suggest you look at ActiveRecord's find_in_batches (the batch-yielding sibling of find_each) for tips on making your finder process more efficient. Once you have your batches, you can easily do something using threads, such as:
#
# find_in_batches yields 1000 records at a time by default;
# pass batch_size to optimize that for larger (or smaller)
# batches depending on your available RAM
#
Users.find_in_batches(batch_size: 50) do |batch_of_users|
  #
  # Each batch is an Array of users, always smaller than
  # or equal to the batch size chosen above
  #
  #
  # We collect a bunch of new threads, one for each user
  #
  batch_threads = batch_of_users.collect do |user|
    #
    # We pass the user to the thread, this is good
    # habit for shared variables, in this case
    # it doesn't make much difference
    #
    Thread.new(user) do |u|
      #
      # Do the API call here, using `u` (not `user`)
      # to access the user instance
      #
      # We shouldn't need to use an evented HTTP library:
      # Ruby threads will pass control when the IO happens,
      # and control will return to the thread sometime when
      # the scheduler decides, but 99% of the time
      # HTTP and network IO are the best thread-optimized
      # thing you can do in Ruby.
      #
    end
  end
  #
  # Joining threads means waiting for them to finish
  # before moving on to the next batch.
  #
  batch_threads.map(&:join)
end
This will start no more than batch_size of threads, waiting after each batch_size to finish.
It would be possible to do something like this, but then you will have an uncontrollable number of threads. There's an alternative you might benefit from here; it gets a lot more complicated, involving a ThreadPool and a shared list of work to do. I've posted it on GitHub so as not to spam Stack Overflow: https://gist.github.com/6767fbad1f0a66fa90ac
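The gist itself is not reproduced here. What follows is an independent sketch of the "thread pool plus shared list of work" idea, using only Ruby's built-in Queue. `process_with_pool`, the user ids, and the doubling block are hypothetical; the block stands in for the per-user API call.

```ruby
# Runs `work` for every job using a fixed number of worker threads.
# Returns the results in completion order (not input order).
# Note: jobs must not be nil or false, since nil marks the end of the queue.
def process_with_pool(jobs, pool_size: 4, &work)
  queue = Queue.new
  jobs.each { |job| queue << job }
  queue.close # once drained, pop returns nil instead of blocking

  results = Queue.new # Queue is thread-safe, so workers can push freely
  workers = Array.new(pool_size) do
    Thread.new do
      while (job = queue.pop)
        results << work.call(job)
      end
    end
  end
  workers.each(&:join)

  Array.new(results.size) { results.pop }
end

user_ids = (1..10).to_a
responses = process_with_pool(user_ids, pool_size: 3) do |id|
  sleep 0.01 # hypothetical stand-in for the per-user API call
  id * 2
end
p responses.sort # => [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
```

Unlike the batch approach, a slow request here only holds up one worker; the other threads keep pulling work from the queue.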
I would suggest using sidekiq which is great at multithreading. You can then enqueue separate jobs per user for polling the API. clockwork can be used to make the jobs you enqueue recurring.