How to see the stats about workers in Sidekiq?

We can get some information about Sidekiq's status via its API, but there is no API method for workers. I want to know how many workers are currently running jobs for a certain class. For example, I have a class named FooStreamer that is enqueued with the perform_async method, and I want to know how many workers are running it at the current time.
Any ideas?

Solved by Mike (the creator of Sidekiq) with the following commit:
https://github.com/mperham/sidekiq/commit/c606dd4fde8cdc795d2c750d211a74bf1b380217
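For anyone on a newer Sidekiq, that commit grew into the workers API exposed today as Sidekiq::Workers. A minimal sketch of counting the busy workers for one class (note that depending on your Sidekiq version, the work entry's "payload" may be a parsed hash or a raw JSON string, so treat this as an approximation to adapt):

require 'sidekiq/api'
require 'json'

# Count workers currently busy running FooStreamer jobs.
busy = Sidekiq::Workers.new.count do |_process_id, _thread_id, work|
  payload = work["payload"]
  payload = JSON.parse(payload) if payload.is_a?(String)
  payload["class"] == "FooStreamer"
end
puts busy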

Sidekiq comes with a Sinatra web interface which you can access through mydomain.com/sidekiq. You just need to mount it as per these instructions (the setup differs depending on whether you use Passenger or Unicorn):
https://github.com/mperham/sidekiq/wiki/Monitoring
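In a typical Rails app, mounting the web UI comes down to a couple of lines in config/routes.rb (a minimal sketch; MyApp is a placeholder for your application name, and you'll want authentication in front of it in production):

require 'sidekiq/web'

MyApp::Application.routes.draw do
  mount Sidekiq::Web => '/sidekiq'
end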
There's no API that I know of, but you can easily iterate through the Redis keys that store the Sidekiq information to count the number of workers working on a particular queue:
require 'redis'

redis = Redis.new # or reuse Sidekiq's connection: Sidekiq.redis { |conn| ... }

workers = redis.smembers("workers")
workers.each do |worker|
  tokens  = worker.split(":")
  machine = tokens[0]
  pid     = tokens[1].split("-")[0]
  key     = "worker:" + pid
  obj     = redis.get(key)
  # obj contains information on which queue this worker is processing
end

You can use https://github.com/phstc/sidekiq-statsd and check the stats using Graphite.

Related

Creating different types of workers that are accessed using a single client

EDIT:
My question was horribly put, so I deleted it and am rephrasing it entirely here.
The tl;dr:
I'm trying to assign each computation to a designated worker that fits the computation type.
In full:
I'm trying to run a simulation, so I represent it using a class of the form:
class Simulation:
    def __init__(self, first_client: Client, second_client: Client):
        self.first_client = first_client
        self.second_client = second_client

    def first_calculation(self, input):
        with self.first_client.as_current():
            # ... compute on the first cluster ...
            return output

    def second_calculation(self, input):
        with self.second_client.as_current():
            # ... compute on the second cluster ...
            return output

    def run(self, input):
        return self.second_calculation(self.first_calculation(input))
This format has downsides, like the fact that this simulation object is not pickleable.
I could edit the Simulation object to contain only addresses and not clients, for example, but I feel as if there must be a better solution. For instance, I would like the simulation object to work the following way:
class Simulation:
    def first_calculation(self, input):
        client = dask.distributed.get_client()
        with client.as_current():
            # ... compute ...
            return output
    ...
Thing is, the Dask workers best fit for the first calculation are different from the Dask workers best fit for the second calculation, which is why my Simulation object has two clients that connect to two different schedulers to begin with. Is there any way to have only one client but two types of schedulers, and to make the client run first_calculation on the first scheduler and second_calculation on the second one?
Dask will chop up large computations into smaller tasks that can run in parallel. Those tasks are submitted by the client to the scheduler, which in turn will schedule them on the available workers.
Sending the client object to a Dask scheduler will likely not work, due to the serialization issue you mention.
You could try one of two approaches:
Depending on how you actually run those worker machines, you could specify different types of workers for different tasks. If you run on Kubernetes, for example, you could try to leverage the node pool functionality to make different worker types available.
An easier approach using your existing infrastructure would be to return the results of your first computation back to the machine where the client runs, using something like .compute(), and then use that data as input for the second computation, as in the sketch below. In this case you're sending the actual data over the network instead of the client. If the size of that data becomes an issue, you can always write the intermediate results to something like S3.
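A rough sketch of that second approach, assuming first_calculation and second_calculation are plain functions and first_client / second_client are the two clients from the question:

# Run step one on the first cluster and pull the result back to this machine.
intermediate = first_client.submit(first_calculation, input_data).result()

# Feed the materialized data to the second cluster.
final = second_client.submit(second_calculation, intermediate).result()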
Dask does support pinning specific tasks to specific workers with annotate. Here's an example snippet, where the delayed_sum task was sent to one worker and the doubled task to the other. The assert statements check that those workers really were restricted to only those tasks. With annotate you shouldn't need separate clusters. You'll also need the most recent versions of Dask and Distributed for this to work, because of a recent bug fix.
import distributed
import dask
from dask import delayed

local_cluster = distributed.LocalCluster(n_workers=2)
client = distributed.Client(local_cluster)
workers = list(client.scheduler_info()['workers'].keys())

with dask.annotate(workers=workers[0]):
    delayed_sum = delayed(sum)([1, 2])

with dask.annotate(workers=workers[1]):
    doubled = delayed_sum * 2

# Use persist so the scheduler doesn't clean up the tasks, and wrap them in
# distributed.wait to make sure they're there when we check the scheduler.
distributed.wait([doubled.persist(), delayed_sum.persist()])

worker_restrictions = local_cluster.scheduler.worker_restrictions
assert worker_restrictions[delayed_sum.key] == {workers[0]}
assert worker_restrictions[doubled.key] == {workers[1]}

Sidekiq threads accessing global variable

I have a controller that spins off 6 Sidekiq workers for faster parallel processing of a large file. Before that, however, I want to provide these workers with a few variables that should be available across all of them, because the variables themselves are fairly memory intensive. (The workers only read from them, never write, so the usual concurrency issues don't exist.)
In other words, my controller looks like this:
def foo
  $bar1 = ...
  $bar2 = ...
  Worker.perform_async(...)
  Worker2.perform_async(...)
end
I don't want to pass those global vars into the perform methods, because serializing them to Redis chokes the entire thing. My issue is that the workers cannot see these variables and die with a NoMethodError (i.e. trying to call .first on one of them fails because the variable is nil inside the workers).
How come? Is there any other way to do this that won't kill my memory? (I.e. I don't want to take up most of the memory with 6x the same large array.)
Sidekiq runs in a separate process, so it doesn't share memory with the process that enqueued the worker.
If the data is static, you might want to load it at the start of the Sidekiq process (for example, where you configure the Sidekiq server), as sketched below.
If it changes per task, you should model it so that a global repository holds it (if Redis is not a good fit for this, maybe try memcached).
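A minimal sketch of the "load it once per Sidekiq process" idea; the YAML file path and $large_lookup_table are placeholders for whatever your data actually is:

# config/initializers/sidekiq.rb
Sidekiq.configure_server do |config|
  # Loaded once when the Sidekiq process boots; every worker thread in the
  # process reads the same object, so the big array exists once, not 6 times.
  $large_lookup_table = YAML.load_file(Rails.root.join("data", "lookup.yml"))
end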

How can I get notified when Sidekiq Enqueues jobs or has stopped processing jobs?

I am on Heroku, and I got an error because my Redis DB got too full. Then my Sidekiq processes stopped working. It was like that for a day until I realized it. Now I have 600+ jobs that I have tried to process, but they are just breaking everything. How can I sound the alarms when Sidekiq can't process jobs or when the queue starts to fill up?
You could set a rake task on a schedule to check the Sidekiq stats, and then take the appropriate action (like sending an email).
I've created my own module with helper methods for Sidekiq that serves a number of purposes, e.g. deleting jobs, checking queues, retrieving jobs by certain criteria, etc.: https://gist.github.com/blotto/10324119
For your purpose, grab the Sidekiq stats as such:
def sidekiq_stats
  stats = Sidekiq::Stats.new
  { processed: stats.processed,
    failed:    stats.failed,
    enqueued:  stats.enqueued,
    queues:    stats.queues }
end
Then evaluate the enqueued value, set a tolerance on what you think is too high, and let loose the hounds; a sketch of such a scheduled check is below.
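For example, a sketch of that scheduled rake task; the 500-job threshold and AlertMailer are assumptions, so swap in whatever alerting you already have:

# lib/tasks/sidekiq_monitor.rake
namespace :sidekiq do
  desc "Alert when the Sidekiq backlog gets too big"
  task monitor: :environment do
    stats = Sidekiq::Stats.new
    # 500 is an arbitrary threshold; tune it to your workload.
    if stats.enqueued > 500
      AlertMailer.queue_backlog(stats.enqueued).deliver_now
    end
  end
end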
If you're using Zabbix for monitoring, you could use the sidekiq_queue_zabbix template at https://github.com/hungntit/sidekiq_queue_zabbix. It supports showing graphs and sending an alert when the Sidekiq queue size exceeds a specified limit.

Rails: Batch Subscribe users to MailChimp API on Heroku running into timeout issues

Here is my code:
users = User.all

# Work around latency issues connecting from Heroku to MailChimp
Gibbon::API.timeout = 120
gb = Gibbon::API.new

batch = []
users.each do |user|
  batch << user.mail_chimp_information
end

puts gb.lists.batchSubscribe(id: "MC_ID_HERE", batch: batch, double_optin: false, update_existing: true)
The code above runs on a nightly cron that batch-subscribes (or updates existing) users to my MailChimp account. My app runs on Heroku, which causes issues with retrieving users and then looping through them before sending them to MailChimp. If I remove the Gibbon::API.timeout = 120 line, the default is 15 seconds and the request times out.
What is the best practice for batch uploading user information to an external API? Manually raising the timeout is a quick fix for now, but as my user base grows, so does the threat of timeouts occurring again.
Best practice is to use a background job to do the work for you.
The idea is that you set up each subscription batch as its own one-off job (instead of one long-running task) and have a background queue handle the subscribing; see the sketch after the gem list below.
The common gems are:
DelayedJob
Resque
Sidekiq
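For instance, a rough Sidekiq sketch; MailchimpBatchWorker, the 500-user slice size, and reusing the poster's mail_chimp_information method are all assumptions:

# app/workers/mailchimp_batch_worker.rb
class MailchimpBatchWorker
  include Sidekiq::Worker

  # Each job subscribes one slice of users, so no single request
  # runs long enough to hit the API timeout.
  def perform(user_ids)
    batch = User.where(id: user_ids).map(&:mail_chimp_information)
    gb = Gibbon::API.new
    gb.lists.batchSubscribe(id: "MC_ID_HERE", batch: batch,
                            double_optin: false, update_existing: true)
  end
end

# In the nightly cron, enqueue one job per 500 users:
User.select(:id).find_in_batches(batch_size: 500) do |group|
  MailchimpBatchWorker.perform_async(group.map(&:id))
end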

Resque and redis wait for variable to be set

Currently I am running on Rails 3.1 RC4 and using Redis and Resque for queueing the creation of Rackspace servers.
The Rackspace gem that I am using, cloudservers, tells you when your server is done being set up via its status method.
What I am trying to do with the code below is execute the code in the elsif only after the server is active and ready to be used.
class ServerGenerator
  @queue = :servers_queue

  def self.perform(current_id)
    current_user = User.find(current_id)
    cs = CloudServers::Connection.new(:username => "***blocked for security***", :api_key => "***blocked for security***")
    image  = cs.get_image(49) # Set the Linux distro
    flavor = cs.get_flavor(1) # Use the 256 MB of RAM instance
    newserver = cs.create_server(:name => "#{current_user.name}", :imageId => image.id, :flavorId => flavor.id)

    if newserver.status == "BUILD"
      newserver.refresh
    elsif newserver.status == "ACTIVE"
      # Do stuff here. I generated another server with a different, static name
      # so that I could see if it was working.
      cs = CloudServers::Connection.new(:username => "***blocked for security***", :api_key => "***blocked for security***")
      image  = cs.get_image(49)
      flavor = cs.get_flavor(1)
      newserver = cs.create_server(:name => "working", :imageId => image.id, :flavorId => flavor.id)
    end
  end
end
When I ran the above, it only generated the first server, the one that uses current_user.name as its name. Would a loop around the if statement help? Also, this seems like a poor way of queueing tasks.
Should I enqueue a new task that just checks whether the server is ready or not?
Thanks a bunch!
Based upon what you've written, I'm assuming that cs.create_server is non-blocking. In this case, yes, you would need to wrap your check in a loop do ... end or some similar construct. Otherwise you're checking the value precisely once and then exiting the perform method.
If you're going to loop in the method, you should add in sleep calls; otherwise you're going to burn a lot of CPU cycles doing nothing. Whether to loop or enqueue a separate job is ultimately up to you and depends on whether your workers are mostly idle. Put another way: if it takes 5 minutes for your server to come up and you just loop, that worker will not be able to process any other jobs for 5 minutes. If that's acceptable, it's certainly the easiest thing to do. If not, you'll probably want another job that accepts your server ID and makes an API call to see if it's available.
That process itself can be tricky, though. If your server never comes online for whatever reason, you could find yourself creating jobs waiting for its status ad infinitum. So you probably want to pass some sort of execution count around too, or keep track in Redis, so you stop trying after X number of tries. I'd also check out resque-scheduler so you can exert control over when the check job gets executed; a sketch of that separate-job approach follows.
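A rough sketch of that separate-job approach. ServerStatusChecker, MAX_CHECKS, and cs.get_server are assumptions (double-check the lookup method against the cloudservers gem you're using):

class ServerStatusChecker
  @queue = :servers_queue
  MAX_CHECKS = 60

  def self.perform(server_id, checks_so_far = 0)
    cs = CloudServers::Connection.new(:username => "***", :api_key => "***")
    server = cs.get_server(server_id) # assumed lookup call; verify the gem's API

    case server.status
    when "ACTIVE"
      # Server is ready: kick off the follow-up work here.
    when "BUILD"
      raise "Server #{server_id} never came up" if checks_so_far >= MAX_CHECKS
      # Re-enqueue rather than sleeping so the worker stays free; with
      # resque-scheduler you could use Resque.enqueue_in(30, ...) instead.
      Resque.enqueue(ServerStatusChecker, server_id, checks_so_far + 1)
    end
  end
end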
