Getting information about a particular Sidekiq worker - ruby-on-rails

I have spawned a few thousand workers in Sidekiq's low queue. But apart from these, there are a lot of other workers in the low and other queues as well. I have access to the Sidekiq admin dashboard and can view the queues and the workers running in them, but I need to scroll a lot to find information about the workers I'm interested in.
Is there a way to get the status of just the instances of a particular worker that I'm interested in?

The Busy page is merely a reflection of the data available in the Workers API, so you can roll your own report.
https://github.com/mperham/sidekiq/wiki/API#workers
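For example, here is a minimal sketch of such a report, assuming a Sidekiq version where each in-progress unit of work is exposed as a plain hash (the worker class HardWorker and the low queue are placeholders for your own):

require 'sidekiq/api'

# Print only the busy jobs belonging to one worker class.
Sidekiq::Workers.new.each do |process_id, thread_id, work|
  next unless work['queue'] == 'low' && work['payload']['class'] == 'HardWorker'
  puts "#{process_id} #{thread_id}: running since #{Time.at(work['run_at'])}"
end

Note that in some Sidekiq versions work['payload'] is a JSON string rather than a hash, in which case you would need JSON.parse(work['payload'])['class'] instead.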

Related

What threads do Dask Workers have active?

When running a Dask worker I notice that there are a few extra threads beyond what I was expecting. How many threads should I expect to see running from a Dask Worker and what are they doing?
Dask workers have the following threads:
A pool of threads in which to run tasks. This is typically somewhere between 1 and the number of logical cores on the computer
One administrative thread to manage the event loop, communication over (non-blocking) sockets, responses to fast queries, the allocation of tasks onto worker threads, etc.
A couple of threads that are used for optional compression and (de)serialization of messages during communication
One thread to monitor and profile the two items above
Additionally, by default there is a Nanny process that watches the worker. This process has a couple of its own threads for administration.
These are internal details as of October 2018 and may change without notice.
People who run into "too many threads" issues are often running tasks that are themselves multi-threaded, and so get an N-squared threading problem. The solution is often to set environment variables like OMP_NUM_THREADS=1, but this depends on the exact libraries you're using.

Do you have to use worker pools in Erlang?

I have a server I am creating (a messaging service) and I am doing some preliminary tests to benchmark it. So far, the fastest ways to process the data are to do it directly in the user's process and to use worker pools. I have tested spawning, and that is unbelievably slow.
The test just connects 10k users, has each one send 15 KB of data a couple of times at the same time (or trying to, at least), and has the server process the data (total length, headers, and payload).
The issue I have with worker pools is that they are only fast when you have enough workers to offset the number of connections. For example, if you have 500k or 1 million users, you would need more workers to process all the concurrent data coming in. And, in my testing, having 1000 workers made it unusable.
So my question is the following: when does it make sense to use pools of workers? Is there a tipping point where I would have to use workers to process the data to free up the user process? How many workers is too many; is 500,000 too many?
And, if workers are the way to go (for those massive concurrent distributed servers), I am guessing you can dynamically create/delete them as needed?
Any literature is also appreciated!
Thanks for your answer!
Maybe worker pools are not the best tool for your problem. If I were you, I would try using Jay Nelson's epocxy, which gives you a very basic backpressure mechanism while still letting you parallelize your tasks. From that library I would check out either the concurrency fount or the concurrency control tools.

Rails concurrency and ActiveJob

Here are some questions I have on ActiveJobs:
Say I've queued up n jobs on a Sidekiq queue via ActiveJob. On my EC2 instance, I've set Puma to 4 workers with 5 threads each. Does this mean up to 20 jobs will run concurrently? Will each thread pick up a queued job when it's idle and process it? I tried this setting, but it still seems to process jobs serially, one at a time. Are there more settings I need to change?
Regarding concurrency - how would I set up even more EC2 instances just to tackle the job queue itself?
Regarding the queues themselves - is there a way to manage / look at the queue from within Rails? Or should I rely on Sidekiq's web interface to look at the queue?
Sidekiq has a good wiki. As for your questions:
Sidekiq (and other background job implementations) works on the producer -> queue -> consumer pattern, where the producer is your Rails app(s), the queue is Redis, and the consumer is the Sidekiq worker(s). All three entities are completely independent applications which may run on different servers. So neither Puma nor the Rails application can affect Sidekiq concurrency at all.
A full description of Sidekiq concurrency goes far beyond an SO answer; you can google longer posts under "scaling Sidekiq workers". In short: yes, you can run separate EC2 instance(s), set up Redis, and tune the Sidekiq worker count, the concurrency per worker, the Ruby runtime, queue concurrency and priority, and so on.
Edited: Sidekiq has per-process configuration (usually sidekiq.yml), but the number of worker processes is managed by system tools like Upstart or systemd. Or you can buy Sidekiq Pro/Enterprise, which adds many features (like sidekiqswarm).
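For illustration, a minimal sidekiq.yml sketch; the queue names and numbers are only examples, and older Sidekiq versions expect the leading-colon keys shown here:

:concurrency: 10        # threads per Sidekiq process
:queues:
  - [critical, 2]       # weighted: checked twice as often as default
  - [default, 1]
  - [low, 1]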
From the wiki: Sidekiq API
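For your third question, that API also lets you inspect queues from within Rails, e.g. from a console. A small sketch (the low queue name is a placeholder):

require 'sidekiq/api'

queue = Sidekiq::Queue.new('low')
queue.size     # => number of enqueued jobs
queue.latency  # => seconds the oldest job has been waiting

# job.klass is the worker class name, job.args its arguments
queue.each { |job| puts "#{job.klass} #{job.args.inspect}" }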

Scaling Dyno worker size dynamically on Heroku Rails application

I am working on a project that launches a process via a Rails worker that is very resource intensive, and it can only be handled properly by a Performance worker on Heroku: 1X workers are killed because they use too much RAM, and 2X workers can barely handle the load, exceeding their RAM limit by up to 160%. A Performance worker does the job fine with no issues.
My question is: is there a way to dynamically switch the dyno size to Performance before a job initiates, and then scale it back down once the job is finished or the queue is empty?
I know HireFire exists, but to my knowledge that service only increases the number of workers based on queue length, etc. Another possible solution I thought about was using the Heroku API, which has a Dyno endpoint, to resize the worker dyno before the job starts and then resize it back down when the job ends.
Does anyone else have other recommendations, ideas or strategies for this issue?
Thanks!
The best way is the one you mentioned: use the Heroku Platform API to scale your Dyno size up before starting the job, and then down again afterwards.
This is because tools like HireFire only work by inspecting stuff like application response time, router queue, etc. -- so there's no way for them to know you're about to run some job and then scale up just for that.
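A minimal sketch of that flow with the platform-api gem; the app name, process type, job class, and dyno sizes below are all placeholders:

require 'platform-api'

heroku = PlatformAPI.connect_oauth(ENV['HEROKU_OAUTH_TOKEN'])

# Scale the worker process type up to a performance dyno before enqueuing...
heroku.formation.update('my-app', 'worker', 'size' => 'performance-m', 'quantity' => 1)
HeavyJob.perform_later  # hypothetical job class standing in for the real work

# ...then, once the job reports it is finished, scale back down:
heroku.formation.update('my-app', 'worker', 'size' => 'standard-2x', 'quantity' => 1)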
Depending on the specifics of the usage, you may be able to just create a distinct dyno type in your Procfile that only runs this particular worker and is always sized at Performance, but isn't always running. You could even run it as one-off runs instead of scaling it (this can also be done via the API, roughly equivalent to heroku run ...). That said, @rdegges' answer should certainly work.

Should I have a Heroku worker dyno to poll AWS SQS?

I'm confused about where I should have a script polling an AWS SQS queue inside a Rails application.
If I use a thread inside the web app, it will probably use CPU cycles to listen to this queue forever, affecting performance.
And if I reserve a single Heroku worker dyno, it costs $34.50 per month. Does it make sense to pay this price just for a single queue poll? Or is this not a case for a worker?
The script code:
What it does: listens for converted PDFs, gets the response, and creates the object in a Postgres database.
queue = AWS::SQS::Queue.new(SQSADDR['my_queue'])  # legacy aws-sdk v1 API
queue.poll do |msg|
  # ...
  received_message = JSON.parse(msg.body)  # assuming the message body is JSON
  id = received_message['document_id']
  document = Document.find(id)
  document.converted_at = Time.now
  # ...
end
I need help!! Thanks
You have three basic options:
Do background work as part of a worker dyno. This is the easiest, most straightforward option because it's the thing that's most appropriate. Your web processes handle incoming HTTP requests, and your worker process handles the SQS messages. Done.
Do background work as part of your web dyno. This might mean spinning up another thread (and dealing with the issues that can cause in Rails), or it might mean forking a subprocess to do background processing. Whatever happens, bear in mind the dyno's 512 MB RAM limit, and since I'm assuming you have only one web dyno, be aware that dyno idling means your app likely isn't running 24x7. This option also smells bad because it's generally against the spirit of the 12-factor app.
Do background work as one-off processes. Make e.g. a rake handle_sqs task that processes the queue and exits once it's empty (see the sketch below). Heroku Scheduler is ideal: have it run every 20 minutes or so. You'll pay for the one-off dyno for as long as it runs, but since that's only a few seconds when the queue is empty, it costs less than an always-on worker. Alternatively, your web app could use the Heroku API to launch a one-off process, programmatically running the equivalent of heroku run rake handle_sqs.
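A minimal sketch of that rake task, reusing the question's legacy aws-sdk v1 snippet; the :idle_timeout option makes poll return once the queue has been quiet for a while, so the one-off dyno exits instead of running forever:

task handle_sqs: :environment do
  queue = AWS::SQS::Queue.new(SQSADDR['my_queue'])

  # Stop polling after 10 idle seconds so the dyno shuts down.
  queue.poll(idle_timeout: 10) do |msg|
    received_message = JSON.parse(msg.body)
    document = Document.find(received_message['document_id'])
    document.converted_at = Time.now
    document.save!
  end
end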
