Resque: how can I get a list of the queues? - ruby-on-rails

OK, on Heroku I have up to 24 workers (as I understand it).
I have, say, 1000 clients, each with their own "schema" in a PostgreSQL database.
Each client has tasks that can be done "later"; sending orders to my company's back end is a great example.
I was thinking I could create a new queue for each client, and each queue would have its own worker (process). That, it seems, isn't in the cards.
So my thinking now is to have a queue field in the client record:
clients 1 through 15 are in queue_a
and clients 16 through 106 are in queue_b, etc. If one client is using heaps, we could move them to a new queue, or move others out of the slow queue. Clients with low volumes could be collected together. It would be a balancing act, but it wouldn't be all that hard to manage if we kept track of metrics (which we will anyway).
(Any counter-ideas would be awesome to hear; I'm really in the spitball phase.)
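Something like this is roughly what I have in mind for the enqueueing side (the job class and the queue_name attribute are made-up names, just to sketch the idea):

class SendOrderJob
  def self.perform(order_id)
    # push the order to the company back end
  end
end

# enqueue onto whichever queue this client is currently assigned to
Resque::Job.create(client.queue_name, SendOrderJob, order.id)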
Right now, though, I'd like to figure out how to create a worker for each queue.
https://gist.github.com/486161 tells me how to create X workers, but doesn't really let me pin a worker to a queue. If I knew that, and how to get a list of queues, I think I'd be on my way to a viable solution to the limits.
Reading http://blog.winfieldpeterson.com/2012/02/17/resque-queue-priority/
I realize that my plan is fraught with hardship: the first client/queue added to the worker would get priority. I don't want that; I'd want them all to have the same priority, as long as they are part of the same queue.

I'll just stick to the topic :)
Getting all queues in Resque is pretty easy:
Resque.queues
This returns a list of all queue names. It does not include the 'failed' queue, so I did something like this:
(['failed'] + Resque.queues).each do |queue|
  queue_size = queue == 'failed' ? Resque::Failure.count : Resque.size(queue)
end
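As for binding a worker to a queue, I haven't tried this at scale, but something along these lines should work with the stock Resque::Worker API (forking one process per queue and the 5-second poll interval are my assumptions, not something from the question):

Resque.queues.each do |queue|
  Process.fork do
    worker = Resque::Worker.new(queue) # this worker only pulls from a single queue
    worker.work(5)                     # poll for jobs every 5 seconds
  end
end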

Related

Is this Ruby code using threads, thread pools, and concurrency correctly

I am at what I now consider part 3 of completing a task: pinging a very large list of URLs (numbering in the thousands) and retrieving the x509 certificate associated with each. Part 1 is here (How do I properly use threads to ping a URL) and Part 2 is here (Why won't my connection pool implement my thread code).
Since I asked these two questions, I have now ended up with the following code:
###### This is the code that pings a url and grabs its x509 cert #####
require 'socket'
require 'openssl'

class SslClient
  attr_reader :url, :port, :timeout

  def initialize(url, port = '443')
    @url = url
    @port = port
  end

  def ping_for_certificate_info
    context = OpenSSL::SSL::SSLContext.new
    tcp_client = TCPSocket.new(url, port)
    ssl_client = OpenSSL::SSL::SSLSocket.new(tcp_client, context)
    ssl_client.hostname = url
    ssl_client.sync_close = true
    ssl_client.connect
    certificate = ssl_client.peer_cert
    verify_result = ssl_client.verify_result
    tcp_client.close
    { certificate: certificate, verify_result: verify_result }
  rescue => error
    { certificate: nil, verify_result: nil }
  end
end
In the above code, it is paramount that I retrieve ssl_client.peer_cert. Below is the snippet that makes multiple HTTP pings to URLs for their certs:
pool = Concurrent::CachedThreadPool.new
pool.post do
  [LARGE LIST OF URLS TO PING].each do |struct|
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end
pool.shutdown
pool.wait_for_termination
#Do some rails code with the database depending on the results.
So far when I run this code, it is unbelievably slow. I thought that by creating a thread pool with threads, the code would go much faster. That doesn't seem to be the case, and I'm not sure why. A lot of it was because I didn't know the nuances of threads, pools, starvation, locks, etc. However, after implementing the above code, I read some more to try to speed it up, and once again I'm confused and could use some clarification as to how I can make the code faster.
For starters, in this excellent article (ruby-concurrency-parallelism), we get the following definitions and concepts:
Concurrency vs. Parallelism
These terms are used loosely, but they do have distinct meanings.
Concurrency: The art of doing many tasks, one at a time. By switching between them quickly, it may appear to the user as though they happen simultaneously.
Parallelism: Doing many tasks at literally the same time. Instead of appearing simultaneous, they are simultaneous.
Concurrency is most often used for applications that are IO heavy. For example, a web app may regularly interact with a database or make lots of network requests. By using concurrency, we can keep our application responsive, even while we wait for the database to respond to our query.
This is possible because the Ruby VM allows other threads to run while one is waiting during IO. Even if a program has to make dozens of requests, if we use concurrency, the requests will be made at virtually the same time.
Parallelism, on the other hand, is not currently supported by Ruby.
So from this piece of the article, I understand that what I want to do needs to be done concurrently, because I am pinging URLs on the network, and that parallelism is not currently supported by Ruby.
Next is where things get confusing for me. From my part 1 question on Stack Overflow, I learned the following from a comment given to me:
Use a thread pool; don't just create a thousand concurrent threads. For something like connecting to a URL where there will be a lot of waiting, you can oversubscribe the number of threads per CPU core, but not by a huge amount. You'll have to experiment.
Another user says this:
You'd not spawn thousands of threads; use a connection pool (e.g. https://github.com/mperham/connection_pool) so you have a maximum of 20-30 concurrent requests going (this maximum number should be determined by testing at which point network performance drops and you get these timeouts).
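To make that advice concrete for myself, this is roughly what usage of the connection_pool gem looks like as far as I can tell (the host name and the pool size of 20 are placeholders):

require 'connection_pool'
require 'net/http'

# at most 20 HTTP connections checked out at any moment
HTTP_POOL = ConnectionPool.new(size: 20, timeout: 5) do
  http = Net::HTTP.new('example.com', 443)
  http.use_ssl = true
  http
end

HTTP_POOL.with do |http|
  http.get('/') # blocks here if all 20 connections are already in use
end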
So for this part, I turned to concurrent-ruby and implemented both a CachedThreadPool and a FixedThreadPool with 10 threads. I chose a CachedThreadPool because it seemed to me that the number of threads needed would be taken care of for me by the thread pool. Now in concurrent-ruby's documentation for a pool, I see this:
pool = Concurrent::CachedThreadPool.new
pool.post do
  # some parallel work
end
I thought we just established in the first article that parallelism is not supported in Ruby, so what is the thread pool doing? Is it working concurrently or in parallel? What exactly is going on? Do I need a thread pool or not? Also, at this point I thought connection pools and thread pools were the same thing, with the terms used interchangeably. What is the difference between the two pools, and which one do I need?
In another excellent article, How to Perform Concurrent HTTP Requests in Ruby and Rails, the author introduces the Concurrent::Promises class from concurrent-ruby to avoid locks and have thread safety with two API calls. Here is a snippet of code with the following description:
def get_all_conversations
  groups_thread = Thread.new do
    get_groups_list
  end
  channels_thread = Thread.new do
    get_channels_list
  end
  [groups_thread, channels_thread].map(&:value).flatten
end
Every request is executed in its own thread, which can run in parallel because it is blocking I/O. But can you see a catch here?
In the above description there is another mention of parallelism, which we just said doesn't exist in Ruby. Below is the approach with Concurrent::Promise:
def get_all_conversations
  groups_promise = Concurrent::Promise.execute do
    get_groups_list
  end
  channels_promise = Concurrent::Promise.execute do
    get_channels_list
  end
  [groups_promise, channels_promise].map(&:value!).flatten
end
So according to this article, these requests are being made 'in parallel'. Are we still talking about concurrency at this point?
Finally, these two articles talk about using Futures for concurrent HTTP requests. I won't go into the details, but I'll paste the links here.
1. Using Concurrent Ruby in a Ruby on Rails Application
2. Learn Concurrency by Implementing Futures in Ruby
Again, what's talked about in the article looks to me like the Concurrent::Promise functionality. I just want to note that the examples show how to use the concepts for two different API calls that need to be combined together. This is not what I need. I just need to make thousands of API calls fast and log the results.
In conclusion, I just want to know what I need to do to make my code faster and thread safe to make it run concurrently. What exactly am I missing to make the code go faster because right now it is going so slow that I might as well not have used threads in the first place.
Summary
I have to ping thousands of URLs using threads to speed up the process. The code is slow and I am confused if I am using threads, thread pools, and concurrency correctly.
Let us look at the problems you have described and try to solve these one at a time:
You have two pieces of code: SslClient and the script which uses this SSL client. From my understanding of the thread pool, the way you have used it needs to be changed a bit.
From:
pool = Concurrent::CachedThreadPool.new
pool.post do
  [LARGE LIST OF URLS TO PING].each do |struct|
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end
pool.shutdown
pool.wait_for_termination
to:
pool = Concurrent::FixedThreadPool.new(10)
[LARGE LIST OF URLS TO PING].each do |struct|
  pool.post do
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
  end
end
pool.shutdown
pool.wait_for_termination
In the initial version, there is only one unit of work posted to the pool. In the second version, we post as many units of work to the pool as there are items in LARGE LIST OF URLS TO PING.
To add a bit more about concurrency vs. parallelism in Ruby: it is true that Ruby doesn't support true parallelism due to the GIL (Global Interpreter Lock), but this only applies when we are actually doing work on the CPU. In the case of a network request, the CPU-bound work is negligible compared to the IO-bound work, which means that your use case is a very good candidate for using threads.
Also, by using a thread pool we can minimize the overhead of thread creation. With Concurrent::FixedThreadPool.new(10) we are literally restricting the number of threads available in the pool; with an unbounded thread pool, a new thread is created every time a unit of work arrives while all the existing threads in the pool are busy.
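As a quick illustration of that difference (just a sketch; do_network_call stands in for the real work):

require 'concurrent'

fixed  = Concurrent::FixedThreadPool.new(10) # never grows beyond 10 worker threads
cached = Concurrent::CachedThreadPool.new    # adds threads whenever work arrives and all current threads are busy

100.times { fixed.post  { do_network_call } } # queued; at most 10 run at once
100.times { cached.post { do_network_call } } # may spin up many more threads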
In the first article, there was a need to collect the result returned by each individual worker and also to act meaningfully in case of an exception (I am the author). You should be able to use the class given in that blog post without any change.
Let's try rewriting your code using Concurrent::Future, since in your case, too, we need the results:
thread_pool = Concurrent::FixedThreadPool.new(20)
executors = [LARGE LIST OF URLS TO PING].map do |struct|
  Concurrent::Future.execute({ executor: thread_pool }) do
    ssl_client = SslClient.new(struct.domain.gsub("*.", "www."), struct.scan_port)
    cert_info = ssl_client.ping_for_certificate_info
    struct.x509_cert = cert_info[:certificate]
    struct.verify_result = cert_info[:verify_result]
    struct
  end
end
executors.map(&:value)
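One extra note (not something you asked, but worth knowing): Future#value returns nil when a future is rejected, so if you ever drop the rescue inside ping_for_certificate_info you can still see which URLs failed by inspecting the futures, something like:

results = executors.map do |future|
  warn "ping failed: #{future.reason}" if future.rejected? # reason holds the raised exception
  future.value
end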
I hope this helps. In case of questions, please ask in comments, I shall modify this write up to answer those.

Sidekiq handling re-queue when processing large data

See the updated question below.
Original question:
In my current Rails project, I need to parse a large XML/CSV data file and save it into MongoDB.
Right now I use these steps:
Receive the uploaded file from the user and store the data in MongoDB
Use Sidekiq to perform async processing of the data in MongoDB
After processing finishes, delete the raw data
For small and medium data on localhost, the steps above run well. But on Heroku, I use HireFire to dynamically scale the worker dyno up and down. While the worker is still processing the large data, HireFire sees an empty queue and scales down the worker dyno. This sends a kill signal to the process and leaves the processing in an incomplete state.
I'm searching for a better way to do the parsing: one that allows the parsing process to be killed at any time (saving the current state when receiving the kill signal) and allows the process to be re-queued.
Right now I'm using Model.delay.parse_file and it doesn't get re-queued.
UPDATE
After reading the Sidekiq wiki, I found an article about job control. Can anyone explain the code, how it works, and how it preserves its state when receiving a SIGTERM signal so the worker gets re-queued?
Is there any alternative way to handle job termination, save current state, and continue right from the last position?
Thanks,
It might be easier to explain the process and the high-level steps, give a sample implementation (a stripped-down version of one that I use), and then talk about throw and catch:
Insert the raw CSV rows with an incrementing index (to be able to resume from a specific row/index later)
Process the CSV, stopping every 'chunk' to check if the job is done by checking whether Sidekiq::Fetcher.done? returns true
When the fetcher is done?, store the index of the currently processed item on the user and return, so that the job completes and control is returned to Sidekiq
Note that if a job is still running after a short timeout (default 20s) the job will be killed
Then, when the job runs again, simply start where you left off last time (or at 0)
Example:
class UserCSVImportWorker
  include Sidekiq::Worker

  def perform(user_id)
    user = User.find(user_id)
    items = user.raw_csv_items.where(:index => { '$gte' => user.last_csv_index.to_i })

    items.each_with_index do |item, i|
      # note the parentheses: we want (i + 1) % 100, not i + (1 % 100)
      if ((i + 1) % 100) == 0 && Sidekiq::Fetcher.done?
        user.update(last_csv_index: item.index)
        return
      end
      # Process the item as normal
    end
  end
end
The above class makes sure that every 100 items we check whether the fetcher is done (a proxy for whether shutdown has been started) and, if so, end execution of the job. Before execution ends, however, we update the user with the last index that has been processed so that we can start where we left off next time.
throw/catch is a way to implement the above functionality a little more cleanly (maybe), but it is a little like using Fibers: a nice concept but hard to wrap your head around. Technically, throw/catch is more like goto than most people are generally comfortable with.
edit
Alternatively, you could skip the call to Sidekiq::Fetcher.done? and record the last_csv_index on each row, or on each chunk of rows processed. That way, if your worker is killed without having the opportunity to record the last_csv_index, you can still resume 'close' to where you left off.
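Here is roughly what that variant could look like, reusing the names from the worker above (process_item is just a placeholder for the real per-row work):

items.each_with_index do |item, i|
  process_item(item) # placeholder for the real per-row work
  user.update(last_csv_index: item.index) if ((i + 1) % 100).zero?
end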
You are trying to address the concept of idempotency, the idea that processing a thing multiple times with potential incomplete cycles does not cause problems. (https://github.com/mperham/sidekiq/wiki/Best-Practices#2-make-your-jobs-idempotent-and-transactional)
Possible steps forward
Split the file up into parts and process those parts with a job per part.
Raise the threshold for HireFire so that it will only scale down when jobs are likely to have fully completed (e.g. 10 minutes)
Don't allow HireFire to scale down while a job is working (set a Redis key on start and clear it on completion; see the sketch after this list)
Track progress of the job as it is processing and pick up where you left off if the job is killed.
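One possible shape for that Redis-key idea (my sketch only; the key name and the idea of consulting it from your scaling logic are assumptions, not something HireFire provides out of the box):

class LargeImportWorker
  include Sidekiq::Worker

  def perform(user_id)
    Sidekiq.redis { |conn| conn.incr('jobs:in_flight') } # mark a job as running
    # ... do the actual import work ...
  ensure
    Sidekiq.redis { |conn| conn.decr('jobs:in_flight') } # clear the flag, even on failure
  end
end

# The scale-down check would then only proceed when 'jobs:in_flight' is zero.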

Moving a Resque job between queues

Is there any way to move a Resque job between two different queues?
We sometimes get into the situation where we have a big queue and a job near the end that we need to "bump up in priority." We thought it might be easy to simply move it to another queue that has a worker waiting for any high-priority jobs.
This happens rarely and is usually a case where we get a special call from a customer, so scaling and re-engineering don't seem totally necessary.
There is nothing built in to Resque for this. You can use rpoplpush like this:
module Resque
  def self.move_queue(source, destination)
    r = Resque.redis
    r.llen("queue:#{source}").times do
      r.rpoplpush("queue:#{source}", "queue:#{destination}")
    end
  end
end
https://gist.github.com/rafaelbandeira3/7088498
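For example (queue names here are just placeholders):

Resque.move_queue('orders_slow', 'orders_fast') # drains every job from one queue into the other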
If it's a rare occurrence, you're probably better off just manually pushing a new job into a shorter queue. You'll want to make sure that your system has a way to identify that the job has already run and to bail out, so that when the job in the long queue is finally reached it is not processed again (if double processing is a problem for you).
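A rough sketch of that "bail out if already processed" guard (the job class, Redis key, and order_id argument are all made up for illustration):

class HighPriorityOrderJob
  @queue = :high_priority

  def self.perform(order_id)
    return if Resque.redis.get("order:#{order_id}:processed") # already handled by the copy in the other queue
    # ... do the real work here ...
    Resque.redis.set("order:#{order_id}:processed", 1)
  end
end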

How to avoid meeting Heroku's API rate limit with delayed job and workless

My Survey model has about 2500 instances and I need to apply the set_state method to each instance twice. I need to apply it the second time only after every instance has had the method applied to it once. (The state of an instance can depend on the state of other instances.)
I'm using delayed_job to create delayed jobs and workless to automatically scale up/down my worker dynos as required.
The set_state method typically takes about a second to execute. So I've run the following at the heroku console:
2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end
Shouldn't be any issues with overloading the API, right?
And yet I'm still seeing the following in my logs for each delayed job:
Heroku::API::Errors::ErrorWithResponse: Expected(200) <=> Actual(429 Unknown)
I'm not seeing any infinite loops -- it just returns this message as soon as I create the delayed job.
How can I avoid blowing Heroku's API rate limits?
Reviewing workless, it looks like it incurs an API call per delayed job to check the worker count, and potentially a second API call to scale up/down. So if you are running 5000 (2500 x 2) jobs within a short period, you'll end up with 5000+ API calls, which would be well in excess of the 1200-requests-per-hour limit. I've commented over there to hopefully help toward reducing the overall API usage (https://github.com/lostboy/workless/issues/33#issuecomment-20982433), but I think we can offer a more specific solution for you.
In the meantime, especially if your workload is pretty predictable (like this), I'd recommend skipping workless and doing that portion yourself. That is, it sounds like you already know WHEN the scaling needs to happen (scale up right before the loop above, scale down right after). If that is the case, you could do something like this to emulate the behavior of workless:
require 'heroku-api'

heroku = Heroku::API.new(:api_key => ENV['HEROKU_API_KEY'])
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', Survey.count)

2.times do
  Survey.all.each do |survey|
    survey.delay.set_state
    sleep(4)
  end
end

min_workers = ENV['WORKLESS_MIN_WORKERS'].present? ? ENV['WORKLESS_MIN_WORKERS'].to_i : 0
heroku.post_ps_scale(ENV['APP_NAME'], 'worker', min_workers)
Note that you'll need to remove workless from these jobs as well. I didn't see a particular way to do this JUST for certain jobs, so you might want to ask on that project if you need that. Also, if this needs to be two-pass (the first time through needs to finish before the second), the 4-second sleep may in some cases be insufficient, but that is a different can of worms (one rough idea is sketched below).
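If it helps, one rough way to make the second pass wait for the first (my sketch only, assuming the standard delayed_job ActiveRecord backend and that polling the Delayed::Job table is acceptable):

2.times do
  Survey.all.each { |survey| survey.delay.set_state }
  # wait for this pass's jobs to drain before starting the next one
  sleep(30) while Delayed::Job.where(failed_at: nil).exists?
end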
I hope that helps narrow in on what you needed, but I'm certainly happy to discuss further and/or elaborate on the above as needed. Thanks!

delayed_job Won't Process My Queue?

I am using the delayed_job gem, but against two queues. I have mapped my models to the correct queues (databases) to establish the correct connections.
The jobs get entered fine; however, delayed_job will process one queue but not the other. I am trying to manually force it to process the email queue, but it simply won't.
Is there a way to configure/force it to? Or pass it the correct backend to process?
See below: I am counting jobs and getting a correct count. However, if I try to work_off the queue, it shows 0 successes/failures.
I'm pretty sure that's because it's hitting the wrong queue. Any ideas?
Delayed::Worker::Email::Job.count
=> 12032
Delayed::Worker.new(:backend => Email::Job).work_off
=> [0, 0]
I ended up just going with one queue. This seemed to work best and saved the headache of juggling two. It would be cool if DJ eventually supported multiple backends/queues.
