I just read a little about Resque and how it uses Redis as an "advanced key-value store" for the jobs.
As you might know, you can use Resque on multiple machines to process the jobs:
Workers can be given multiple queues (a "queue list") and run on multiple machines. In fact they can be run anywhere with network access to the Redis server.
Now my question is: is Resque able to connect to any other key-value database, such as SimpleDB or CouchDB? And if so, does this even make sense?
No, it is not able to, as it mostly relies on Redis features that are well suited to handling queues, such as its atomic list push and pop commands (LPUSH/RPUSH, LPOP, and blocking variants like BRPOP). CouchDB's and SimpleDB's eventual consistency keeps them from being ideal candidates for queues. AMQP implementations such as RabbitMQ would be better suited, but they are not usable with Resque either.
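A minimal Python sketch of why Redis lists make convenient queues: producers append with one atomic command, and workers block until an item arrives instead of busy-polling. This is an in-memory stand-in written for illustration (the ListQueue class is hypothetical, not Resque's or any Redis client's API); with Redis itself, the same roles are played by RPUSH and BLPOP/BRPOP on a shared list key, and because the Redis server is reachable over the network, workers on many machines can share that one list.

```python
import threading
from collections import deque


class ListQueue:
    """Hypothetical in-memory stand-in for a Redis list used as a FIFO queue.

    push() plays the role of RPUSH and blocking_pop() the role of BLPOP.
    Both take the same lock, so each operation is atomic.
    """

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()

    def push(self, item):
        # Like RPUSH: append to the tail and wake one waiting consumer.
        with self._cond:
            self._items.append(item)
            self._cond.notify()

    def blocking_pop(self, timeout=None):
        # Like BLPOP: take from the head, waiting up to `timeout` seconds.
        # Returns None on timeout, much as BLPOP returns nil.
        with self._cond:
            if not self._cond.wait_for(lambda: self._items, timeout=timeout):
                return None
            return self._items.popleft()


q = ListQueue()
results = []


def worker():
    # Loop until a sentinel arrives or the queue stays empty for 1 second.
    while True:
        job = q.blocking_pop(timeout=1.0)
        if job is None or job == "STOP":
            break
        results.append(job.upper())


t = threading.Thread(target=worker)
t.start()
for job in ["a", "b", "c"]:
    q.push(job)
q.push("STOP")
t.join()
print(results)  # ['A', 'B', 'C']
```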
I'm trying to establish whether all the workers in my cluster need to be able to see each other, or just the scheduler process. When data needs to be transferred between workers, do they communicate directly, or send data via the scheduler?
Workers should ideally be able to communicate directly with each other, so they can copy data (results) more quickly as needed. You do not want the scheduler to become the single bottleneck for data communication; all messages and task assignments pass through the scheduler, but these tend to be much smaller.
EDIT: docs link: http://distributed.dask.org/en/stable/journey.html#step-5-execute-on-the-worker
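A toy Python model of this design, assuming hypothetical Scheduler and Worker classes (this is not the dask.distributed API): the scheduler stores only which worker holds each result, and a worker that needs a dependency asks the scheduler where it lives, then copies it directly from the peer. The large list below never passes through the scheduler, which only handles three small messages.

```python
class Scheduler:
    """Hypothetical scheduler: tracks metadata only (which worker holds which key)."""

    def __init__(self):
        self.who_has = {}  # key -> worker name
        self.messages = 0  # count of small control messages it handled

    def register(self, key, worker):
        self.messages += 1
        self.who_has[key] = worker.name

    def locate(self, key):
        self.messages += 1
        return self.who_has[key]


class Worker:
    """Hypothetical worker: keeps results locally and serves them to peers."""

    def __init__(self, name, scheduler, peers):
        self.name = name
        self.scheduler = scheduler
        self.peers = peers  # name -> Worker; assumes workers can reach each other
        self.data = {}
        self.items_received = 0

    def compute(self, key, func, *dep_keys):
        args = [self.get(dep) for dep in dep_keys]
        self.data[key] = func(*args)
        self.scheduler.register(key, self)

    def get(self, key):
        if key in self.data:
            return self.data[key]
        owner = self.peers[self.scheduler.locate(key)]
        value = owner.data[key]  # direct worker-to-worker transfer
        self.items_received += len(value)
        self.data[key] = value
        return value


scheduler = Scheduler()
peers = {}
a = Worker("a", scheduler, peers)
b = Worker("b", scheduler, peers)
peers.update({"a": a, "b": b})

a.compute("x", lambda: list(range(100_000)))  # large result lives on worker a
b.compute("total", sum, "x")                  # b fetches x straight from a
print(b.data["total"])    # 4999950000
print(scheduler.messages)  # 3 (two registrations, one lookup)
```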
Here are some questions I have about ActiveJob:
Say I've queued up n jobs on a Sidekiq job queue via ActiveJob. On my EC2 instance, I've set Puma to have 4 workers with 5 threads each. Does this mean up to 20 jobs will run concurrently? Will each thread pick up a queued job when it's idle and process it? I tried this setting, but it still seems to process jobs serially, one at a time. Are there more settings I need to change?
Regarding concurrency: how would I set up even more EC2 instances just to handle the job queue itself?
Regarding the queues themselves: is there a way to manage or inspect the queue from within Rails? Or should I rely on Sidekiq's web interface to look at the queue?
Sidekiq has a good wiki. As for your questions:
Sidekiq (and other background job implementations) works as
producer -> queue -> consumer
where the producer is your Rails app(s), the queue is Redis, and the consumer is the Sidekiq worker(s). All three are completely independent applications, which may run on different servers. So neither Puma nor the Rails application affects Sidekiq concurrency at all.
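To illustrate why Puma's workers and threads do not change how many jobs run at once, here is a small Python sketch of the producer -> queue -> consumer split. The function name, job count, and timings are hypothetical; the point is that the only knob affecting job parallelism lives in the consumer (for Sidekiq, the concurrency setting of each Sidekiq process), not in the web server.

```python
import queue
import threading
import time


def run_consumer(jobs, concurrency):
    """Hypothetical consumer process: `concurrency` threads pull from one queue.

    This mirrors the idea of a Sidekiq process's concurrency setting;
    the web server's thread count plays no part here.
    """
    done = []
    lock = threading.Lock()

    def work():
        while True:
            try:
                job = jobs.get_nowait()
            except queue.Empty:
                return
            time.sleep(0.05)  # stand-in for IO-heavy job work
            with lock:
                done.append(job)

    threads = [threading.Thread(target=work) for _ in range(concurrency)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


# Producer side: enqueueing is cheap and returns immediately.
jobs = queue.Queue()
for n in range(20):
    jobs.put(n)

start = time.monotonic()
done = run_consumer(jobs, concurrency=5)
elapsed = time.monotonic() - start
print(len(done), round(elapsed, 1))
```

With concurrency=1 the same 20 jobs would take roughly five times as long, no matter how Puma is configured.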
A full description of Sidekiq concurrency goes far beyond an SO answer. You can find long posts by searching for "scaling Sidekiq workers". In short: yes, you can run separate EC2 instance(s), set up Redis, and tune the number of Sidekiq worker processes, concurrency per process, Ruby runtime, queue concurrency and priority, and so on.
Edited: Sidekiq has a per-worker configuration (usually sidekiq.yml), but the number of worker processes is managed by system tools such as Upstart. Or you can buy Sidekiq Pro/Enterprise, which adds many features (like sidekiqswarm).
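The arithmetic behind scaling out can be written down directly. A minimal sketch, assuming each Sidekiq process runs up to its configured concurrency in threads and that the process count per machine is whatever your init system (or sidekiqswarm) starts; the function name and fleet numbers are hypothetical.

```python
def max_concurrent_jobs(instances, processes_per_instance, concurrency_per_process):
    """Upper bound on jobs running at once across all Sidekiq machines.

    instances: machines running Sidekiq (e.g. dedicated EC2 instances)
    processes_per_instance: Sidekiq processes started per machine
    concurrency_per_process: job threads per process (its concurrency setting)
    Puma's workers/threads do not appear in this formula.
    """
    return instances * processes_per_instance * concurrency_per_process


# Hypothetical fleet: 3 EC2 instances, 2 Sidekiq processes each, concurrency 10.
print(max_concurrent_jobs(3, 2, 10))  # 60
```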
From the wiki, for inspecting queues from within Rails: Sidekiq API
I have a platform (based on Rails 4/Postgres) running on an auto-scaling Elastic Beanstalk web environment. I'm planning on offloading long-running tasks (syncing with third parties, delivering email, etc.) to a worker tier, which appears simple enough to get up and running.
However, I also want to run periodic batch processes. I've looked into using cron.yml, and the scheduling seems pretty simple; however, the batch process I'm trying to build needs access to the web application's data to work.
Does anybody have an opinion on the best way to do this? Either a shared RDS database between the web and worker tiers, or perhaps a web service that the worker tier can access?
Thanks,
Dan
Note: I've added an extra question, which more broadly describes my requirements, as it struck me that this might not be the best approach.
What's the best way to implement this shared batch process with Elastic Beanstalk?
Unless you need a full relational database management system (RDBMS), consider using S3 for shared persistent data storage across your instances.
Also consider Amazon Simple Queue Service (SQS):
SQS is a fast, reliable, scalable, fully managed message queuing service. SQS makes it simple and cost-effective to decouple the components of a cloud application. You can use SQS to transmit any volume of data, at any level of throughput, without losing messages or requiring other services to be always available.
We are currently planning a Rails 3.2.2 application that uses RabbitMQ. We would like to run several kinds of workers (and several instances of each worker) to process messages from different queues. The workers are written in Ruby and live in the lib directory of the Rails app.
Some of the workers need the Rails framework (Active Record, Active Model...) and some don't. The first worker should be called every minute to check whether updates are available. The other workers should process the messages from their queues when messages (sent by the first worker) are present, and do some (time-consuming) work with them.
So far, so good. My problem is that I have only a little experience with messaging systems like RabbitMQ and no experience integrating them with Rails. So I'm wondering what the best practices are for getting the two to work together. Here are my requirements again:
Rails 3.2.2 app
RabbitMQ
Several kinds of workers
Several instances of each worker
Control the number of workers from Rails
Workers do time-consuming tasks, so they have to be async
Only a few workers need the Rails framework. The others are plain Ruby files with some dependencies like Net or File
I looked for a solution and came up with two possibilities:
Using amqp with EventMachine in a new thread
Of course, I don't want my Rails app to be blocked when a new worker is created. The worker should run in another thread and do its work asynchronously. Furthermore, it should not start a new instance of my Rails application; it should only require the things the worker needs.
But some articles say there are issues with Passenger. Another thing I don't like is that we use WEBrick for development, so we would need workarounds for that too. It would be possible to switch to another web server like Thin, but I don't have any experience with that either.
Using some kind of daemonizing
Maybe it's possible to run workers as daemons, but I don't know how much overhead this would add, or how I could control the number of workers.
I hope someone can advise a good solution (and I hope I made myself clear ;)
It seems to me that AMQP is a big gun for your problem. Have you tried Resque? The backing Redis database has some neat features (like publish/subscribe and blocking list pop) that make it very interesting as a message queue, and Resque is very easy to use in any Rails app.
The workers are daemonized, and you decide which workers in your pool listen to which queue, so you can scale each type of job as needed.
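Assigning workers to queues so that each job type scales independently can be sketched in Python, with threads standing in for daemonized worker processes. POOL and start_pool are hypothetical names chosen for illustration; with Resque you would instead start the desired number of worker processes for each queue.

```python
import queue
import threading

# Hypothetical layout: how many worker instances listen to each queue.
POOL = {"mail": 1, "sync": 3}


def start_pool(queues, pool, handled):
    """Start pool[name] worker threads per queue; each drains only its own queue."""
    lock = threading.Lock()

    def worker(name, q, n):
        while True:
            job = q.get()
            if job is None:  # sentinel: shut this worker down
                return
            with lock:
                handled.append((name, n, job))

    threads = []
    for name, count in pool.items():
        for n in range(count):
            t = threading.Thread(target=worker, args=(name, queues[name], n))
            t.start()
            threads.append(t)
    return threads


queues = {name: queue.Queue() for name in POOL}
handled = []
threads = start_pool(queues, POOL, handled)

for i in range(6):
    queues["sync"].put(f"sync-{i}")
queues["mail"].put("welcome-email")

# One sentinel per worker, queued after the jobs, so every thread exits
# only once its queue's jobs have all been taken.
for name, count in POOL.items():
    for _ in range(count):
        queues[name].put(None)
for t in threads:
    t.join()

print(sorted(job for _, _, job in handled))
```

Raising the "sync" count adds capacity for sync jobs only, without touching the mail workers.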
Using the EM reactor inside a request/response cycle is not recommended, because it may conflict with an existing event loop (for instance, if your app is served by Thin), and in any case you have to configure it specifically for your web server. On the other hand, an evented queue consumer may be interesting if your jobs do blocking IO and are not processor-bound.
If you still want to do it with AMQP, see "Starting the event loop and connecting in Web applications" and configure it for your web server accordingly. Or use Bunny to push synchronously to the queue (with whichever job consumer you deem useful, like Workling for instance).
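The evented-consumer trade-off can be sketched with Python's asyncio, used here only as an analogue of an EventMachine reactor (function names are hypothetical). When jobs spend most of their time waiting on IO, one event loop can keep many of them in flight, so ten 0.1-second waits finish in roughly 0.1 seconds instead of 1 second. CPU-bound jobs would gain nothing, because the loop runs only one piece of Python code at a time.

```python
import asyncio
import time


async def fake_io_job(job_id):
    # Stand-in for a network call performed without blocking the loop.
    await asyncio.sleep(0.1)
    return job_id * 2


async def evented_consumer(job_ids, max_in_flight):
    """Run IO-bound jobs concurrently on one event loop, capped at max_in_flight."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def run(job_id):
        async with semaphore:
            return await fake_io_job(job_id)

    # gather() returns results in the order the jobs were submitted.
    return await asyncio.gather(*(run(j) for j in job_ids))


start = time.monotonic()
results = asyncio.run(evented_consumer(range(10), max_in_flight=10))
elapsed = time.monotonic() - start
print(results, round(elapsed, 2))
```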
We are running a slightly different, but similar, technology stack.
daemon-kit is used for the EventMachine side of the system: no Rails, but shared models (MongoMapper & MongoDB). EM pulls messages off the queues and does whatever logic is required (we have Ruleby in the mix, but if-then-else works too).
MuleSoft ESB is our outward-facing message receiver and sender, which helps us deal with the HL7/MLLP world. In v1 of the app, we used some Java code in ActiveMQ to manage HL7 messages.
The Rails app then just serves up the data for the user to see, again using the shared models.
I would like to write a Node.js UDP server on Heroku and queue up the data to a Rails instance (dyno) for processing. What are the pros and cons of using Delayed Job vs RabbitMQ? Thanks, Chirag
These are very hard to compare! RabbitMQ is a messaging system, while delayed_job is a database-backed task queue.
With RabbitMQ you can create a task queue, but that is just one of many use cases.
One could say that delayed_job is a very limited implementation of a task queue, as a database is not well suited to this kind of work.
(see e.g. http://www.rabbitmq.com/resources/RabbitMQ_Oxford_Geek_Night.pdf)
The database may work well enough for simple setups, but it is likely to eventually fall apart.
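To make the comparison concrete, here is a minimal sketch of how a database-backed task queue in the style of delayed_job works, using Python's built-in sqlite3. The table and function names are hypothetical and much simpler than delayed_job's real schema: idle workers must repeatedly poll the table, and each claim is a conditional UPDATE so two workers never take the same job. A broker like RabbitMQ instead pushes messages to waiting consumers, with no polling queries or lock contention on the application database.

```python
import sqlite3


def make_db():
    # Hypothetical, simplified jobs table (delayed_job's real one has more columns).
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY, handler TEXT, "
        "locked_by TEXT, done INTEGER DEFAULT 0)"
    )
    return db


def enqueue(db, handler):
    db.execute("INSERT INTO jobs (handler) VALUES (?)", (handler,))
    db.commit()


def claim(db, worker):
    """Try to lock the oldest free job for `worker`; return (id, handler) or None.

    Every idle worker has to run this query in a polling loop, and every
    claim is a write that competes with other workers for the same rows.
    """
    row = db.execute(
        "SELECT id, handler FROM jobs WHERE locked_by IS NULL AND done = 0 "
        "ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Compare-and-set: succeeds only if no other worker locked the row first.
    cur = db.execute(
        "UPDATE jobs SET locked_by = ? WHERE id = ? AND locked_by IS NULL",
        (worker, row[0]),
    )
    db.commit()
    return row if cur.rowcount == 1 else None  # None: caller polls again


def finish(db, job_id):
    db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    db.commit()


db = make_db()
enqueue(db, "process_udp_packet:1")
enqueue(db, "process_udp_packet:2")
job = claim(db, "worker-a")
finish(db, job[0])
print(job, claim(db, "worker-b"), claim(db, "worker-c"))
# (1, 'process_udp_packet:1') (2, 'process_udp_packet:2') None
```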
If you want a task queue, I suggest you look for one that supports RabbitMQ.