RabbitMQ with EventMachine and Rails

We are currently planning a Rails 3.2.2 application that uses RabbitMQ. We would like to run several kinds of workers (and several instances of a worker) to process messages from different queues. The workers are written in Ruby and live in the lib directory of the Rails app.
Some of the workers need the Rails framework (Active Record, Active Model...) and some of them don't. The first worker should be called every minute to check whether updates are available. The other workers should process the messages from their queues when messages (which are sent by the first worker) are present, and do some (time-consuming) work with them.
So far, so good. My problem is that I only have little experience with messaging systems like RabbitMQ and none with how Rails interacts with them. So I'm wondering what the best practices are to get the two playing with each other. Here are my requirements again:
Rails 3.2.2 app
RabbitMQ
Several kinds of workers
Several instances of one worker
Control the number of workers from within Rails
Workers do time-consuming tasks, so they have to run asynchronously
Only a few workers need the Rails framework. The others are just Ruby files with some dependencies like Net or File
I was looking for some solution and came up with two possibilities:
Using amqp with EventMachine in a new thread
Of course, I don't want my Rails app to be blocked when a new worker is created. The worker should run in another thread and do its work asynchronously. Furthermore, it should not start a new instance of my Rails application; it should only require the things the worker needs.
But some articles say that there are issues with Passenger. Another thing I don't like is that we are using WEBrick for development, so we would have to include workarounds for that too. It would be possible to switch to another web server like Thin, but I don't have any experience with that either.
Using some kind of daemonizing
Maybe it's possible to run workers as daemons, but I don't know how much overhead this would introduce, or how I could control the number of workers.
Hope someone can advise a good solution for that (and I hope I made myself clear ;)

It seems to me that AMQP is overkill for your problem. Have you tried Resque? The backing Redis database has some neat features (like publish/subscribe and blocking list pop) that make it very interesting as a message queue, and Resque is very easy to use in any Rails app.
The workers are daemonized, and you decide which worker of your pool listens to which queue, so you can scale each type of job as needed.
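A Resque job, for example, is just a class with a queue name and a perform method. Here's a minimal sketch (the job name, queue and arguments are placeholders for your own):

require 'resque'

class UpdateProcessor
  @queue = :updates   # each worker pool listens on one or more named queues

  def self.perform(update_id)
    # the time-consuming work goes here; Rails models are available if the
    # worker is started with the Rails environment loaded
    puts "processing update #{update_id}"
  end
end

# enqueue from anywhere in the Rails app:
Resque.enqueue(UpdateProcessor, 42)

# start as many workers as you need, e.g.:
#   QUEUE=updates rake resque:work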
Using the EM reactor inside a request/response cycle is not recommended, because it may conflict with an existing event loop (for instance if your app is served by Thin); in any case you have to configure it specifically for your web server. On the other hand, it may be interesting to have an evented queue consumer if your jobs have blocking IO and are not processor-bound.
If you still want to do it with AMQP, see "Starting the event loop and connecting in Web applications" and configure it for your web server accordingly. Or use bunny to push synchronously into the queue (and whichever job consumer you deem useful, workling for instance).
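For illustration, pushing a message synchronously with bunny looks roughly like this (a sketch using the modern Bunny >= 0.9 API; queue name and payload are placeholders):

require 'bunny'

connection = Bunny.new          # defaults to amqp://guest:guest@localhost:5672
connection.start

channel = connection.create_channel
queue   = channel.queue('updates', durable: true)

# publishing to the default exchange routes directly to the queue by name
channel.default_exchange.publish('{"update_id":42}', routing_key: queue.name)

connection.close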

We are running a slightly different, but similar, technology stack.
daemon-kit is used for the EventMachine side of the system... no Rails, but shared models (MongoMapper & MongoDB). EM is pulling messages off the queues and doing whatever logic is required (we have Ruleby in the mix, but if-then-else works too).
Mulesoft ESB is our outward-facing message receiver and sender that helps us deal with the HL7/MLLP world. But in v1 of the app, we used some Java code in ActiveMQ to manage HL7 messages.
The Rails app then just serves up things for the user to see -- again, using the shared models.
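For reference, the EventMachine consumer side looks roughly like this (a bare sketch using the amqp gem, not the actual daemon-kit production code; queue name and handling logic are placeholders):

require 'amqp'

AMQP.start('amqp://localhost') do |connection|
  channel = AMQP::Channel.new(connection)
  queue   = channel.queue('hl7.inbound', durable: true)

  # EM pulls messages off the queue and runs whatever logic is required
  queue.subscribe do |metadata, payload|
    puts "received: #{payload}"
  end
end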

Related

Ruby on Rails on few servers

I have a big application. One part of it is high-load processing of user files. I decided to dedicate one server to this. There will be nginx for serving content and some programs (non-Rails) for processing the files.
I have two questions:
What is better to use on this server? (Rails or something else, maybe Sinatra)
If I use Rails, how do I deploy? I can't find any instructions. If I have one app and two servers, how do I deploy it and delegate tasks between them?
PS: I need to authenticate users on both servers. In Rails I use Devise.
You can use Rails for this. If both servers will serve web traffic to the end user, then you'll need some sort of load balancer in front of the two servers. HAProxy does a great job at this.
As far as getting the two applications to communicate with each other, this will be less trivial than you may think. What you should do is use a locking mechanism when performing the tasks. Delayed_job by default will lock a job in the queue so that other workers will not try to work on the same job. You can use callbacks from ActiveJob to notify the user via web sockets whenever their job is completed.
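For example (a rough sketch, assuming a Rails version that ships ActiveJob; Notifier, current_user and upload_path are stand-ins, not a real library or your code):

class FileProcessingJob < ActiveJob::Base
  queue_as :default

  # runs after perform finishes; use it to tell the user their job is done
  after_perform do |job|
    user_id, path = job.arguments
    Notifier.push(user_id, "Finished processing #{path}")   # hypothetical web socket layer
  end

  def perform(user_id, path)
    # the time-consuming work on the file goes here
  end
end

# enqueue from a controller:
FileProcessingJob.perform_later(current_user.id, upload_path)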
Anything that takes time or calls an external API should usually be placed into a background processing queue so that you're not holding up the user.
If you cannot spin up more than the two servers, you should make one of them the master or at least have some clear roles of the two servers. For example, one server may be your background processing and memcache server while the other is storing your database and handles your web sockets.
There are a lot of different ways of configuring the services and anything including and beyond what I've mentioned is opinionated.
Having separate servers for handling tasks is my preference as it makes them easier to manage from a Sys Admin perspective. For example, if we find that our web sockets server is hammered, we can simply spin up a few more web socket servers and throw them into a load balancer pool. The end user would not be negatively impacted from your networking changes. Whereas, if you have your servers performing dual roles outside of your standard Rails installation, you may find yourself cloning and wasting resources. Each of my web servers usually also perform background tasks on low-intermediate priority queues while a dedicated server is left for handling mission critical jobs.

RoR: multiple calls in a row to the same long-time-response controller

Update:
Read "Indicate to an ajax process that the delayed job has completed" before if you have the same problem. Thanks Gene.
I have a problem with concurrency. I have a controller scraping a few web sites, and each call to my controller needs about 4-5 seconds to respond.
So if I call it 2 (or more) times in a row, the second call has to wait for the first one to finish before starting.
So how can I fix this problem in my controller? Maybe with something like EventMachine?
Update & Example:
application_controller.rb
def func1
  i = 0
  while i <= 2
    puts "func1 at: #{Time.now}"
    sleep(2)
    i += 1
  end
end

def func2
  j = 0
  while j <= 2
    puts "func2 at: #{Time.now}"
    sleep(1)
    j += 1
  end
end
whatever_controller.rb
puts ">>>>>>>> Started At #{Time.now}"
func1()
func2()
puts "End at #{Time.now}"
So now I need to request http://myawesome.app/whatever several times at the same time from the same user/browser/etc.
I tried Heroku (and locally) with Unicorn but without success; this is my setup:
unicorn.rb http://pastebin.com/QL0wdGx0
Procfile http://pastebin.com/RrTtNWJZ
Heroku setup https://www.dropbox.com/s/wxwr5v4p61524tv/Screenshot%202014-02-20%2010.33.16.png
Requirements:
I need a RESTful solution. This is an API, so I need to respond with JSON.
More info:
I currently have 2 cloud servers running:
Heroku with Unicorn
Engine Yard Cloud with Nginx + Passenger
You're probably using WEBrick in development mode. WEBrick only handles one request at a time.
You have several options; many Ruby web servers exist that can handle concurrent requests.
Here are a few of them.
Thin
Thin was originally based on Mongrel and uses EventMachine to handle multiple concurrent connections.
Unicorn
Unicorn uses a master process that dispatches requests to web workers; 4 workers means 4 possible concurrent requests (see the config sketch after this list).
Puma
Puma is a relatively new Ruby server; its standout feature is that it handles concurrent requests in threads, so make sure your code is thread-safe!
Passenger
Passenger is a Ruby server that runs inside nginx or Apache; it's great for both production and development.
Others
These are a few alternatives; many others exist, but I think these are the most used today.
To use any of these servers, please check their instructions. They are generally available in their GitHub READMEs.
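As a concrete example, a minimal config/unicorn.rb might look like this (a generic sketch, not the poster's pastebin config; tune the numbers for your own dynos/servers):

# config/unicorn.rb
worker_processes 4      # 4 workers = up to 4 requests served concurrently
timeout 30              # kill workers that are stuck for longer than 30s
preload_app true        # load the app once in the master, then fork workers

before_fork do |server, worker|
  # disconnect shared connections before forking
  ActiveRecord::Base.connection.disconnect! if defined?(ActiveRecord::Base)
end

after_fork do |server, worker|
  # each worker needs its own database connection
  ActiveRecord::Base.establish_connection if defined?(ActiveRecord::Base)
end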
For any controller action with a long response time, the delayed_job gem is a fine way to go. While it is often used for bulk mailing, it works just as well for any long-running task.
Your controller starts the delayed job and responds immediately with a page that has a placeholder - usually a graphic with a progress indicator - and Ajax or a timed reload that updates the page with the full information when it's available. Some information on how to approach this is in this SO article.
Not mentioned in the article is that you can use redis or some other memory cache to store the results rather than the main database.
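A rough sketch of the pattern, using delayed_job's .delay API (ScrapesController, Scraper and the cache key layout are made up for illustration):

class ScrapesController < ApplicationController
  def create
    token = SecureRandom.hex(8)
    # enqueue the slow scraping work instead of doing it inline;
    # Scraper#run is expected to write its result into Rails.cache under the token
    Scraper.new.delay.run(params[:url], token)
    # respond immediately; the placeholder page polls the show action below via Ajax
    render json: { status: 'queued', token: token }
  end

  def show
    result = Rails.cache.read("scrape:#{params[:id]}")
    render json: (result || { status: 'pending' })
  end
end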
The answers above are part of the solution: you need a server environment that can properly dispatch concurrent requests to separate workers; Unicorn or Passenger can both work by creating workers in separate processes or threads. This allows many workers to sit around waiting without blocking other incoming requests.
If you are building a typical bot whose main job is to get content from other sources, these solutions may be ok. But if what you need is a simple controller that can accept hundreds of concurrent requests, all of which are sending independent requests to other servers, you will need to manage threads or processes yourself. Your goal is to have many workers waiting to do a simple job, and one or more masters whose jobs it is to send requests, then be there to receive the responses. Ruby's Thread class is simple, and works well for cases like this with ruby 2.x or 1.9.3.
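A bare sketch of that master/worker idea with plain Ruby threads (the URL list and fetch logic are placeholders):

require 'net/http'
require 'thread'

urls    = Queue.new
results = Queue.new

# a pool of workers sitting around waiting for simple jobs
workers = 8.times.map do
  Thread.new do
    while (url = urls.pop)              # pop blocks until something is queued
      results << [url, Net::HTTP.get(URI(url))]
    end
  end
end

%w[http://example.com/a http://example.com/b].each { |u| urls << u }
8.times { urls << nil }                 # tell each worker to stop
workers.each(&:join)

puts "fetched #{results.size} responses"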
You would need to provide more detail about what you need to do for help getting to any more specific solution.
Try something like Unicorn, as it handles concurrency via workers. Something else to consider, if there's a lot of work to be done per request, is to spin up a delayed_job per request.
The one issue with delayed_job is that the response won't be synchronous, meaning it won't be returned directly to the user's browser.
However, you could have the delayed job save its responses to a table in the DB. Then you can query that table for all requests and their related responses.
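A quick sketch of that approach (ScrapeResult is a hypothetical ActiveRecord model with request_id and body columns):

require 'net/http'

# a classic delayed_job custom job: any object with a #perform method
class ScrapeJob < Struct.new(:request_id, :url)
  def perform
    body = Net::HTTP.get(URI(url))
    ScrapeResult.create!(request_id: request_id, body: body)
  end
end

Delayed::Job.enqueue ScrapeJob.new(request_id, 'http://example.com')

# later, e.g. from another controller action:
ScrapeResult.where(request_id: request_id)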
What Ruby version are you using?
Ruby & Webserver
Ruby
If it's a simple application, I would recommend the following: try Rubinius (rbx) or JRuby, as they are better at concurrency. They have the drawback of not being mainline Ruby, so some extensions won't work. But if it's a simple app you should be fine.
Webserver
Use Puma, or Unicorn if you have the patience to set it up.
If your app is hitting an API service
You indicate that the global lock is killing you when you scrape other sites (presumably ones that allow scraping). If that's the case, something like Sidekiq or delayed_job should be used, but with caution: these should be idempotent jobs, i.e. they might run multiple times. If you start hitting a website repeatedly, you will hit its rate limit pretty quickly; e.g. Twitter limits you to 150 requests per hour. So use background jobs with caution.
If you're the one serving the data
However, reading your question it sounds like your controller is the API and the lock is caused by users hitting it.
If this is the case, you should use dalli + memcached to serve your data. This way you won't be I/O bound by the SQL lookup, as memcached is memory based. MEMORY SPEED > I/O SPEED
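For example, with the dalli gem a read-through cache is only a few lines (a minimal sketch; the key name and the expensive query are placeholders):

require 'dalli'

CACHE = Dalli::Client.new('localhost:11211')

def cached_payload
  # fetch returns the cached value, or runs the block, stores the result and returns it
  CACHE.fetch('api:payload', 60) do
    ExpensiveModel.order(:created_at).limit(100).to_json   # the slow SQL lookup
  end
end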

How is request processing with rails, redis, and node.js asynchronous?

For web development I'd like to mix Rails and node.js, since I want to get the best of both worlds (Rails for fast web development and Node for concurrency). I know that some people choose to just use the full Ruby stack with EventMachine integrated into the Rails controllers, so that every request can be non-blocking by using fibers in an event-loop model. I have been able to understand how that works in the big picture.
At the moment, however, I want to try doing non-blocking request processing with Rails and node.js using a message queue concept. I heard that this can be achieved by using Redis as an intermediary. I'm still having trouble figuring out how that works. From what I can understand: we have 2 apps, A (Rails) and B (node.js), plus Redis. The Rails app will handle requests from users that go through controllers in a REST manner, and from there Rails will pass them through Redis; Redis will form queues, and the node.js app will pick up those queues and do whatever is necessary afterward (write to or read from the backend DB).
My questions:
So how would that improve concurrency and scalability? From what I know, since Rails handles the requests through controllers synchronously and then writes to Redis, the requests will still be blocking, even though the node.js end can pick up the queue asynchronously. (I have a feeling that it's not asynchronous yet if it's not non-blocking end to end.)
Would node.js be considered a proxy or an application here if Redis is the intermediary?
I'm new to Redis and still learning it. If I'm using a 100% NoSQL solution for my backend database, such as MongoDB or CouchDB, can Redis replace them entirely, or is Redis seen more as a message queue tool like RabbitMQ?
Is a message queue a different concurrency concept from threading or the event-loop model, or is it supposed to supplement them?
Those are all my questions. I'm new to the message queue concept and will appreciate any help, pointers in the right direction, and articles that help me learn more. Thanks.
You are mixing some things here that don't go together.
Let's first make sure we are on the same page regarding the strengths/weaknesses of the involved technologies
Rails: Used for its web-development simplicity; perfect for serving database-backed web applications.
Not very performant when it has to serve a large number of long-running requests, as you'd run out of threads on your Ruby workers - but well suited for anything that can scale horizontally with more web nodes (multiple web servers - 1 db).
Node.js: Great for high-concurrency scenarios. Not as easy as Rails for writing a regular web application, but it can handle an almost insane number of long-running, low-CPU tasks efficiently.
Redis: A key-value store that supports operations on its data structures (increment/decrement values, append/prepend, push/pop on lists - all operations that make this DB work consistently with multiple clients writing at once).
Now as you can see, there is no benefit in having Rails AND Node serve the same request, communicating through Redis. Going through the Rails stack would not provide any benefit if the request ends up being handled by the Node server.
And even if you only offload some processing to the Node server, it's still the Rails web server that handles the request and has to wait for a response from Node - killing the desired scalability. It simply makes no sense.
Where you would want a setup with Node and Rails together is in certain areas of your app that have drastically different scaling requirements.
If you are, for example, writing a website that displays live stats for football games, you can easily see that there are two different concerns in your app: the "normal" site that contains signup, billing and profile stuff, which screams for a quick implementation through Rails; and the "live" portion of the site, where users see live results and you expect to handle a lot of clients at once - all waiting for something to happen (low CPU - high concurrency).
In such a case it may be beneficial to actually separate the two parts of the site into a Ruby and a Node app, sharing data about the user through a store like Redis (really you just need some shared state that both can look at and write to for synchronization purposes).
So you would use, for example, Rails for the signup/login portions - once the user is signed up, write the session cookie into Redis alongside the user's permissions (which games they are allowed to follow) and hand the user off to the Node.js app.
There the Node app can read the session information from Redis and serve the user.
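A hedged sketch of that handoff on the Rails side, using the redis gem (the key layout, session_token, current_user and allowed_game_ids are invented for illustration):

require 'redis'
require 'json'

redis = Redis.new

# after login, store what the Node app needs to know about this session
redis.set("session:#{session_token}",
          { user_id: current_user.id, games: allowed_game_ids }.to_json)
redis.expire("session:#{session_token}", 3600)    # expire after an hour

# the Node.js app reads the same key (e.g. client.get("session:" + token)),
# parses the JSON and serves the live feed to that user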
Word of advice:
You don't get scalability by simply throwing Node.js into your toolbox. You really have to find out what Node.js is good at (low-CPU, high-IO concurrent operations) and how you can leverage that to remedy some of the problems of your currently chosen technology.
I can answer question 3 for you. Redis does not guarantee that when you perform an operation the result will actually be on disk; also, transaction handling is a bit "different". It also requires the whole database to fit in memory. Depending on the situation this may or may not be an issue. It is, however, incredibly fast. It is not a message queue; you can easily build a queue out of it, but that is not its purpose. If all you want is a queuing system, you can probably do better with something else.

Rails best practice: background process/thread?

I'm coming from a PHP environment (at least in terms of web dev) and into the beautiful world of Ruby, so I may have some dumb questions. I imagine there are some fundamentally different options available when not using PHP.
In PHP, we use memcache to store alerts we want to display in a bar along the top of the page. When something happens that generates an alert (such as a new blog post being made), a cron script that runs once every 5 minutes or so puts that information into memcache.
Now when a user visits the site, we look in memcache to find any alerts that they haven't already dismissed and we display them.
What I'm guessing I can do differently in Rails is to bypass the need for a cron script, and also the need to look in memcache on every request, by using a singleton and a polling process running in a separate thread to copy from memcache into this singleton. This would, in theory, be more optimized than checking memcache once per request, and would also encapsulate the polling logic in one place, rather than splitting it between a cron task and the lookup logic.
My question is: are there any caveats to having some sort of runloop in the background while a Rails app is running? I understand the implications of multithreading, from Objective-C/Java, but I'm asking specifically about the Rails (3) environment.
Basically something like:
require 'singleton'

class SiteAlertsMap < Hash
  include Singleton

  def initialize
    super
    begin_polling
  end

  # ... SNIP, any specific methods etc ...

  private

  def begin_polling
    # Create some other Thread here, which polls at set intervals
  end
end
This leads me to a similar question. We push (encrypted) tasks onto an SQS queue for things related to e-commerce and for long-running background tasks. We don't use cron for this; rather, we have a worker daemon written in PHP that runs in the background. Right now when we deploy, we have to shut down this worker and start it again from the new code base. In Rails, could I somehow have this process start and stop with the Rails server (Unicorn) itself? I don't think it's something I'd want running in the main process as a separate thread, since we often want to control it as a process by itself, but it would be nice if it just conveniently ran when the web application was running.
Threading for background processes in Ruby would be a terrible mistake, especially since you're using a multi-process server. Using Unicorn with, say, 4 worker processes would mean that you'd be polling from each of them, which is not what you want. Ruby doesn't really have real threads: it has green threads in 1.8 and a global interpreter lock in 1.9, IIRC. Many gems and libraries are also obnoxiously thread-unsafe.
Using memcache is still your best option and, if you have it set up correctly, you should only see it adding a millisecond or two to the request time. Another option which would give you the benefit of persisting these alerts while incurring minimal additional overhead would be to store these alerts in redis. This would better protect you against things like memcache crashing or server reboots.
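A quick sketch of keeping the alerts in Redis instead (redis gem; the key names, alert shape and user_id are invented for illustration):

require 'redis'
require 'json'

redis = Redis.new

# the periodic job pushes new alerts onto a capped list
redis.lpush('site:alerts', { id: 123, text: 'New blog post' }.to_json)
redis.ltrim('site:alerts', 0, 49)                 # keep only the 50 most recent

# each request reads the list and drops alerts the user already dismissed
alerts = redis.lrange('site:alerts', 0, -1).map { |a| JSON.parse(a) }
alerts.reject! { |a| redis.sismember("user:#{user_id}:dismissed", a['id']) }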
For the background jobs you should use a similar approach to what you have now, but there are several off the shelf handlers for this like resque, delayed_job, and a few others. If you absolutely have to use SQS as the backend queue, you might be able to find some code to help you, but otherwise you could write it yourself. This still requires the other daemon to be rebooted whenever there is a code change. In practice this isn't a huge concern as best practices dictate using a deployment system like capistrano where a rule can easily be added to bounce the daemon on deploy. I use monit to watch the daemon process, so restarting it is as easy as telling monit to restart it.
In general, Ruby is not like Java/Objective-C when it comes to threads. It follows the more Unix-like model of process based isolation, but the community has come up with best practices and ways to make this less painful than in other languages. Ruby does require a bit more attention to setting up its stack as it is not as simple as enabling mod_php and copying some files around, but once the choices and architecture is understood, it is easier to reason about how your application works. The process model, in my opinion, is much better for web apps as it isolates code and state from the effects of other running operations. The isolation also makes the app easier to work with in a distributed system.

How reliable is DRb?

Are there any issues to consider when using DRb for implementing an in-memory message queue and for synchronizing actions between processes? I heard that it could be unreliable, but haven't found anything on the web that confirms this assertion.
In case it's relevant, these would be processes running with a rails app environment that will update models in the database.
DRb is pretty established and widely used. I don't know of anything that would make it unreliable, but I don't use it as a message queue.
I'd say you'll have more luck using a message queue as a message queue instead of rolling your own with DRb. There are a bunch of solutions depending on your needs; memcacheq is pretty easy to interact with, is in-memory, and is pretty solid.
I personally run DRb in two separate processes on my web server: one to perform minutes-long calculations, allowing the website to poll and check in on the progress, and another as a shared captcha server with its own DB connection for various applications on my server. In neither case have I ever had the DRb server fail (except where it was a programming mistake on my part).
Even when the DRb server does fail, you can restart it and the still-running client will reconnect cleanly the next time it needs to communicate.
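For reference, a minimal DRb setup looks roughly like this (a sketch only; the CalculationService and port are placeholders, and this is not a message queue):

# server process
require 'drb/drb'

class CalculationService
  def progress(job_id)
    # look up and return the progress of a minutes-long calculation
    0.42
  end
end

DRb.start_service('druby://localhost:8787', CalculationService.new)
DRb.thread.join

# client process (e.g. inside the Rails app):
#   require 'drb/drb'
#   service = DRbObject.new_with_uri('druby://localhost:8787')
#   service.progress(1)   # => 0.42; if the server was restarted, the client
#                         #    reconnects cleanly on the next call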
