Ruby on Rails on few servers - ruby-on-rails

I have a big application. One of the part of this is highload processing with user files. I decide to provide for this one dedicate server. There will be nginx for distribution content and some programs (non rails) for processing files.
I have two question:
What better to use on this server? (Rails or something else, maybe Sinatra)
If I'll use Rails how to deploy? I can't find any instruction. If I have one app and two servers how to deploy it and delegate task for each other?
ps I need to authorize user on both servers. In Rails I use Devise.

You can use Rails for this. If both servers will act as a web client to the end user then you'll need some sort of load balancer in front of the two servers. HAProxy does a great job on this.
As far as getting the two applications to communicate with each other, this will be less trivial than you may think. What you should do is use a locking mechanism on performing the tasks. Delayed_job by default will lock a job in the queue so that any other works will not try and work on the same job. You can use callbacks from ActiveJob to notify the user via web sockets whenever their job is completed.
Anything that will take time or calling an external API should usually be placed into a background processing queue so that you're not holding up the user.
If you cannot spin up more than the two servers, you should make one of them the master or at least have some clear roles of the two servers. For example, one server may be your background processing and memcache server while the other is storing your database and handles your web sockets.
There are a lot of different ways of configuring the services and anything including and beyond what I've mentioned is opinionated.
Having separate servers for handling tasks is my preference as it makes them easier to manage from a Sys Admin perspective. For example, if we find that our web sockets server is hammered, we can simply spin up a few more web socket servers and throw them into a load balancer pool. The end user would not be negatively impacted from your networking changes. Whereas, if you have your servers performing dual roles outside of your standard Rails installation, you may find yourself cloning and wasting resources. Each of my web servers usually also perform background tasks on low-intermediate priority queues while a dedicated server is left for handling mission critical jobs.

Related

What is worker , dyno and zero-downtime deploys in heroku

These three terms have given a lot of importance in understanding different app server in heroku tutorials but I can't understand the purpose and definition of these three terms.
Can anybody have info about that. Kindly share
Thanks
The Heroku reference guide has a lot of information on all of this, and lots more, but in answer to your question;
A dyno is effectively a small virtual server instance set up to run one app (it's behind an invisible load balancer, so you can have any number of them running side-by-side). You don't need to worry about the server admin side of things, as it just takes your source code from a Git push and runs it.
A worker is a type of dyno, usually designed to process tasks in the background (in contrast to a web dyno, which just serves web pages. For example, Rails has ActiveJob, which plugs into something like Resque or Sidekiq, completes tasks which would slow down the web interface if it has to complete them, like sending e-mails, or geocoding addresses.
Zero-downtime deploys is really marketing speak for "if you push your code, it will wait until the new version is up and running before swapping the web interface to use it". It means you don't need to do anything, and your web app won't go offline while it is switched over to.

RabbitMQ with EventMachine and Rails

we are currently planning a rails 3.2.2 application where we use RabbitMQ. We would like to run several kind of workers (and several instances of a worker) to process messages from different queues. The workers are written in ruby and are laying in the lib directory of the rails app.
Some of the workers needs the rails framework (active record, active model...) and some of them don't. The first worker should be called every minute to check if updates are available. The other workers should process the messages from their queues when messages (which are send by the first worker) are present and do some (time consuming) stuff with it.
So far, so good. My problem is, that I only have little experiences with messaging systems like RabbitMQ and no experiences with the rails interaction between them. So I'm wondering what the best practices are to get the two playing with each other. Here are my requirements again:
Rails 3.2.2 app
RabbitMQ
Several kind of workers
Several instances of one worker
Control the amount of workers out of rails
Workers are doing time consuming tasks, so they have to be async
Only a few workers needs the rails framework. The others are just ruby files with some dependencies like Net or File
I was looking for some solution and came up with two possibilities:
Using amqp with EventMachine in a new thread
Of course, I don't want my rails app to be blocked when a new worker is created. The worker should run in another thread and do its work asynchronously. And furthermore, it should not start a new instance of my rails application. It should only require the things the worker needs.
But in some articles they say that there are some issues with Passenger. And another fact that I don't like is, that we are using webbrick for development and we ought to include workarounds for that too. It would be possible to switch to another webserver like thin, but I don't have any experience with that either.
Using some kind of daemonizing
Maybe its possible to run workers as a daemon, but I don't know how much overhead this would come up with, or how I can control the amount of workers.
Hope someone can advise a good solution for that (and I hope I made myself clear ;)
It seems to me that AMQP is a big shot to kill your problem. Have you tried to use Resque? The backed Redis database has some neat features (like publish/subscribe and blocking list pop) which make it very interesting as a message queue, and Resque is very easy to use in any Rails app.
The workers are daemonized, and you decide which worker of your pool listens to which queue, so you can scale each type of job as needed.
Using EM reactor inside a request/response cycle is not recommended, because it may conflict with an existing event loop (for instance if your app is served by thin), in any case you have to configure it specifically for your web server, OTOS it may be interesting to have an evented queue consumer, if your jobs have blocking IO and are not processor-bound.
If you still want to do it with AMQP, see Starting the event loop and connecting in Web applications and configure for your web server accordingly. Or use bunny to push synchronously in the queue (and whichever job consumer you deam useflu, like workling for instance)
we are running slightly different -- but similar technology stack.
daemon kit is used for eventmachine side of the system... no rails, but shared models (mongomapper & mongodb). EM is pulling messages off the queues, and doing whatever logic is required (we have ruleby in the mix, but if-then-else works too).
mulesoft ESB is our outward-facing message receiver and sender that helps us deal with the HL7/MLLP world. But in v1 of the app, we used some java code in ActiveMQ to manage HL7 messages.
the rails app then just serves up stuff for the user to see -- again, using the shared models.

Blocking IO / Ruby on Rails

I'm contemplating writing a web application with Rails. Each request made by the user will depend on an external API being called. This external API can randomly be very slow (2-3 seconds), and so obviously this would impact an individual request.
During this time when the code is waiting for the external API to return, will further user requests be blocked?
Just for further clarification as there seems to be some confusion, this is the model I'm anticipating:
Alice makes request to my web app. To fulfill this, a call to API server A is made. API server A is slow and takes 3 seconds to complete.
During this wait time when the Rails app is calling API server A, Bob makes a request which has to make a request to API server B.
Is the Ruby (1.9.3) interpreter (or something in the Rails 3.x framework) going to block Bob's request, requiring him to wait until Alice's request is done?
If you only use one single-threaded, non-evented server (or don't use evented I/O with an evented server), yes. Among other solutions using Thin and EM-Synchrony will avoid this.
Elaborating, based on your update:
No, neither Ruby nor Rails is going to cause your app to block. You left out the part that will, though: the web server. You either need multiple processes, multiple threads, or an evented server coupled with doing your web service requests with an evented I/O library.
#alexd described using multiple processes. I, personally, favor an evented server because I don't need to know/guess ahead of time how many concurrent requests I might have (or use something that spins up processes based on load.) A single nginx process fronting a single thin process can server tons of parallel requests.
The answer to your question depends on the server your Rails application is running on. What are you using right now? Thin? Unicorn? Apache+Passenger?
I wholeheartedly recommend Unicorn for your situation -- it makes it very easy to run multiple server processes in parallel, and you can configure the number of parallel processes simply by changing a number in a configuration file. While one Unicorn worker is handling Alice's high-latency request, another Unicorn worker can be using your free CPU cycles to handle Bob's request.
Most likely, yes. There are ways around this, obviously, but none of them are easy.
The better question is, why do you need to hit the external API on every request? Why not implement a cache layer between your Rails app and the external API and use that for the majority of requests?
This way, with some custom logic for expiring the cache, you'll have a snappy Rails app and still be able to leverage the external API service.

Rails: Can I run backgrounds jobs in a different server?

Is it possible to host the application in one server and queue jobs in another server?
Possible examples:
Two different EC2 instances, one with the main server and the second with the queueing service.
Host the app in Heroku and use an EC2 instance with the queueing service
Is that possible?
Thanks
Yes, definitely. We have delayed_job set up that way where I work.
There are a couple of requirements for it to work:
The servers have to have synced clocks. This is usually not a problem as long as the server timezones are all set to the same.
The servers all have to access the same database.
To do it, you simply have the same application on both (or all, if more than two) servers, and start workers on whichever server you want to process jobs. Either server can still queue jobs, but only the one(s) with workers running will actually process them.
For example, we have one interface server, a db server and several worker servers. The interface server serves the application via Apache/Passenger, connecting the Rails application to the db server. The workers have the same application, though Apache isn't running and you can't access the application through http. They do, on the other hand, have delayed_jobs workers running. In a common scenario, the interface server queues up jobs in the db, and the worker servers process them.
One word of caution: If you're relying on physical files in your application (attachments, log files, downloaded XML or anything else), you'll most likely need a solution like S3 to keep those files. The reason for this is that the individual servers might not have the actual files. An example of this is if your user were to upload their profile picture on your web-facing server, the files would likely be stored on that server. If you then have another server to resize the profile pictures, the image wouldn't exist on the worker server.
Just to offer another viable option: you can use a new worker service like IronWorker that relies completely on an elastic farm of cloud servers inside EC2.
This way you can queue/schedule jobs to run and they will parallelize across tons of threads spanning multiple servers - all without worrying about the infrastructure.
Same deal with the database though - it needs to be accessible from the outside.
Full disclosure: I helped build IW.

Best way to run rails with long delays

I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options. Either run JRuby and use multithreading or else run several regular Ruby instances and hope that not many people try to use the service at a time. JRuby seems like the much better solution, but it still doesn't seem to be mainstream and have out of the box support at Heroku and EngineYard. The multiple instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
If you're worried about browser timeout, or you cannot control the deployment environment, you may want to process it in the background:
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not hogging up a web-server and can even be run on a separate physical box. This is also a much more scaleable approach. If you make the web call using AJAX then you can ping the server every second or two to see if your results are ready, that way your client is not held in limbo while the results are being calculated and the request does not time out.

Resources