Communication from Rails with an endless worker

I'm implementing a worker for Rails + Sneakers that will run for a long time (days, months). It should hold a websocket connection to an external service.
The first concern: is it correct to run an endless process via Ruby + Sneakers? How can I monitor that there are no memory leaks?
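One rough way to keep an eye on memory from inside the worker is to log the process RSS periodically. This sketch is Linux-only (it reads /proc) and the 60-second interval is arbitrary, so treat it as an illustration rather than a recommendation; an external monitor (monit, a metrics agent) would do the same job more robustly.

    require "logger"

    logger = Logger.new($stdout)

    # Background thread that samples this process's resident set size once a
    # minute by reading /proc (Linux-specific). A steadily growing RSS over
    # days hints at a leak.
    Thread.new do
      loop do
        rss_kb = File.read("/proc/#{Process.pid}/status")[/VmRSS:\s+(\d+)/, 1].to_i
        logger.info("worker RSS: #{rss_kb} kB")
        sleep 60
      end
    end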
Second question: What is the best way to send new commands to that worker?
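One possible approach to the second question is to give the worker a dedicated command queue. The sketch below assumes a hypothetical worker.commands queue and a JSON payload, neither of which comes from the question itself; the long-lived worker subscribes with Sneakers and treats each message as a command.

    require "sneakers"
    require "json"

    Sneakers.configure(amqp: "amqp://guest:guest@localhost:5672", workers: 1)

    class CommandWorker
      include Sneakers::Worker
      from_queue "worker.commands"

      def work(raw)
        command = JSON.parse(raw)   # e.g. {"action" => "resubscribe", "channel" => "btcusd"}
        # ... hand the command to the long-lived websocket session here ...
        ack!                        # tell RabbitMQ the command was handled
      end
    end

    # Sending a new command then just means publishing a message to
    # worker.commands, e.g. from the Rails app with Bunny:
    #   channel.default_exchange.publish(payload, routing_key: "worker.commands")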

Related

Using ranch, how can I tell when a listener is terminated?

I'm writing a TCP server in Erlang, using Ranch. The clients reconnect as soon as the connection is dropped, which means that one particular failure mode is listeners being started and killed dozens of times a second.
I'd like to detect this happening and publish statistics to statsd, for monitoring in production.
So, can I use something in Ranch to monitor when a listener is recycled? Or can I use something in Erlang to monitor process mortality for the entire node, without having to link to each process, and where those processes were started by some other supervisor, so I don't have a reference to them?
It's not a direct answer to my question, but I opted instead to have a separate process periodically poll ranch_server:count_connections(my_ref) and publish that to statsd.

Using Puma and Sidekiq in a backend Rails app

I have a backend Rails server with Sidekiq, which serves as an API server. The app works as follows:
My Rails server receives many requests from incoming API clients at the same time.
For each of these requests, the Rails server allocates a job to a Sidekiq server. The Sidekiq server makes requests to external APIs (such as Facebook) to get data, analyzes it, and returns a result to the Rails server.
For example, if I receive 10 incoming requests from my API clients, for each request, I need to make 10 requests to external API servers, get data and process it.
My challenge is to make my app respond to incoming requests concurrently. That is, for each incoming request, my app should process in parallel: make calls to external APIs, get data and return a result.
Now, I know that Puma can add concurrency to a Rails app, while Sidekiq is multi-threaded.
My question is: Do I really need Sidekiq if I already have Puma? What would be the benefit of using both Puma and Sidekiq?
In particular, with Puma, I just invoke my external API calls, data processing etc. from my Rails app, and they will automatically be concurrent.
Yes, you probably do want to use Puma and Sidekiq. There are really two issues at play here.
Concurrency (as it seems you already know) is the number of web requests that can be handled simultaneously. Using an app server like Puma or Unicorn will definitely get you better concurrency than the default WEBrick server.
The other issue at play is the length of time that it takes your server to process a web request.
The reason that these two things are related is that the number of requests per second that your app can process is a function of both the average processing time for each request and the number of worker processes that are accepting requests. Say your average response time is 100ms. Then a single web worker can process 10 requests per second. If you have 5 workers, you can handle 50 requests per second. If your average response time is 500ms, then you can handle 2 reqs/sec with a single worker, and 10 reqs/sec with 5 workers.
Interacting with external APIs can be slow at times, and in the worst cases it can be very unreliable, with unresponsive servers on the remote end, or network outages or slowdowns. Sidekiq is a great way to insulate your application (and your end users) from the possibility that the remote API is responding slowly. Imagine that the remote API is running slowly for some reason and that its average response time has slowed down to 2 seconds per request. In that case you'd only be able to handle 2.5 reqs/sec with 5 workers. With any more traffic than that, your end users might start to see long wait times before any page on your app could respond, even pages that don't make remote API calls, because all of your web workers might be waiting on the slow remote API. As traffic continues to increase, your users would start getting connection timeouts.
The idea with Sidekiq is that you separate the time spent waiting on the external API from your web workers. You take the request for data from your user, pass it to Sidekiq, and immediately return a response to the user that says, in effect, "we're processing your request". Sidekiq can then pick up the job and make the external request. After it has the data, it can save that data back into your application. Then you can use websockets to push a notification to the user that the data is ready, or even push the data directly to them and update the page accordingly. (You could also use polling to have the page continually ask "is it ready yet?", but that gets very inefficient very quickly.)
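A minimal sketch of that enqueue-and-return pattern; the FetchRemoteDataJob, SomeRemoteApi and Result names are placeholders rather than anything from this answer.

    # app/workers/fetch_remote_data_job.rb
    class FetchRemoteDataJob
      include Sidekiq::Worker

      def perform(user_id, resource_id)
        data = SomeRemoteApi.fetch(resource_id)          # the slow external call happens here
        Result.create!(user_id: user_id, payload: data)  # persist so the app can show it later
        # ... then notify the user (e.g. push over a websocket) that the data is ready ...
      end
    end

    # app/controllers/requests_controller.rb
    class RequestsController < ApplicationController
      def create
        FetchRemoteDataJob.perform_async(current_user.id, params[:resource_id])
        render json: { status: "processing" }, status: :accepted  # respond immediately
      end
    end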
I hope this makes sense. Let me know if you have any questions.
Sidekiq, like Resque and Delayed Job, is designed to provide asynchronous job processing from a queue.
If you don't need jobs to be queued up and run asynchronously, there's no substantial benefit (or harm) to using Sidekiq.
If the tasks need to run synchronously (which it sounds like they might; it's not clear whether clients are waiting for data or just requesting that jobs run), Sidekiq and its relatives are likely the wrong tool for the job. There is no guaranteed processing time with Sidekiq or similar solutions; jobs are pushed onto the end of the queue, however long that may be, and won't be processed until their turn comes up. If clients are waiting for data, they may time out long before your worker pool ever gets to their jobs.

RabbitMQ with EventMachine and Rails

We are currently planning a Rails 3.2.2 application that uses RabbitMQ. We would like to run several kinds of workers (and several instances of a worker) to process messages from different queues. The workers are written in Ruby and live in the lib directory of the Rails app.
Some of the workers need the Rails framework (Active Record, Active Model...) and some don't. The first worker should be called every minute to check whether updates are available. The other workers should process the messages from their queues when messages (which are sent by the first worker) are present, and do some (time-consuming) work with them.
So far, so good. My problem is that I have only a little experience with messaging systems like RabbitMQ and none with how they interact with Rails, so I'm wondering what the best practices are to get the two playing with each other. Here are my requirements again:
Rails 3.2.2 app
RabbitMQ
Several kinds of workers
Several instances of one worker
Control the number of workers from Rails
Workers do time-consuming tasks, so they have to run asynchronously
Only a few workers need the Rails framework. The others are plain Ruby files with a few dependencies like Net or File
I was looking for some solution and came up with two possibilities:
Using amqp with EventMachine in a new thread
Of course, I don't want my Rails app to be blocked when a new worker is created. The worker should run in another thread and do its work asynchronously. Furthermore, it should not start a new instance of my Rails application; it should only require the things the worker needs.
But some articles say that there are issues with Passenger. Another thing I don't like is that we are using WEBrick in development, so we would have to include workarounds for that too. It would be possible to switch to another web server like Thin, but I don't have any experience with that either.
Using some kind of daemonizing
Maybe it's possible to run the workers as daemons, but I don't know how much overhead this would introduce, or how I could control the number of workers.
Hope someone can suggest a good solution for this (and I hope I made myself clear ;)
It seems to me that AMQP is overkill for your problem. Have you tried Resque? The backing Redis database has some neat features (like publish/subscribe and blocking list pop) which make it very interesting as a message queue, and Resque is very easy to use in any Rails app.
The workers are daemonized, and you decide which worker of your pool listens to which queue, so you can scale each type of job as needed.
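A small sketch of what that can look like with Resque; the UpdateCheckJob class, the check_updates queue name and the argument are illustrative only.

    # lib/jobs/update_check_job.rb
    require "resque"

    class UpdateCheckJob
      @queue = :check_updates       # the Resque queue this job class is pulled from

      def self.perform(source_id)
        # ... check the external source for updates and enqueue follow-up jobs ...
      end
    end

    # enqueue from the Rails app (or from a cron task that runs every minute):
    Resque.enqueue(UpdateCheckJob, 42)

    # start one or more dedicated workers for that queue to scale it independently:
    #   QUEUE=check_updates bundle exec rake resque:work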
Using the EM reactor inside a request/response cycle is not recommended, because it may conflict with an existing event loop (for instance if your app is served by Thin); in any case, you have to configure it specifically for your web server. On the other hand, it may be interesting to have an evented queue consumer if your jobs have blocking IO and are not processor-bound.
If you still want to do it with AMQP, see "Starting the event loop and connecting in Web applications" and configure it for your web server accordingly. Or use Bunny to publish synchronously to the queue (with whichever job consumer you deem useful, Workling for instance).
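For the Bunny route, a minimal synchronous publish looks roughly like this (the broker URL, queue name and payload are assumed):

    require "bunny"

    # open a connection and channel to the local broker
    connection = Bunny.new("amqp://guest:guest@localhost:5672")
    connection.start
    channel = connection.create_channel

    # declare the queue and publish one message to it through the default exchange
    queue = channel.queue("updates", durable: true)
    channel.default_exchange.publish('{"source_id":42}', routing_key: queue.name)

    connection.close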
We are running a slightly different -- but similar -- technology stack.
daemon-kit is used for the EventMachine side of the system... no Rails, but shared models (MongoMapper & MongoDB). EM pulls messages off the queues and does whatever logic is required (we have Ruleby in the mix, but if-then-else works too).
Mulesoft ESB is our outward-facing message receiver and sender that helps us deal with the HL7/MLLP world. But in v1 of the app, we used some Java code in ActiveMQ to manage HL7 messages.
The Rails app then just serves up content for the user to see -- again, using the shared models.

Blocking IO / Ruby on Rails

I'm contemplating writing a web application with Rails. Each request made by the user will depend on an external API being called. This external API can randomly be very slow (2-3 seconds), and so obviously this would impact an individual request.
During this time when the code is waiting for the external API to return, will further user requests be blocked?
Just for further clarification as there seems to be some confusion, this is the model I'm anticipating:
Alice makes a request to my web app. To fulfill this, a call to API server A is made. API server A is slow and takes 3 seconds to respond.
During this wait, while the Rails app is calling API server A, Bob makes a request which requires a call to API server B.
Is the Ruby (1.9.3) interpreter (or something in the Rails 3.x framework) going to block Bob's request, requiring him to wait until Alice's request is done?
If you use only one single-threaded, non-evented server (or don't use evented I/O with an evented server), yes. Among other solutions, using Thin and EM-Synchrony will avoid this.
Elaborating, based on your update:
No, neither Ruby nor Rails is going to cause your app to block. You left out the part that will, though: the web server. You either need multiple processes, multiple threads, or an evented server coupled with doing your web service requests with an evented I/O library.
#alexd described using multiple processes. I personally favor an evented server because I don't need to know or guess ahead of time how many concurrent requests I might have (or use something that spins up processes based on load). A single nginx process fronting a single Thin process can serve tons of parallel requests.
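To make the evented approach concrete, this is roughly what a fiber-wrapped HTTP call looks like with EM-Synchrony and em-http-request in a standalone script (the URL is a placeholder); inside a Thin-served app the reactor is already running, so only the request itself would live in the action.

    require "em-synchrony"
    require "em-synchrony/em-http"   # fiber-aware wrapper around em-http-request

    EM.synchrony do
      # .get pauses only this fiber while waiting; the reactor keeps serving
      # other connections instead of blocking a whole worker process.
      response = EventMachine::HttpRequest.new("http://slow-api.example.com/data").get
      puts response.response_header.status
      puts response.response[0, 80]    # first 80 bytes of the body

      EventMachine.stop
    end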
The answer to your question depends on the server your Rails application is running on. What are you using right now? Thin? Unicorn? Apache+Passenger?
I wholeheartedly recommend Unicorn for your situation -- it makes it very easy to run multiple server processes in parallel, and you can configure the number of parallel processes simply by changing a number in a configuration file. While one Unicorn worker is handling Alice's high-latency request, another Unicorn worker can be using your free CPU cycles to handle Bob's request.
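For reference, that number lives in the worker_processes directive of config/unicorn.rb; a minimal sketch (the socket path and the counts are just examples):

    # config/unicorn.rb
    worker_processes 4        # how many requests can be handled in parallel
    timeout 30                # a worker stuck on one request is killed after 30s
    preload_app true          # load the app once in the master, then fork workers

    listen "/tmp/unicorn.myapp.sock", backlog: 64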
Most likely, yes. There are ways around this, obviously, but none of them are easy.
The better question is, why do you need to hit the external API on every request? Why not implement a cache layer between your Rails app and the external API and use that for the majority of requests?
This way, with some custom logic for expiring the cache, you'll have a snappy Rails app and still be able to leverage the external API service.
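A minimal sketch of such a cache layer using Rails' built-in cache store; the ExternalApiCache and SlowExternalApi names and the ten-minute expiry are assumptions.

    class ExternalApiCache
      CACHE_TTL = 10.minutes   # how stale the data may get before re-fetching

      def self.fetch(resource_id)
        Rails.cache.fetch("external_api/#{resource_id}", expires_in: CACHE_TTL) do
          # runs only on a cache miss; this is the slow 2-3 second external call
          SlowExternalApi.get(resource_id)
        end
      end
    end

    # in a controller, most requests now hit the cache instead of the remote API:
    data = ExternalApiCache.fetch(params[:id])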

mochiweb and gen_server

[This will only make sense if you've seen Kevin Smith's 'Erlang in Practice' screencasts]
I'm an Erlang noob trying to build a simple Erlang/OTP system with an embedded webserver [mochiweb].
I've walked through the EIP screencasts, and I've toyed with simple mochiweb examples created using the new_mochiweb.erl script.
I'm trying to figure out how the webserver should relate to the gen_server modules. In the EIP examples [Ch7], the author creates a web_server.erl gen_server process and links the mochiweb_http process to it. However in a mochiweb project, the mochiweb_http process seems to be 'standalone'; it doesn't seem to be embedded in a separate gen_server process.
My question is, should one of these patterns be preferred over the other? If so, why? Or doesn't it matter?
Thanks in advance.
You link processes to the supervisor hierarchy of your application for two reasons: 1) to be able to restart your worker processes if they crash, and 2) to be able to kill all your processes when you stop the application.
As the previous answer says, 1) is not the case for HTTP request handling processes. However, 2) is valid: if you leave your processes alone, you can't guarantee that all of them will be cleared from the VM after stopping your application (think of processes stuck in endless loops, waiting in receives, etc...).
The reason to embed a process in a supervision tree is so that you can restart it if it fails.
A process that handles an HTTP request is responding to an event generated externally, in a browser. It is not possible to restart it (retrying is the prerogative of the person running the browser), so it is not necessary to run it under OTP; you can just spawn it without supervision.
