Speeding up web service by writing to redis first, disk after? - ruby-on-rails

I have a web service that runs multiple DB queries and takes roughly 500-1,000 ms, depending on how much I/O EC2 decides to give me at the time of the invocation. Users want stuff faster than 1,000 ms, and understandably so. What I'm thinking of doing is taking the request parameters, stuffing them into a Redis queue without writing to disk, and then running a job from an asynchronous queue that does the disk writes. Does something like this happen in practice? Am I insane for suggesting it?

So long as your Redis is persisting to disk at regular intervals, this should work. You want to limit the number of scenarios where you might lose data. A sufficiently aggressive persistence schedule for Redis should work for most cases.
Try to give the user immediate feedback that their action has been received and is being processed. Few things are more confusing than a slight delay before the result appears, which might prompt people to attempt the upload again.
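Here is a minimal in-process sketch of the write-first, persist-later flow. It assumes a stdlib `Queue` stands in for the Redis list, whereas a real setup would push with Redis `LPUSH` and drain with a blocking `BRPOP` in a separate worker process. An array stands in for the database. `accept_request`, `JOBS` and `DISK` are hypothetical names.

```ruby
require "json"

# Hypothetical stand-ins: JOBS plays the role of the Redis list and DISK the
# role of the slow database. A real setup would use Redis LPUSH/BRPOP instead.
JOBS = Queue.new
DISK = []
DISK_LOCK = Mutex.new

# Request handler: enqueue the parameters and respond immediately.
def accept_request(params)
  JOBS << JSON.generate(params)
  { status: 202, body: "Received, processing" }
end

# Background worker: drains the queue and performs the slow writes.
worker = Thread.new do
  while (payload = JOBS.pop) != :shutdown
    sleep 0.01 # simulate a slow disk write
    DISK_LOCK.synchronize { DISK << JSON.parse(payload) }
  end
end

responses = (1..3).map { |i| accept_request("user_id" => i) }
JOBS << :shutdown # sentinel: the worker exits after draining earlier jobs
worker.join

p responses.map { |r| r[:status] } # [202, 202, 202]
p DISK.map { |row| row["user_id"] } # [1, 2, 3]
```

The handler returns 202 Accepted before any write happens, which is why immediate "received, processing" feedback matters. Anything still sitting in Redis since its last persistence point is what you risk losing.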


My server gets overloaded even though I keep a limit on the requests I send it

I have a server on Heroku - 3 dynos, 2 processes each.
The server does 2 things:
It responds to requests from the browser (AJAX and some web pages), based on data stored in a PostgreSQL database
It exposes a REST API to update the data in the database. This API is called by another server. The rate of calls is limited: The other server only calls my server through a queue with a single worker, which makes sure the other server doesn't issue more than one request in parallel to my server (I verified that indeed it doesn't).
When I look at New Relic, I see a graph with regular peaks, which suggests that even though I keep the other server at one parallel request at most, it still loads my server and creates those peaks.
I'd expect that since the rate of calls from the other server is limited, my server would not get overloaded, because a request only starts when the previous request has ended. (I'm guessing that maybe the database gets overloaded if it gets an update request, returns, but continues processing after that.)
What can explain this behaviour?
Where else can I look at in order to understand what's going on?
Is there a way to avoid this behaviour?
There are a whole lot of directions this investigation could go, but from your screenshot and some inferences, I have two guesses.
A long query: You'd see this graph if your other server or a browser occasionally hits a slow query. If it's just a long read query and your DB isn't hitting its limits, it should only affect the process running the query, but if the query takes an exclusive lock, all dynos will have to wait on it. Since the spikes are so regular, first think of anything you have running on a schedule; if the cadence matches, you probably have your culprit. The next simple thing to do is run heroku pg:long-running-queries and heroku pg:seq-scans (both from the heroku-pg-extras plugin). The former shows queries that might need optimization, and the latter shows full table scans you can probably fix with a different query or a better index. You can find similar information in NewRelic's Database tab, which has time and throughput graphs you can try to match against your queueing spikes. Finally, look at NewRelic's Transactions tab:
There are various ways to sort. Slowest average response time is probably going to help, but check out all the options and see if any transactions stand out.
Click on a suspicious transaction and look at the graph on the right. If you see spikes matching your queueing buildups, that could be it, but since it looks to be affecting your whole site, watch out for several transactions seeing correlated slowdowns.
Check out the transaction traces at the bottom. Something in there taking a long time to run is as close to a smoking gun as you'll get. This should correlate with pg:long-running-queries.
Look at the breakdown table between the graph and the transaction traces. Check for things that are taking a long time (e.g., a 2-second external request) or happening often (e.g., a partial that gets rendered 2,500 times per request). Those are places for caching or optimization.
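To test the "runs on a schedule" hunch for the long-query case, you can compare the typical gap between spikes to a scheduled job's period. This is a minimal sketch. The spike timestamps, the 10-minute and hourly periods, and the 10% tolerance are hypothetical values you would replace with readings from New Relic and your scheduler.

```ruby
# Spike times in seconds (hypothetical readings from the New Relic graph).
spike_times = [0, 598, 1203, 1801, 2399, 3002]

# Median gap between consecutive spikes; the median ignores one-off outliers.
def median_gap(times)
  gaps = times.each_cons(2).map { |a, b| b - a }.sort
  mid = gaps.size / 2
  gaps.size.odd? ? gaps[mid] : (gaps[mid - 1] + gaps[mid]) / 2.0
end

# True if the spikes recur at roughly the scheduled job's period.
def matches_schedule?(times, period, tolerance: 0.1)
  (median_gap(times) - period).abs <= period * tolerance
end

puts median_gap(spike_times)               # 598
puts matches_schedule?(spike_times, 600)   # true: a 10-minute job fits
puts matches_schedule?(spike_times, 3600)  # false: an hourly job does not
```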
Garbage collection: This is less likely because Ruby GCs all the time and there's no reason it would show spikes on that regular cadence, but if there's a regular request that allocates a ton of objects, both building the objects and cleaning them up will take time. It would only affect one dyno at a time, and it would be correlated with a long or highly repetitive query in your NewRelic investigation. You can see some stats about this in NewRelic's Ruby VM tab.
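If you suspect one particular request, you can count how many objects it allocates using Ruby's built-in `GC.stat` counters. The `:total_allocated_objects` key exists in Ruby 2.2 and later. This is a minimal sketch: the `allocations_during` helper and the two sample workloads are hypothetical.

```ruby
# Count how many Ruby objects a block allocates, using the VM's GC counters.
def allocations_during
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

# Hypothetical workloads: one builds many short-lived strings, one builds few.
heavy = allocations_during { 10_000.times.map { |i| "row #{i}" } }
light = allocations_during { 10.times.map { |i| "row #{i}" } }

puts "heavy: #{heavy} objects, light: #{light} objects"
```

Wrapping a suspect action's body this way in development gives a rough allocation count to compare against other transactions.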
Take a look at your dyno and DB memory usage too. Both are printed to the Heroku logs, and if you add Librato, they'll build some automatic graphs that are quite helpful. If your dyno is swapping, performance will suffer and you should either upgrade to a bigger dyno or run fewer processes per dyno. Processes will typically accumulate memory as they run and never quite release as much as you'd like, so tune it so that right before a restart, your dyno is just under its available RAM. Similarly for the DB, if you're hitting swap there, query performance will suffer and you should upgrade.
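The tuning rule above is simple arithmetic: the peak memory of all processes, measured just before a restart, should sit just under the dyno's RAM. This is a minimal sketch. The 512 MB and 1024 MB dyno sizes, the 230 MB per-process peak and the 10% headroom are hypothetical figures.

```ruby
# Pick the largest process count whose peak memory still fits in the dyno.
# All sizes are in MB; the figures used below are hypothetical.
def processes_per_dyno(dyno_ram_mb, peak_process_mb, headroom: 0.9)
  usable = dyno_ram_mb * headroom # keep a margin so the dyno never swaps
  # At least one process; if even one doesn't fit, you need a bigger dyno.
  [(usable / peak_process_mb).floor, 1].max
end

puts processes_per_dyno(512, 230)   # 2
puts processes_per_dyno(1024, 230)  # 4
```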
Other things it could be, but probably isn't in this case:
Sleeping dynos: Heroku puts a dyno to sleep if it hasn't served a request in a while, but only if you have just 1 dyno running. You have 3, so this isn't it.
Web Server Concurrency: If, at any given moment, there are more requests than available processes, requests will be queued. The obvious fix is to increase the available dynos/processes, which will put more load on your DB and potentially move the issue there. Since the same regular request shows up every time, I'm guessing request volume is low and this also isn't your problem.
Heroku Instability: Sometimes, for no obvious reason, Heroku starts queueing requests more than it should and doesn't report any issues at status.heroku.com. Restarting the dynos typically fixes that temporarily while Heroku gets their head back on straight.
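The web server concurrency case can be estimated with Little's law: the average number of requests in flight equals the arrival rate times the mean response time. With 3 dynos x 2 processes there are 6 processes, and requests queue whenever more than 6 are in flight. This is a minimal sketch. The traffic figures are hypothetical.

```ruby
# Little's law: requests in flight = arrival rate (req/s) * mean response time (s).
def requests_in_flight(arrival_rate, mean_response_s)
  arrival_rate * mean_response_s
end

SLOTS = 3 * 2 # 3 dynos x 2 processes each, as in the question

# Average fraction of the available processes that are busy.
def utilization(arrival_rate, mean_response_s, slots = SLOTS)
  requests_in_flight(arrival_rate, mean_response_s) / slots.to_f
end

puts utilization(10, 0.2) # about 0.33: 10 req/s at 200 ms fits easily
puts utilization(40, 0.2) # about 1.33: 40 req/s at 200 ms queues constantly
```

This is an average, so bursts can still queue well below a utilization of 1.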

Delayed Processing Jobs in Ruby: How much is not blocking my path

I have this project which still uses delayed job as its job-processing queue. I've recently found an edge case which is making me question a few things: I have an AR object (I'm using MySQL, by the way) which, on update, sends a message to all the elements of a has_many association. In order to do that, I have to instantiate all the elements of this association and call the message on them. It seemed only fair to delay the call of this message for each one of them.
Now the association has grown quite a bit: in an edge case I have 40000 objects belonging to that association. Sending the messages therefore now involves the (synchronous) creation of 40000 delayed-job jobs. Since these happen inside an after_update callback and not after_commit, they are (ab)using the same connection, not taking advantage of any context switching. Short version: I have a pipe of 1 UPDATE statement and 40000 INSERTs on the same connection. For that reason, this update is gobbling up quite a few minutes in production.
Now, there are a lot of ways around this: change the callback to after_commit and create 1 (synchronous) delayed job which will create the 40000 jobs (I don't want to handle the 40000 (AR) objects in one job; the 40000 now will be 120000 tomorrow, and that's memory armageddon), etc.
But what I'm really considering is switching my delayed processing queue to Resque or Sidekiq. They use Redis, so write performance is far better. They use something other than MySQL, which means the connections will not block each other. My only issue is: how much would 40000 writes at once to Redis cost me? And: does any one of these options first store the jobs in memory, without blocking the response to the client, and only later store them in Redis? So, my real question is: how much would this delaying delay me in such an edge case?
Indeed, Redis can process writes faster than MySQL. Try running redis-benchmark; you'll see figures of 100k+ writes/sec.
does any one of these options first store the jobs in memory, not blocking the response to the client and belatedly stores them in redis?
No, they do it synchronously.
I don't want to handle the 40000 (AR) objects in one job
Maybe you should try a hybrid approach: process chunks of N objects per job. Batch writes should be faster than 40k individual writes. And it scales well (the batch size stays the same, be it 40k or 400k items).
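The chunking idea can be sketched with `each_slice`. In a Rails app the IDs would come from something like `association.pluck(:id)`, and each job would load its slice with `where(id: batch)`. This sketch only shows the batching. `BATCH_SIZE` and `enqueue_in_batches` are hypothetical, and the block stands in for your queue client's enqueue call.

```ruby
BATCH_SIZE = 1_000 # hypothetical; tune to what one job can hold in memory

# Enqueue one job per slice of IDs instead of one job per record.
# The block is a hypothetical stand-in for your queue client's enqueue call.
def enqueue_in_batches(ids, batch_size: BATCH_SIZE, &enqueue)
  ids.each_slice(batch_size).map { |batch| enqueue.call(batch) }.size
end

jobs = []
count = enqueue_in_batches((1..40_000).to_a) { |batch| jobs << batch }
puts count                 # 40 jobs instead of 40,000
puts jobs.map(&:size).max  # 1000 IDs per job at most
```

Each job's memory use is bounded by the batch size whether the association holds 40k or 120k rows. Only the number of jobs grows.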

Is it a good idea to use MQ to store data in DB?

I'm going to use RabbitMQ as a message broker and switch most of the scripts to sending data to a queue instead of performing direct writes/reads. Consumers will take those messages and perform the corresponding operations. In my dreams this will give me more flexibility in choosing a DB engine, app-level sharding and so on. But is it a good idea in general? Or am I missing something? Current write load is ~15k inserts/deletes for MySQL and 30-50k sets for the Redis instances. Read load is similar: ~15-20k selects and 50-70k gets for Redis.
The biggest issue you'll face will be the fact that your DB writes will be asynchronously processed. If a client writes data to the DB and then instantly reads it back, the value might not be what it originally inserted because the Rabbit queue might have been very busy or slow, delaying the update operation. Or an admin might accidentally purge your queue and then you'll have all these clients thinking their transactions had been committed but nothing will have been stored.
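This single-threaded sketch shows the read-your-writes problem described above. `QueuedStore` is a hypothetical class: a `Queue` stands in for the RabbitMQ queue and a hash stands in for the database.

```ruby
# A write-behind store: writes go to a queue and a consumer applies them later.
class QueuedStore
  def initialize
    @pending = Queue.new
    @data = {}
  end

  # Client "write": returns as soon as the message is queued.
  def write(key, value)
    @pending << [key, value]
  end

  # Client "read": sees only what the consumer has applied so far.
  def read(key)
    @data[key]
  end

  # Consumer: applies every queued write (a separate worker in real life).
  def drain
    until @pending.empty?
      key, value = @pending.pop
      @data[key] = value
    end
  end
end

store = QueuedStore.new
store.write(:balance, 100)
p store.read(:balance) # nil: the consumer has not caught up yet
store.drain
p store.read(:balance) # 100
```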
This sounds like a classic case of premature optimization. It's a solution in search of a problem, and you should probably avoid doing it.
With AMQP you can run synchronous operations using the RPC pattern; with this kind of architecture you should be able to work around the problems related to asynchronous operations.
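With the RPC pattern, the caller publishes a request that carries a reply address and a correlation ID, then blocks until the matching reply arrives. AMQP messages have `reply_to` and `correlation_id` properties for this purpose. In this minimal sketch, in-process `Queue` objects stand in for broker queues, so `reply_to` holds the queue object itself rather than a queue name. `rpc_write`, `REQUESTS` and `DB` are hypothetical names.

```ruby
require "securerandom"

# Hypothetical stand-ins: REQUESTS is the broker's request queue, DB the database.
REQUESTS = Queue.new
DB = {}

# Server: applies each write, then replies on the caller's reply queue.
server = Thread.new do
  loop do
    msg = REQUESTS.pop
    break if msg == :shutdown
    DB[msg[:key]] = msg[:value]
    msg[:reply_to] << { correlation_id: msg[:correlation_id], ok: true }
  end
end

# Client: publishes the request, then blocks until the matching reply arrives.
def rpc_write(key, value)
  reply_queue = Queue.new
  id = SecureRandom.uuid
  REQUESTS << { key: key, value: value, reply_to: reply_queue, correlation_id: id }
  reply = reply_queue.pop
  raise "mismatched reply" unless reply[:correlation_id] == id
  reply[:ok]
end

rpc_write(:balance, 100)
p DB[:balance] # 100: the write is applied before rpc_write returns
REQUESTS << :shutdown
server.join
```

The trade-off is that every write waits on the database again, so the latency benefit of queueing is lost.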

How to guarantee data integrity for concurrent Rails/Active Record operations

I need to implement a feature for a rails site that will involve reading and exporting most of my database.
I know this operation is going to take a while. That's fine; I've got delayed job for that.
What I'm worried about is the data changing during the running of the job, and the resulting export being corrupted because of that.
My initial thought was to do all of the reads within a transaction. However, I would also like to be running the reads concurrently, if possible. ActiveRecord docs say that Transactions cannot be shared between Connections, and Connections cannot be shared between Threads. So it looks as though I am restricted to a single thread with this approach.
Any suggestions for a workaround? Is there another way to give the job a consistent view of the data that doesn't involve transactions? Or is there some alternative to ActiveRecord/Mysql out there that can distribute transactions across threads?

Executing large numbers of asynchronous IO-bound operations in Rails

I'm working on a Rails application that periodically needs to perform large numbers of IO-bound operations. These operations can be performed asynchronously. For example, once per day, for each user, the system needs to query Salesforce.com to fetch the user's current list of accounts (companies) that he's tracking. This results in huge numbers (potentially > 100k) of small queries.
Our current approach is to use ActiveMQ with ActiveMessaging. Each of our users is pushed onto a queue as a different message. Then, the consumer pulls the user off the queue, queries Salesforce.com, and processes the results. But this approach gives us horrible performance. Within a single poller process, we can only process a single user at a time, so the Salesforce.com queries become serialized. Unless we run literally hundreds of poller processes, we can't come anywhere close to saturating the server running the pollers.
We're looking at EventMachine as an alternative. It has the advantage of allowing us to kick off large numbers of Salesforce.com queries concurrently within a single EventMachine process. So, we get great parallelism and utilization of our server.
But there are two problems with EventMachine. 1) We lose the reliable message delivery we had with ActiveMQ/ActiveMessaging. 2) We can't easily restart our EventMachine processes periodically to lessen the impact of memory growth. For example, with ActiveMessaging, we have a cron job that restarts the poller once per day, and this can be done without worrying about losing any messages. But with EventMachine, if we restart the process, we could literally lose hundreds of messages that were in progress. The only way I can see around this is to build a persistence/reliable-delivery layer on top of EventMachine.
Does anyone have a better approach? What's the best way to reliably execute large numbers of asynchronous IO-bound operations?
I maintain ActiveMessaging, and have been thinking about the issues of a multi-threaded poller too, though perhaps not at the same scale you are. I'll give you my thoughts here, but am also happy to discuss further on the ActiveMessaging list, or via email if you like.
One trick is that the poller is not the only serialized part of this. STOMP subscriptions, if you use ack:client in order to prevent losing messages on interrupt, will only be sent a new message on a given connection when the prior message has been ack'd. Basically, you can only have one message being worked on at a time per connection.
So to keep using a broker, the trick will be to have many broker connections/subscriptions open at once. The current poller is pretty heavy for this, as it loads up a whole Rails environment per poller, and one poller is one connection. But there is nothing magical about the current poller; I could imagine writing a poller as an EventMachine client that creates new connections to the broker and gets many messages at once.
In my own experiments lately, I have been thinking about using Ruby Enterprise Edition and having a master process that forks many poller worker processes so as to get the benefit of the reduced memory footprint (much like Passenger does), but I think the EM trick could work as well.
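Threads can illustrate the many-connections idea. This sketch assumes each thread models one broker connection that holds at most one unacknowledged message at a time. `consume_all` is a hypothetical helper, and the `sleep` stands in for one Salesforce.com query.

```ruby
# Each thread models one broker connection that may hold at most one
# unacknowledged message, so parallelism equals the number of connections.
def consume_all(messages, connections:)
  queue = Queue.new
  messages.each { |m| queue << m }
  done = []
  lock = Mutex.new
  in_flight = 0
  peak = 0

  workers = Array.new(connections) do
    Thread.new do
      loop do
        msg = begin
          queue.pop(true) # non-blocking; raises ThreadError when empty
        rescue ThreadError
          break
        end
        lock.synchronize do
          in_flight += 1
          peak = [peak, in_flight].max
        end
        sleep 0.02 # simulated I/O-bound call, e.g. one Salesforce.com query
        lock.synchronize do
          in_flight -= 1
          done << msg # "ack": the message is finished
        end
      end
    end
  end
  workers.each(&:join)
  { processed: done.size, peak_in_flight: peak }
end

[1, 10].each do |n|
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = consume_all((1..50).to_a, connections: n)
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  puts "#{n} connection(s): #{result}, #{elapsed.round(2)} s"
end
```

With one connection the 50 simulated calls run one after another. With ten, they overlap, which is the effect the EventMachine-based poller would aim for.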
I am also an admirer of the Resque project, though I do not know that it would be any better at scaling to many workers - I think the workers might be lighter weight.
I've used AMQP with RabbitMQ in a way that would work for you. Since ActiveMQ implements AMQP, I imagine you can use it in a similar way. I have not used ActiveMessaging; although it seems like an awesome package, I suspect it may not be appropriate for this use case.
Here's how you could do it, using AMQP:
Have Rails process send a message saying "get info for user i".
The consumer pulls this off the message queue, making sure to specify that the message requires an 'ack' to be permanently removed from the queue. This means that if the message is not acknowledged as processed, it is returned to the queue for another worker eventually.
The worker then spins off the message into the thousands of small requests to SalesForce.
When all of these requests have successfully returned, another callback should be fired to ack the original message and return a "summary message" that has all the info germane to the original request. The key is using a message queue that lets you acknowledge successful processing of a given message, and making sure to do so only when relevant processing is complete.
Another worker pulls that message off the queue and performs whatever synchronous work is appropriate. Since all the latency-inducing bits have already been performed, I imagine this should be fine.
If you're using (C)Ruby, try never to combine synchronous and asynchronous code in a single process. A process should either do everything via EventMachine, with no blocking code, or talk to an EventMachine process only via a message queue.
Also, asynchronous code is incredibly useful, but it is also difficult to write, difficult to test, and bug-prone. Be careful. Investigate using another language or tool if appropriate.
Also check out "cramp" and "beanstalk".
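Here is a single-process sketch of the steps above. It uses a toy in-memory `AckQueue` in place of RabbitMQ. A real broker tracks unacked deliveries per channel and requeues them when the consumer's connection drops. Threads stand in for EventMachine callbacks. `AckQueue`, `fetch_account` and `handle` are hypothetical names.

```ruby
# Toy broker with explicit acknowledgements (hypothetical, in-memory):
# a delivered message stays "unacked" until acked, and is requeued if the
# consumer dies first, which mimics a real broker on disconnect.
class AckQueue
  def initialize
    @ready = []
    @unacked = {}
    @next_tag = 0
    @lock = Mutex.new
  end

  def publish(msg)
    @lock.synchronize { @ready << msg }
  end

  # Returns [delivery_tag, message], or nil when nothing is ready.
  def deliver
    @lock.synchronize do
      return nil if @ready.empty?
      @next_tag += 1
      @unacked[@next_tag] = @ready.shift
      [@next_tag, @unacked[@next_tag]]
    end
  end

  def ack(tag)
    @lock.synchronize { @unacked.delete(tag) }
  end

  # Simulates the broker noticing a dead consumer.
  def requeue_unacked
    @lock.synchronize do
      @ready.concat(@unacked.values)
      @unacked.clear
    end
  end

  def size
    @lock.synchronize { @ready.size + @unacked.size }
  end
end

# Hypothetical stand-in for one small Salesforce.com query.
def fetch_account(user, n)
  sleep 0.01
  "#{user}-account-#{n}"
end

def handle(requests, summaries, crash_before_ack: false)
  tag, user = requests.deliver
  return unless tag
  # Fan out the small I/O-bound requests concurrently and wait for all of them.
  accounts = (1..5).map { |n| Thread.new { fetch_account(user, n) } }.map(&:value)
  raise "worker crashed" if crash_before_ack
  summaries.publish({ user: user, accounts: accounts }) # the summary message
  requests.ack(tag) # ack only once all of the work is complete
end

requests = AckQueue.new
summaries = AckQueue.new
requests.publish("user-1")

begin
  handle(requests, summaries, crash_before_ack: true)
rescue RuntimeError
  requests.requeue_unacked # the unacked request goes back on the queue
end
handle(requests, summaries) # a second attempt succeeds

p requests.size  # 0: acked and removed
p summaries.size # 1 summary message waiting for the next worker
```

The crash loses nothing, because the request was never acked. Only the completed attempt produces a summary message.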
Someone sent me the following link: http://github.com/mperham/evented/tree/master/qanat/. This is a system that's somewhat similar to ActiveMessaging except that it is built on top of EventMachine. It's almost exactly what we need. The only problem is that it seems to only work with Amazon's queue, not ActiveMQ.
