How do you handle DB deadlocks in MVC apps? - ruby-on-rails

I have a Ruby on Rails application with models, controllers and stuff, so user requests are performed as independent ones. The app is backed by an MS SQL database. From time to time (at ~100 rpm) concurrent user requests cause deadlocks on DB resources, so one of the requests fails with an error.
What is the right way to handle such situations and avoid deadlocks? I'm looking for a general direction to dig. Thanks.

This blog post has some very good advice for preventing deadlocks in Rails and dealing with them.
There is also a drop-in deadlock_retry gem written by, among others, DHH, which will re-try transactions 3 times when a deadlock timeout is detected.

Related

Do threads in rails automatically open database connections?

I have a rails app running on Heroku that makes use of threads, and it occasionally runs into database connection errors. Is this just because I am accessing the database within the threads or does each thread automatically open a database connection? I would like to learn more about threading in rails, and any resources are appreciated.
This question varies largely depending on how many instances you have running, how many requests your receiving, and more importantly your database. Databases can and will have a maximum number of concurrent connections. You can read more about Heroku/concurrent connections here in the Heroku official documentary, it's probably more informative than what I can tell you in a single comment.
That being said, your question was a little vague and it's hard to figure out what's going on. Can you tell us a little more about what error you're getting (like the specific error) and maybe a small backtrace? Are you getting these errors on the same pages or different pages? Would you say your site is particularly high traffic?

Recurring job to check if url exists

I want to build a service that notifies me when a url returns status 200. I'm currently using a sidekiq worker, if the status == 200, it updates my database (row.available = true), if not, it raises an exception and retries the worker in n seconds, n amount of times.
Though this works, it doesn't feel efficient or scalable (1000's checks would result in 1000's of exceptions, and on certain platforms that's bad news -- JRuby), and I'm sure there is a way I can build an internal service to manage this url monitoring that doesn't rely on sidekiq (perhaps in Go, or another, more suited Ruby gem). However, I have no idea where to begin, and so I'd appreciate some general direction.
Writing and running a simple link checker is easy. Doing that for 1000s of links quickly, without redundancy, and handling dead and slow-responding links without bogging down your entire system gets harder.
I'd use three threads, plus two queues:
A dispatcher thread that only reads from the database. It is responsible for finding and queuing URLs to be checked in to a "to be checked" queue.
A worker thread that consumes from the first queue and pushes results into the "updated URL results" queue.
An updater/consumer thread that takes the result of a thread in #2 and updates the database.
Ruby has some built-in classes to help:
Thread
Queue
I'd highly recommend Typhoeus and Hydra for use in the middle thread. The documentation for these two classes cover a lot of what you need to do as far as handling multiple threads running in parallel.
I wouldn't write this code as part of a Rails application. There is no value added by Rails to this, nor is it necessary. I would either require Active Record and piggy-back on the existing database.yaml settings and models, or use Rails' "runner" to run the code as an adjunct to the Rails code.
Or, I'd write a small, application-specific, piece of code to run on a different server to avoid bogging down the Rails server. Using something like MySQL or PostgreSQL drivers would let you talk to the same database that Rails uses. In this case I'd use the Sequel gem to act as the ORM, but that's because I prefer it over Active Record.
There are a lot of things to consider as you write this code, including retries of failed URLs, sensing redirections and updating the source URLs to reflect them to avoid wasting time, and not beating up the hosting servers causing you to be banned.
I've written several apps for this purpose over the years and doing it right takes a lot of forethought, so think out your design up front otherwise you could end up with some major rewrites later on.

Create multiple Rails servers sharing same database

I have a Rails app hosted on Heroku. I have to do long backend calculations and queries against a mySQL database.
My understanding is that using DelayedJob or Whenever gems to invoke backend processes will still have impact on Rails (front-end) server performance. Therefore, I would like to set up two different Rails servers.
The first server is for front-end (responding to users' requests) as in a regular Rails app.
The second server (also a Rails server) is for back-end queries and calculation only. It will only read from mySQL, do calculation then write results into anothers Redis server.
My sense is that not lot of Rails developers do this. They prefer running background jobs on a Rails server and adding more workers as needed. Is my sever structure a good design, or is it an overkill? Is there any pitfall I should be aware of?
Thank you.
I don't see any reason why a background job like DelayedJob would cause any more overhead on your main application than another server would. The DelayedJob runs in it's own process so the dyno's for your main app aren't affected. The only impact could be on the database queries but that will be the same whether from a background job or another app altogether that is accessing the same database.
I would recommend using DelayedJob and workers on your primary app. It keeps things simple and shouldn't be any worse performance wise.
One other thing to consider if you are really worried about performance is to have a database "follower", this is effectively a second database that keeps itself up to date with your primary database but can only be used for reads (not writes). There may be better documentation about it, but you can get the idea here https://devcenter.heroku.com/articles/fast-database-changeovers#create_a_follower. You could then have these lengthy background jobs read data from here leaving your main database completely unaffected.

How is request processing with rails, redis, and node.js asynchronous?

For web development I'd like to mix rails and node.js since I want to get the best out of both worlds (rails for fast web development and node for concurrency). I know that some people choose to just use full ruby stack with eventmachine that is integrated into rails controller so that every request can be nonblocking by using fiber in event-loop model. I have been able to understand how that works in a big picture.
At this moement however I want to try doing nonblocking request processing with rails and node.js with message queue concept. I heard that this can be achieved by using redis as an intermediary. I'm still having trouble trying to figure out how that works as of now. From what I can understand: so we have 2 apps A (rails) and B (node.js) and redis. rails app will handle requests from users that go through controllers in REST manner, and then from there rails will pass that through redis, and then redis will form queues and node.js app will pick up that queue and do whatever necessary afterhand (write or read from backend db).
My questions:
So how would that improve concurrency and scalability? from what i
know since rails handle the requests through controllers
synchronously, and then write to redis, the requests will be
blocking still, even though node.js end can pickup the queue
asynchronously. (I have a feeling that it's not asynchronous yet if it's not end to end
non-blocking).
Would node.js be considered a proxy or an application here if redis
is the intermediary?
I'm new to redis and learning it still. If I'm using 100% noSQL
solution for my backend database, such as mongoDB or couchDB, are they replaceable by redis entirely or is redis more seen as a
messaging queue tool like rabbitMQ?
Is messaging queue a different concurrency concept than threading or
event-loop model or is it supposed to supplement them?
That's all my question. I'm new to message queue concept. Will appreciate any help and pointers to right direction and articles that help me learn more. thanks.
You are mixing some things here that don't go together.
Let's first make sure we are on the same page regarding the strengths/weaknesses of the involved technologies
Rails: Used for it's web-development simplicity and perfect for serving database-backed web-applications.
Not very performant when having to serve a large number of long running requests as you'd run out of threads on your Ruby workers - but well suited for anything that can scale horizontally with more web-nodes (multiple web-servers - 1 db).
Node.js: Great for high-concurrency scenarios. Not as easy as rails to write a regular web-application in it. But can handle near an insane amount of long-running low-cpu tasks efficiently.
Redis: A Key-Value Store that supports operations on it's data-structures (increment/decrement values, append/prepent push/pop to lists - all operations that make this DB work consistently with multiple clients writing at once)
Now as you can see, there is no benefit in having Rails AND Node serve the same request - communicating through Redis. Going through the Rails Stack would not provide any benefit if the requests ends up being handled by the Node server.
And even if you only offload some processing to the node server, it's still the Rails webserver that handles the requests and has to wait for a response from node - killing the desired scalability. It simply makes no sense.
Where you would a setup with Node and Rails together is in certain areas of your app that have drastically different scaling requirements.
If you are for example writing a Website that displays live stats for Football games you can easily see that there are two different concerns in your app: The "normal" Site that contains signup, billing and profile stuff that screams for a quick implementation through rails. And the "live" portion of the site where users see live results and you expect to handle a lot of clients at once - all waiting for something to happen (low cpu - high concurrency).
In such a case it may be beneficial to actually seperate the two parts of the site into a Ruby and a Node app, with then sharing data about the user through a store like Redis (but actually you just need some shared state that both can look at and write to for synchronization purposes).
So you would use for example Rails for the Signup/Login portions - once signed up write the session cookie into redis alongside with the permissions of the user (what game is he allowed to follow) and hand the user off to the Node.js app.
There the Node app can read the session information from Redis and serve the user.
Word of advice:
You don't get scalability by simply throwing Node.js into your Toolbox. You really have to find out what Node.js is good at (low-cpu high-io concurrent operations) and how you can leverage that to remedy some of the problems your currently chosen technology has.
I can answer 3 for you. Redis does not guarantee that when you perform an operation that result will actually be on disk, also transaction handling it a bit "different". It also requires for the whole database to be in memory. Depending on the situation this can be an issue or not. It is however incredibly fast. It is not a messaging queue, you can easily make a queue out of it, but it is not it's purpose. If you want to have a queuing system only you can probably do better with something else.

What kinds of concurrency/deadlock issues should one be aware of in Rails code?

I've just come to a realization about deadlocks - namely what they are - and I'm concerned about this issue affecting my Rails code.
Are there any specific deadlock issues to watch out for while developing a Rails app?
Have you ever encountered a deadlock in your Rails code - or is that even possible?
(I'm not referring to database deadlocks - only application deadlocks).
Deadlock implies competition for an I/O resource, which is why it comes up for databases most often. If you're improperly locking and requesting resources and you're explicitly using threads, then yeah you need to be concerned.
However, the specific steps to take to mitigate any issues are dependent on the type of I/O you are accessing.
Rails doesn't have many OS level deadlock concerns since Ruby 1.8 is single threaded, and even in 1.9 is locked
The concerns are mainly in the database. Rails has a double whammy where ActiveRecord abstracts from the database AND stuff like FKs and constraints are pushed into application level validations (before_save, validates_*, etc) and the result discourages developers from thinking about DB deadlock situations.
If you're using MYSQl, you can read about problem areas with innodb (default in Rails) here http://dev.mysql.com/doc/refman/5.1/en/innodb-transaction-model.html

Resources