Is it safe to use redis for long-time-pending tasks in Rails via resque? - ruby-on-rails

How safe is it to use Resque for long-pending tasks in Rails (for example: Resque.enqueue_in(7.days, JobClass)) with the resque and resque-scheduler gems? It's worth mentioning that the instance is running on Heroku. What is the chance of losing the queue?
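For concreteness, the kind of scheduled job I mean looks roughly like this (WelcomeEmailJob and UserMailer are just placeholder names):

    class WelcomeEmailJob
      @queue = :mailers

      def self.perform(user_id)
        user = User.find(user_id)
        UserMailer.welcome(user).deliver
      end
    end

    # resque-scheduler keeps this pending entry in Redis until the time comes.
    Resque.enqueue_in(7.days, WelcomeEmailJob, user.id)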

Semi-Persistence
As per my comments, the downside of using Redis (which Resque uses) is the possibility of downtime.
As the "RAM" of web apps, Redis holds semi-persistent data. It keeps your data in memory while it is running, and unless disk persistence (RDB snapshots or AOF) is configured and up to date, downtime can lose your queue.
This is the crux of your question: can you afford to lose your queue?
Only you can answer that.
There are, however, a number of factors which I would consider:
Redis isn't a full database... it's an in-memory key:value store (with data structures such as lists, sets and hashes), not a document database.
In short, it should only be used as a simple mechanism for working with snippets of data at particular times. We use Redis to store users' IDs on one of our systems, as an example.
This means that if you're storing more data inside Redis than simple IDs or other base data, you should ask whether you're using it correctly.
How Important Is Your Queue?
As with any app, you'll want to retain as much user data as possible.
The difference, however, is how critical the queue is to your system.
I gave the example of a Hotel Reservation system. Perhaps you have "welcome" emails you need to send, "directions" emails to help guests find the hotel, etc. Are these "mission critical"?
I have designed an email marketing system before. The queuing system for that was extensive; but it wasn't the "queue" that was important - it was all the associated data that went with it.
Thus, we saved our data in a database table and only used Redis to store a pure queue of what was going to be sent, and when. We then ran scripts every minute (I've forgotten the specifics) which worked through the Redis queue.
Bottom line is that I would highly recommend storing the appropriate data in a table, and then only processing the requests as you need them.
The underpinning of the queue is that you need to be able to rebuild it if it goes down. If you're storing your values in Redis alone, that becomes a single point of failure.
Hitting the DB is well worth it for the durability you get in the long run.
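A rough sketch of that pattern, assuming a hypothetical ScheduledEmail model and resque-scheduler; only the record id goes into Redis, so the queue can be rebuilt from the table if Redis is lost:

    # The real data lives in the database (model and columns are made up).
    email = ScheduledEmail.create!(
      user_id: user.id,
      send_at: 7.days.from_now,
      status:  "pending"
    )

    # Only a reference to the row goes into Redis.
    Resque.enqueue_at(email.send_at, SendScheduledEmailJob, email.id)

    class SendScheduledEmailJob
      @queue = :mailers

      def self.perform(scheduled_email_id)
        email = ScheduledEmail.find(scheduled_email_id)
        return unless email.status == "pending"
        UserMailer.welcome(email.user).deliver
        email.update!(status: "sent")
      end
    end

If Redis is lost, a small rake task can re-enqueue every ScheduledEmail that is still pending.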

Related

Should data being used by ActiveJob (resque) be persisted or put into a ruby object and passed by object id?

I am using Twilio to send/receive texts in a Rails 4.2 app. I am sending in bulk, around 1000 at a time, and receiving sporadically.
Currently when I receive a text I save it to the DB (to, from, body) and then pass that record to an ActiveJob worker to process later. For sending messages I currently persist the Twilio params to another DB and pass that record to a different ActiveJob worker. Since I am often doing it in batches, I have two workers. The first outgoing-message worker sends a single message. The second one queries the DB, finds all the users who should receive the message, creates a DB record for each message that should be sent, and then passes that record to the first outgoing-message worker. So the second one basically just creates a bunch of jobs for the first one to process.
Right now I have the workers destroying the records once they finish processing (both incoming and outgoing). I am worried about not persisting things in case the server, Redis, or Resque goes down, but I do not know if this is actually a good design pattern. It was suggested to me to just use a vanilla Ruby object and pass its id to the worker, but I am not sure how that affects data reliability. So is it overkill to be creating all these DB records, and should I just be creating vanilla Ruby objects and passing those objects' ids to the workers?
Any and all insight is appreciated,
Drew
It seems to me that the approach of sending a minimal amount of data to your jobs is the best approach. Check out the 'Best Practices' section on the Sidekiq wiki: https://github.com/mperham/sidekiq/wiki/Best-Practices
"What if your queue backs up and that quote object changes in the meantime? Don't save state to Sidekiq, save simple identifiers. Look up the objects once you actually need them in your perform method."
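As a sketch of what that looks like in your situation (the job and model names are made up), the worker receives only the id and re-fetches the row when it runs:

    class ProcessIncomingTextJob < ActiveJob::Base
      queue_as :default

      def perform(incoming_text_id)
        text = IncomingText.find(incoming_text_id)  # look the record up fresh
        # ... process text.body here ...
      end
    end

    # Enqueue with the id only, never the object or its attributes.
    ProcessIncomingTextJob.perform_later(incoming_text.id)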
Also in terms of reliability - you should be worried about your job queue going down. It happens. You either design your system to be fault-tolerant to failures, or you find a job queue system that has higher reliability guarantees (but even then, no queue system can guarantee 100% message deliverability). Sidekiq Pro has better reliability guarantees than Sidekiq (non-Pro), but if you design your jobs with a little bit of forethought, you can create jobs that can scan your database after a crash and re-queue any jobs that may have been lost.
How much work you spend designing fault-tolerant solutions really just depends on how critical it is that your information makes it from point A to point B :)
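For example, a recovery pass after a crash could be as simple as the following, assuming each row carries a processed_at (or status) column that the worker sets when it finishes:

    # Re-enqueue anything that was saved to the DB but never processed.
    IncomingText.where(processed_at: nil).find_each do |text|
      ProcessIncomingTextJob.perform_later(text.id)
    end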

Is it a good idea to use MQ to store data in DB?

I'm going to use RabbitMQ as a message broker and switch most of the scripts to sending data to the queue instead of performing direct writes/reads. A consumer will get those messages and perform the corresponding operations. In my dreams this will give me more flexibility in choosing a DB engine, app-level sharding and so on. But is it a good idea generally? Or am I missing something? Current write load is ~15k inserts/deletes for MySQL and 30-50k sets for Redis instances. Read load is about the same: ~15-20k selects, and 50-70k gets for Redis.
The biggest issue you'll face will be the fact that your DB writes will be asynchronously processed. If a client writes data to the DB and then instantly reads it back, the value might not be what it originally inserted because the Rabbit queue might have been very busy or slow, delaying the update operation. Or an admin might accidentally purge your queue and then you'll have all these clients thinking their transactions had been committed but nothing will have been stored.
This sounds like a classic case of premature optimization. It's a solution in search of a problem, and you should probably avoid doing it.
With AMQP you can also run non-asynchronous operations using an RPC pattern; with that kind of architecture you should be able to sort out the problems related to asynchronous operations.
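Very roughly, that RPC round trip looks like this with the bunny gem (queue names and payload are illustrative only); the client blocks on a private reply queue, so the write is effectively synchronous again:

    require "bunny"
    require "securerandom"
    require "json"

    conn = Bunny.new
    conn.start
    channel = conn.create_channel

    reply_queue    = channel.queue("", exclusive: true)  # server-named reply queue
    correlation_id = SecureRandom.uuid

    # Publish the write request and say where the confirmation should go.
    channel.default_exchange.publish(
      { op: "insert", table: "events", data: { foo: "bar" } }.to_json,
      routing_key:    "db_writes",
      reply_to:       reply_queue.name,
      correlation_id: correlation_id
    )

    # Block until the consumer confirms the write was committed.
    reply_queue.subscribe(block: true) do |delivery_info, properties, body|
      if properties.correlation_id == correlation_id
        puts "write acknowledged: #{body}"
        delivery_info.consumer.cancel
      end
    end

You get read-your-own-writes back this way, but you also give up most of the decoupling that made the queue attractive in the first place.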

Create multiple Rails servers sharing same database

I have a Rails app hosted on Heroku. I have to do long backend calculations and queries against a MySQL database.
My understanding is that using DelayedJob or Whenever gems to invoke backend processes will still have impact on Rails (front-end) server performance. Therefore, I would like to set up two different Rails servers.
The first server is for front-end (responding to users' requests) as in a regular Rails app.
The second server (also a Rails server) is for backend queries and calculations only. It will only read from MySQL, do the calculations, then write the results into another Redis server.
My sense is that not a lot of Rails developers do this. They prefer running background jobs on a Rails server and adding more workers as needed. Is my server structure a good design, or is it overkill? Is there any pitfall I should be aware of?
Thank you.
I don't see any reason why a background job like DelayedJob would cause any more overhead on your main application than another server would. The DelayedJob runs in its own process, so the dynos for your main app aren't affected. The only impact could be on the database queries, but that will be the same whether it comes from a background job or from another app altogether that is accessing the same database.
I would recommend using DelayedJob and workers on your primary app. It keeps things simple and shouldn't be any worse performance-wise.
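For what it's worth, offloading a heavy calculation with DelayedJob from the primary app is a one-liner (ReportBuilder is a made-up class):

    # Runs later in a worker process (`rake jobs:work` or a worker dyno),
    # so the web dynos only pay for inserting a row into delayed_jobs.
    ReportBuilder.new(account_id).delay.generate!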
One other thing to consider if you are really worried about performance is to have a database "follower", this is effectively a second database that keeps itself up to date with your primary database but can only be used for reads (not writes). There may be better documentation about it, but you can get the idea here https://devcenter.heroku.com/articles/fast-database-changeovers#create_a_follower. You could then have these lengthy background jobs read data from here leaving your main database completely unaffected.
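One possible way to point only the heavy background queries at the follower is to give those models their own connection, assuming the follower's URL is exposed in an environment variable (the variable name here is made up):

    # Abstract base class whose subclasses read from the follower database.
    class FollowerRecord < ActiveRecord::Base
      self.abstract_class = true
      establish_connection(ENV["FOLLOWER_DATABASE_URL"])
    end

    # Read-only model used by the long-running background jobs.
    class ReportSource < FollowerRecord
      self.table_name = "orders"
    end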

How is request processing with rails, redis, and node.js asynchronous?

For web development I'd like to mix Rails and Node.js, since I want to get the best out of both worlds (Rails for fast web development and Node for concurrency). I know that some people choose to just use a full Ruby stack with EventMachine integrated into the Rails controller so that every request can be non-blocking, using fibers in an event-loop model. I have been able to understand how that works in the big picture.
At this moment, however, I want to try doing non-blocking request processing with Rails and Node.js using a message queue concept. I heard that this can be achieved by using Redis as an intermediary. I'm still having trouble trying to figure out how that works. From what I can understand: we have two apps, A (Rails) and B (Node.js), and Redis. The Rails app will handle requests from users that go through controllers in a REST manner, and from there Rails will pass them through Redis; Redis will form queues and the Node.js app will pick up those queues and do whatever is necessary afterwards (write to or read from the backend db).
My questions:
1. How would that improve concurrency and scalability? From what I know, since Rails handles the requests through controllers synchronously and then writes to Redis, the requests will still be blocking, even though the Node.js end can pick up the queue asynchronously. (I have a feeling that it's not asynchronous if it's not non-blocking end to end.)
2. Would Node.js be considered a proxy or an application here if Redis is the intermediary?
3. I'm new to Redis and still learning it. If I'm using a 100% NoSQL solution for my backend database, such as MongoDB or CouchDB, are they replaceable by Redis entirely, or is Redis seen more as a message queue tool like RabbitMQ?
4. Is a message queue a different concurrency concept than threading or the event-loop model, or is it supposed to supplement them?
Those are all my questions. I'm new to the message queue concept, so I'd appreciate any help, pointers in the right direction, and articles that help me learn more. Thanks.
You are mixing some things here that don't go together.
Let's first make sure we are on the same page regarding the strengths/weaknesses of the involved technologies
Rails: Used for its web-development simplicity and perfect for serving database-backed web applications.
Not very performant when having to serve a large number of long-running requests, as you'd run out of threads on your Ruby workers - but well suited for anything that can scale horizontally with more web nodes (multiple web servers - 1 db).
Node.js: Great for high-concurrency scenarios. Not as easy as Rails for writing a regular web application, but it can handle an almost insane number of long-running, low-CPU tasks efficiently.
Redis: A key-value store that supports operations on its data structures (increment/decrement values, append/prepend, push/pop to lists - all operations that make this DB work consistently with multiple clients writing at once).
Now as you can see, there is no benefit in having Rails AND Node serve the same request, communicating through Redis. Going through the Rails stack would not provide any benefit if the request ends up being handled by the Node server.
And even if you only offload some processing to the Node server, it's still the Rails web server that handles the request and has to wait for a response from Node - killing the desired scalability. It simply makes no sense.
Where you would use a setup with Node and Rails together is in certain areas of your app that have drastically different scaling requirements.
If you are, for example, writing a website that displays live stats for football games, you can easily see that there are two different concerns in your app: the "normal" site that contains signup, billing and profile stuff, which screams for a quick implementation through Rails, and the "live" portion of the site where users see live results and you expect to handle a lot of clients at once - all waiting for something to happen (low CPU - high concurrency).
In such a case it may be beneficial to actually separate the two parts of the site into a Ruby and a Node app, sharing data about the user through a store like Redis (really you just need some shared state that both can look at and write to for synchronization purposes).
So you would, for example, use Rails for the signup/login portions - once the user is signed up, write the session cookie into Redis along with the permissions of the user (which games they are allowed to follow) and hand the user off to the Node.js app.
There the Node app can read the session information from Redis and serve the user.
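On the Rails side, that hand-off could be as simple as writing the shared state to Redis with the redis gem (the key layout and fields here are illustrative only); the Node.js app then reads the same key:

    require "redis"
    require "json"
    require "securerandom"

    redis = Redis.new

    # After a successful login, stash what the Node app needs to know.
    session_token = SecureRandom.hex(16)
    redis.setex(
      "session:#{session_token}",
      1.hour.to_i,  # expire stale sessions automatically
      { user_id: user.id, allowed_games: user.allowed_game_ids }.to_json
    )

    # The token is handed to the client (e.g. in a cookie), and the Node.js
    # app looks up the same "session:<token>" key in the shared Redis.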
Word of advice:
You don't get scalability by simply throwing Node.js into your toolbox. You really have to find out what Node.js is good at (low-CPU, high-IO concurrent operations) and how you can leverage that to remedy some of the problems your currently chosen technology has.
I can answer 3 for you. Redis does not guarantee that when you perform an operation the result will actually be on disk, and transaction handling is a bit "different". It also requires the whole dataset to fit in memory. Depending on the situation this can be an issue or not. It is, however, incredibly fast. It is not a message queue; you can easily make a queue out of it, but that is not its purpose. If you only want a queueing system you can probably do better with something else.
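To make the "you can easily make a queue out of it" point concrete: with the redis gem, a bare-bones queue is just a list plus a blocking pop (the names here are illustrative):

    require "redis"
    require "json"

    redis = Redis.new

    # Producer: push work onto a list.
    redis.lpush("jobs", { id: 42, action: "resize_image" }.to_json)

    # Consumer: block until something is available, then pop it off.
    _list, payload = redis.brpop("jobs")
    job = JSON.parse(payload)

There is no acknowledgement or retry built in, though, which is exactly why purpose-built brokers exist.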

How to guarantee data integrity for concurrent Rails/Active Record operations

I need to implement a feature for a rails site that will involve reading and exporting most of my database.
I know this operation is going to take a while. That's fine-- I've got delayed job for that.
What I'm worried about is the data changing during the running of the job, and the resulting export being corrupted because of that.
My initial thought was to do all of the reads within a transaction. However, I would also like to be running the reads concurrently, if possible. ActiveRecord docs say that Transactions cannot be shared between Connections, and Connections cannot be shared between Threads. So it looks as though I am restricted to a single thread with this approach.
Any suggestions for a workaround? Is there another way to give the job a consistent view of the data that doesn't involve transactions? Or is there some alternative to ActiveRecord/Mysql out there that can distribute transactions across threads?
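For what it's worth, the single-threaded variant of the transactional approach could look roughly like this; ActiveRecord (Rails 4+) accepts an isolation level, and REPEATABLE READ on MySQL/InnoDB gives the block one consistent snapshot (Exporter is a made-up class):

    ActiveRecord::Base.transaction(isolation: :repeatable_read) do
      # Every read inside the block sees the database as it was when the
      # snapshot was taken, even while other processes keep writing.
      User.find_each  { |user|  Exporter.write(user)  }
      Order.find_each { |order| Exporter.write(order) }
    end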
