Is Amazon SQS the right choice here? Rails performance issue [closed] - ruby-on-rails

I'm close to releasing a Rails app with the common social-networking features (messaging, wall, etc.). I want to use some kind of background processing (most likely Bj) to offload tasks from the request/response cycle.
This would happen when users invite friends via email to join and for email notifications.
I'm not sure whether I should just drop these invites and notifications into my database via a model and process them with a worker every x minutes, or whether I should go with Amazon SQS, storing the messages and invites there and letting my worker retrieve them from SQS for processing (sending the invites/notifications).
The Amazon approach would take load off my database, but I suspect retrieving messages from there is slower.
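For illustration, a minimal sketch of the database-backed option; the Invitation model, its sent_at column, and the mailer are assumed names, not anything from the question:

```ruby
# Hypothetical Invitation model; sent_at marks rows already processed.
class Invitation < ActiveRecord::Base
  scope :pending, -> { where(sent_at: nil) }
end

# Run by the worker every x minutes (e.g. from a cron'd rake task):
Invitation.pending.find_each do |invitation|
  InvitationMailer.invite(invitation).deliver # assumed mailer
  invitation.update!(sent_at: Time.now)
end
```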
What do you think?

Your title states that you have a Rails performance issue, but do you know this for certain? From the rest of your question it sounds like you're trying to anticipate a possible future performance issue. The only way to deal sensibly with performance issues is to get your application into the wild and profile it. Doing so will give you empirical data as to what the real performance issues are.
Given that Amazon SQS isn't free and that using it will almost certainly add complexity to your application, I would migrate to it if and when database load becomes a problem. Don't try to second-guess problems before they arise, because you'll likely face different problems when your app goes live, some of which you probably haven't considered.
The main point is that you've already decided to use background processing, which is the correct decision: any processing that isn't near-instantaneous doesn't belong in the Rails request/response cycle, because it blocks that Rails process. You can always scale with Amazon later if you need to.

Is your app hosted on Amazon EC2 already? I probably wouldn't move an existing app over to AWS just so I could use SQS, but if you're already on Amazon's infrastructure, SQS is a great choice. You could certainly set up your own messaging system (such as RabbitMQ), but going with SQS means one less thing to worry about.
There are a lot of options to add background processing to Rails apps, such as delayed_job or background_job, but my personal favorite is Workling. It gives you a nice abstraction layer that allows you to plug in different background runners without having to change the actual implementation of your jobs.
I maintain a Workling fork that adds an SQS client. There are some shortcomings (read the comments or my blog post for more details), but overall it worked well for us at my last startup.
I've also used SQS for a separate Ruby (non-Rails) project and generally found it reliable and fast enough. As James pointed out above, you can read up to 10 messages at once, so you'll definitely want to do that (my Workling SQS client does this and buffers the messages locally).
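For reference, a hedged sketch of that batch read using the aws-sdk-sqs gem (not the Workling client mentioned above); the region, queue URL, and process handler are placeholders:

```ruby
require "aws-sdk-sqs"

sqs = Aws::SQS::Client.new(region: "us-east-1")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/notifications" # placeholder

# Ask for up to 10 messages per request to amortize the HTTP round trip.
resp = sqs.receive_message(
  queue_url: queue_url,
  max_number_of_messages: 10,
  wait_time_seconds: 10 # long polling, so we don't busy-loop on an empty queue
)

resp.messages.each do |msg|
  process(msg.body) # hypothetical handler
  # Delete only after successful processing so failed messages become visible again.
  sqs.delete_message(queue_url: queue_url, receipt_handle: msg.receipt_handle)
end
```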

I agree with John Topley that you don't want to over-complicate your application if you don't need to. That said, there are times when it's good to make this kind of decision early: do you anticipate high load from the beginning? Are you rolling this out to an existing user base, or is it a public site that may or may not take off?
If you know you'll need to handle a large amount of traffic from the beginning, then this might be a good step. If you don't want to spend the money on SQS, take a look at some of the free queue solutions out there, like RabbitMQ.
I currently push a couple million messages a month through SQS and it works pretty well. Plan for it being down or slow from time to time, which means building in retry facilities and exponential backoff (see the sketch below). One of the nice things is that you can get 10 messages at a time, which speeds up working through the queue: one request fetches 10 messages, and you process them one by one.
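A hedged sketch of what that retry with exponential backoff might look like; the attempt limit and jitter are illustrative choices, and an Aws::SQS::Client named sqs is assumed to be in scope:

```ruby
MAX_ATTEMPTS = 5 # illustrative cap before giving up

def with_backoff
  attempts = 0
  begin
    yield
  rescue Aws::SQS::Errors::ServiceError
    attempts += 1
    raise if attempts >= MAX_ATTEMPTS
    sleep((2**attempts) + rand) # 2s, 4s, 8s... plus jitter to spread out retries
    retry
  end
end

with_backoff do
  sqs.send_message(queue_url: queue_url, message_body: payload.to_json)
end
```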

Amazon SQS is a fine service, except where the following things become important:
Performance
Legal
Acknowledgments and Transactions
Messaging Idioms
Message Properties
Security, Authenticity and Queue Permissions
If any of these things are important, you need to look at a real enterprise MQ service such as StormMQ, RabbitMQ, or even onlinemq.com.
I found this blog series interesting, as it compares Amazon SQS to StormMQ without pulling any punches:
http://blog.stormmq.com/2011/01/06/apples-and-oranges-performance/

If you have issues with moving to EC2, you can use other services like onlinemq.com.

Related

Scaling to support a massive amount of traffic in a short period of time

Until now, our site has had a modest amount of traffic. None of our developers are big ops guys, but we've stayed ahead of it and kept the site up and running pretty fast. That said, our dev team is stretched, we've accumulated some technical debt, and there's plenty of opportunity to optimize.
Without getting into specifics, we just found out that we'll be expecting a massive amount of traffic in the near future over a very short period of time: on the order of several million hits in a few hours. Scaling is one thing, but this is several orders of magnitude greater than what we're seeing now.
We're a Rails app hosted on S3 using ELB, and PostgreSQL.
I wanted to field some recommendations for broad starting points for scaling and load testing given this situation.
Update: Sorry, EC2, late night :)
@LastZactionHero: Pretty interesting question; let me answer in detail. I assume you're talking about an e-commerce application, since enterprise and B2B apps don't usually see spikes like this. You mentioned that you've hosted your Rails app on S3, so let me clear a couple of things up.
1) You can't host a Rails app on S3. S3 is the Simple Storage Service, where you can only store files.
2) I'd guess you've actually hosted your Rails app on AWS EC2 with an Elastic Load Balancer in front of the EC2 instances, which is pretty good.
3) You have a self-managed PostgreSQL deployment on an EC2 instance.
If you're running on AWS, you're halfway safe, and you can easily scale up and down.
I can see one problem in your present setup: your DB. AWS has a database-as-a-service offering called the Relational Database Service (RDS), which supports MySQL, Oracle, and MS SQL Server.
RDS comes with a lot of features, such as automatic backups of your database, high IOPS, etc.
But it doesn't support your PostgreSQL. You'll need to run and manage PostgreSQL yourself on an EC2 instance; just make sure it's fail-safe and that you have a proper backup and restore system in place.
AWS provides an auto-scaling API and command-line tools, which are pretty easy to use.
You don't have to worry about bandwidth issues, etc., though I second Angelo's answer too.
You can use ElastiCache for caching your app, and a CDN if you need to speed it up. RDS can manage up to 30,000 IOPS; it's a monster that will do a lot of work for you.
Feel free to ask me if you need any kind of help.
(Disclaimer: I'm a senior DevOps engineer working for an e-commerce company that uses Ruby on Rails.)
Congratulations, and I hope your expectations pan out!
This is such a difficult question to answer comprehensively given the available information. For example, is your site heavy on DB reads, writes, or both (and is your sharding/replication strategy in line with your DB strain)? Is bandwidth an issue, etc.? The obvious points are making sure you have access to the appropriate hardware and that your recipes for provisioning/deploying that hardware are up to date and good to go. You can often throw hardware at a sudden spike in traffic until you can get to the root of whatever bottlenecks you discover (and yes, you will discover them at inconvenient times!).
Regarding scaling your app, you should at least:
1) Cache whatever you can, and pay attention to cache expiration, etc.
2) Be sure your DB has appropriate indexes set up (essentially, you should have an index on any field you're searching on); see the sketch after this list.
3) Watch your logs closely to identify potential long queries, N+1 queries, long view renders, etc.
4) Do things like what Shopify outlines in this post: http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O0gJDotV
5) Set up a good monitoring system (Monit, God, etc) for each layer of your stack - sudden spikes in traffic can quickly bottleneck your application in unexpected places and lead to more issues. The cascade can happen quickly.
6) Set up cron to automate all those little tasks you currently do manually...that you will probably forget about doing once you're dealing with traffic spikes.
7) Google scaling rails and you'll see tons of good info.
8) etc, etc, etc...
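As a hedged illustration of points 1 and 2, here is roughly what the index and caching work can look like in Rails; the table, columns, and cache key are all hypothetical:

```ruby
# Point 2: index the fields you search on.
class AddIndexesToOrders < ActiveRecord::Migration
  def change
    add_index :orders, :user_id
    add_index :orders, [:status, :created_at] # composite index for a common filter
  end
end

# Point 1: low-level caching with an explicit expiry.
stats = Rails.cache.fetch("dashboard/stats", expires_in: 5.minutes) do
  Order.group(:status).count # the expensive query runs at most once per 5 minutes
end
```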
You can use profiling tools (rubyperf, or something like New Relic, etc.), but whatever numbers you get from them are best treated as a rough baseline. The simple reason is that your profiling depends on your hardware stack, which will certainly change under actual traffic patterns. Profiling is pretty easy if you have a site with one page of static content, and incredibly difficult if you have a CMS site with a growing DB and growing traffic.
Good luck!!!

How is request processing with rails, redis, and node.js asynchronous?

For web development I'd like to mix Rails and Node.js, since I want the best of both worlds (Rails for fast web development and Node for concurrency). I know some people choose to use a full Ruby stack with EventMachine integrated into the Rails controllers, so that every request can be non-blocking by using fibers in an event-loop model. I've been able to understand how that works in the big picture.
At this moment, however, I want to try non-blocking request processing with Rails and Node.js using a message queue concept. I've heard this can be achieved by using Redis as an intermediary, but I'm still having trouble figuring out how it works. From what I can understand: we have two apps, A (Rails) and B (Node.js), plus Redis. The Rails app handles requests from users through controllers in a REST manner, then passes them on through Redis; Redis forms queues, and the Node.js app picks up the queue and does whatever is necessary afterward (writing to or reading from the backend DB).
My questions:
1) How would that improve concurrency and scalability? From what I know, since Rails handles the requests through controllers synchronously and then writes to Redis, the requests will still be blocking, even though the Node.js end can pick up the queue asynchronously. (I have a feeling it's not really asynchronous if it isn't non-blocking end to end.)
2) Would Node.js be considered a proxy or an application here if Redis is the intermediary?
3) I'm new to Redis and still learning it. If I'm using a 100% NoSQL solution for my backend database, such as MongoDB or CouchDB, can Redis replace them entirely, or is Redis better seen as a message queue tool like RabbitMQ?
4) Is a message queue a different concurrency concept from threading or the event-loop model, or is it supposed to supplement them?
That's all my questions. I'm new to the message queue concept and would appreciate any help, pointers in the right direction, or articles that help me learn more. Thanks.
You are mixing some things here that don't go together.
Let's first make sure we're on the same page regarding the strengths and weaknesses of the technologies involved.
Rails: used for its web-development simplicity, and perfect for serving database-backed web applications.
It's not very performant when it has to serve a large number of long-running requests, as you'd run out of threads on your Ruby workers, but it's well suited for anything that can scale horizontally with more web nodes (multiple web servers, one DB).
Node.js: great for high-concurrency scenarios. Not as easy as Rails for writing a regular web application, but it can handle a near-insane number of long-running, low-CPU tasks efficiently.
Redis: a key-value store that supports operations on its data structures (increment/decrement values, append/prepend, push/pop to lists: all operations that let this DB work consistently with multiple clients writing at once).
Now, as you can see, there is no benefit in having Rails AND Node serve the same request, communicating through Redis. Going through the Rails stack provides no benefit if the request ends up being handled by the Node server.
And even if you only offload some processing to the Node server, it's still the Rails web server that handles the request and has to wait for a response from Node, killing the desired scalability. It simply makes no sense.
Where a setup with Node and Rails together does make sense is in certain areas of your app that have drastically different scaling requirements.
If, for example, you're writing a website that displays live stats for football games, you can easily see that there are two different concerns in your app: the "normal" site with signup, billing, and profile stuff, which screams for a quick implementation in Rails; and the "live" portion of the site, where users watch live results and you expect to handle a lot of clients at once, all waiting for something to happen (low CPU, high concurrency).
In such a case it may be beneficial to actually separate the two parts of the site into a Ruby app and a Node app, sharing data about the user through a store like Redis (really, you just need some shared state that both can read and write for synchronization purposes).
So you could use Rails for the signup/login portions: once the user has signed up, write the session cookie into Redis alongside the user's permissions (which games they're allowed to follow) and hand the user off to the Node.js app.
There the Node app can read the session information from Redis and serve the user.
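A hedged sketch of that hand-off from the Rails side, using the redis gem; the key layout, TTL, and permission fields are illustrative, not a fixed protocol:

```ruby
require "redis"
require "securerandom"
require "json"

redis = Redis.new(host: "localhost", port: 6379)

# After a successful Rails login, store what the Node app will need
# (user is the authenticated model instance, assumed to be in scope):
session_token = SecureRandom.hex(16)
redis.setex("session:#{session_token}", 3600, {
  user_id: user.id,
  allowed_games: user.allowed_game_ids # hypothetical permission data
}.to_json)

# A cookie carrying session_token goes to the browser; the Node.js app
# then reads GET session:<token> from Redis and serves the live pages.
```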
Word of advice:
You don't get scalability by simply throwing Node.js into your toolbox. You really have to find out what Node.js is good at (low-CPU, high-I/O concurrent operations) and how you can leverage that to remedy some of the problems of your currently chosen technology.
I can answer #3 for you. Redis does not guarantee that when you perform an operation the result will actually be on disk, and its transaction handling is a bit "different". It also requires the whole database to fit in memory. Depending on the situation this may or may not be an issue. It is, however, incredibly fast. It is not a message queue: you can easily build a queue out of it, but that is not its purpose. If you only want a queuing system, you can probably do better with something else.
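To make the "you can easily build a queue out of it" point concrete, a minimal sketch with the redis gem; the key name and job format are illustrative:

```ruby
require "redis"
require "json"

redis = Redis.new

# Producer side: push a job onto the list.
redis.lpush("jobs", { type: "send_email", user_id: 42 }.to_json)

# Consumer side: BRPOP blocks until a job arrives, so no polling is needed.
loop do
  _list, raw = redis.brpop("jobs")
  job = JSON.parse(raw)
  handle(job) # hypothetical dispatcher
end
```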

Best way to run rails with long delays

I'm writing a Rails web service that interacts with various pieces of hardware scattered throughout the country.
When a call is made to the web service, the Rails app then attempts to contact the appropriate piece of hardware, get the needed information, and reply to the web client. The time between the client's call and the reply may be up to 10 seconds, depending upon lots of factors.
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
I basically see two options: either run JRuby and use multithreading, or run several regular Ruby instances and hope that not many people try to use the service at once. JRuby seems like the much better solution, but it still doesn't seem to be mainstream or to have out-of-the-box support at Heroku and Engine Yard. The multiple-instance solution seems like a total kludge.
1) Am I right about my two options? Is there a better one I'm missing?
2) Is there an easy deployment option for JRuby?
I do not want to split the web service call in two (ask for information, answer immediately with a pending reply, then force another api call to get the actual results).
From an engineering perspective, this seems like it would be the best alternative.
Why don't you want to do it?
There's a third option: If you host your Rails app with Passenger and enable global queueing, you can do this transparently. I have some actions that take several minutes, with no issues (caveat: some browsers may time out, but that may not be a concern for you).
If you're worried about browser timeouts, or you cannot control the deployment environment, you may want to process the request in the background (a sketch follows the steps below):
User requests data
You enter request into a queue
Your web service returns a "ticket" identifier to check the progress
A background process processes the jobs in the queue
The user polls back, referencing the "ticket" id
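A hedged sketch of that ticket flow in a Rails controller; Ticket, HardwareJob, and the enqueue call are hypothetical stand-ins for whatever queue you pick:

```ruby
class QueriesController < ApplicationController
  # POST /queries -- steps 1-3: enqueue the work, return a ticket id immediately.
  def create
    ticket = Ticket.create!(status: "pending")
    HardwareJob.enqueue(ticket.id) # hypothetical background job
    render json: { ticket_id: ticket.id }
  end

  # GET /queries/:id -- step 5: the client polls back with its ticket id.
  def show
    ticket = Ticket.find(params[:id])
    render json: { status: ticket.status, result: ticket.result }
  end
end
```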
As far as hosting in JRuby, I've deployed a couple of small internal applications using the glassfish gem, but I'm not sure how much I would trust it for customer-facing apps. Just make sure you run config.threadsafe! in production.rb. I've heard good things about Trinidad, too.
You can also run the web service call in a delayed background job so that it's not hogging a web server; it can even run on a separate physical box, which is a much more scalable approach. If you make the web call using AJAX, you can ping the server every second or two to see if your results are ready; that way your client is not held in limbo while the results are being calculated, and the request does not time out.

Implementing an Online Waiting Room

My organization is building a new version of our ticketing site and is looking for the best way to build an online waiting room when the number of users in our purchase path exceeds a certain limit. The best version of this queue would let new users in after existing users have either completed their purchase or have exceeded a timeout limit after entering the path.
I'm trying to get an idea of how this has been implemented by other organizations. Has anyone out there done something similar or have any experience with this? We have some ideas, but I'd like to get a sense of what solutions have been tried and what problems those solutions have run up against.
Just to be complete, this site is being built in Ruby on Rails, though I'd love to hear about how people have solved this regardless of platform.
Edit: To clarify, the need for the queue is not primarily to reduce load, but to limit the speed at which the web channel purchases tickets relative to people buying in other ways, like over the phone.
Before I outline one method for this, I want to point out that what you want to do doesn't make a lot of sense. Services on the web aren't like a physical store, where I can walk up and see that it's crowded and decide to stay or not. Queueing people on your site strikes me as shifting the blame from you (unable or unwilling to adequately provision resources) to me (punishing me for trying to use your site).
If you're selling something like show tickets, where quantity is limited and each item is tied to a seat, I think it's better to reserve items and time out those reservations if they aren't paid for in a timely manner. Ticketmaster does this, and I think it's a much better solution than blocking people at the door.
If you still want to go down this path, then I'd design the system like this:
As customers come to your site, record their arrival time. As they interact with the site, record a "last seen" time. "Last seen" will be used to determine activeness. You'll need a background job running very frequently to expire sessions quickly.
Once your limit is hit, you have an ordered queue of people who are blocked. As customers complete their transaction or time out, you'll mark the next person in the queue for entry into the purchase path.
For queued users, their browsers will make a request on a regular basis, checking to see if you've let them in yet. If yes, they proceed to the purchase path. If no, they continue to wait.
The purchase path needs a mechanism to check if someone is trying to circumvent your waiting area, and sends them back.
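A hedged sketch of the admission check that the polling endpoint might run; the Visitor model, its columns, and the limit of 500 are all assumptions:

```ruby
LIMIT = 500 # maximum concurrent users in the purchase path (illustrative)

def admitted?(visitor)
  return true if visitor.in_purchase_path?

  # Count active purchasers via the "last seen" timestamps described above.
  active = Visitor.where(in_purchase_path: true)
                  .where("last_seen_at > ?", 2.minutes.ago).count

  # Admit strictly in arrival order once a slot frees up.
  next_up = Visitor.where(in_purchase_path: false).order(:arrived_at).first

  if active < LIMIT && next_up == visitor
    visitor.update!(in_purchase_path: true)
    true
  else
    false
  end
end
```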
You might find the online queuing for ticketing guide helpful; check out their repository on GitHub.
They have integrations with Ruby on Rails, PHP, .NET, iOS, Android, and similar platforms.
Queue-it enables you to gain control of website overload during extreme traffic peaks by offloading end users into an online queue.
When a peak traffic event occurs on a website, the online queue system sends users to the virtual waiting room environment where the users wait and are redirected back to the website at a rate it can handle.

Multiple Uploads to Amazon S3 from Ruby on Rails - What Background Processing System to Use?

I'm developing a Ruby on Rails application that needs to allow the user to simultaneously upload 16 high-quality images at once.
This often means somewhere around 10-20 megabytes (sometimes more), but it's the number of connections that is becoming the most pertinent issue.
The images are sent to Amazon S3 via Paperclip, which unfortunately opens and closes a new connection for each of the 16 files. Needless to say, I need to move this work into background processes to keep my web server from locking up, as it already does even with no traffic.
My question is: out of all the Rails-based systems for background jobs (Starling, BackgroundRb, Spawn, etc.), is there one that fits the bill for this scenario better than the others? (I'm new to building background systems, so all of the available options are equally new to me.)
There's no shortage of Rails plugins for async processing, and basically all of them work fine. Personally, I like Delayed Job's API best.
I wouldn't use Starling or other actual queue daemons, since for this task using the database to store any necessary state should be just fine.
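For what it's worth, a hedged sketch of Delayed Job usage here; the Photo model and its upload method are hypothetical, not Paperclip's actual API:

```ruby
class Photo < ActiveRecord::Base
  def upload_to_s3!
    # the slow Paperclip-to-S3 transfer would happen here (hypothetical)
  end
  # Delayed Job turns calls to this method into jobs stored in the database.
  handle_asynchronously :upload_to_s3!
end

# Or as a one-off, without declaring it on the class:
photo.delay.upload_to_s3!
```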
This might help!
http://aaronvb.com/blog/2009/7/19/paperclip-amazon-s3-background-upload-using-starling-and-workling
EDIT:
It's not possible, through a normal HTML multipart form, to send files to the background; they have to be handled within that request. If you're looking for a way around that, you can try SWFUpload and then use a background process to handle the Amazon S3 uploads once the files are on your server.
This is also a good survey blog post: http://4loc.wordpress.com/2010/03/10/background-jobs-in-ruby-on-rails/
I like SWFUpload; we use it on some S3 apps that we wrote. It's proven to be very fast and stable. You can have actions fired off via Ajax after the uploads, etc. We've had a ton of uploads go through it with zero failures.
