How can I know if I need more instances? - devops

How do I know if I need more instances?
At the moment my setup is the following. I have a wordpress website running LAMP server on a micro instance with Cloudfront. Now recently this website has been getting quite a lot of traffic. Think 100-150k pageviews a day with 150-200 visitors simultaneously at peak times.
At the moment both the database and all the static files are in the server itself, but that's really not an issue as they're served from Cloudfront anyway. I could move the database to an RDS instance and the statics to S3 and then set up a scaling stack. But before going into this I would like to know if this is a solution to anything, or even if I have a problem. The website seems slightly sluggish, but I'm not even sure where that's coming from.
So my question is: how to I determine if I even have a problem? Is the "sluggishness" just my imagination? If I do, it's not something very obvious, but when the number of visitors becomes large, even a seemingly imperceptible slow down might hurt your metrics.

In my experience, first you need to monitoring your web application performance with some tools, such as new relic. then add feedback on your site, your users can send feedback to you when they feel visit your site slowly.

Related

Scaling to support a massive amount of traffic in a short period of time

Until now, our site has had a modest amount of traffic. None of our developers are big ops guys, but we've stayed ahead of it and keep the site up and running pretty quick. That said, our dev team is stretched, we've accumulated some technical debt, and there's plenty of opportunity to optimize.
Without getting into specifics, we just found out that we'll be expecting a massive amount of traffic in the near future in a very short period time. On the order of several million hits in a few hours. Scaling is one thing, but this is several orders of magnitude greater than what we're seeing now.
We're a Rails app hosted on S3 using ELB, and Postgresql.
I wanted to field some recommendations for broad starting points for scaling and load testing given this situation.
Update: Sorry, EC2, late night :)
#LastZactionHero
Pretty interesting question, let me answer you in detail, I hope you are talking about some e-commerce applications, enterprise or B2B apps doenst see spikes as such. Since you already mentioned that you are hosted your rails app on s3. Let me make couple of things clear.
1)You cant host an rails app on s3. S3 is simple storage service. Where you can only store files.
2) I guess you have hosted your rails app on AWS ec2 with a elastic load balancer attached above the ec2 instances which is pretty good.
3)You have a self managed Postgresql deployed on a ec2 instance.
If you are running on AWS you are half way safe and you can easily scale up and scale down.
I can see one problem in your present model, that your db. AWS has got db as a service. Thats called Relation database service.Which supports Mysql Oracle and MS SQL server.
RDS comes with lot of features like auto back up of your database, high IOPS etc.
But it doesnt support your Postgresql. You need to have or manage a self managed ec2 instance and run postgresql database, but make sure its fail safe and you do have proper back and restore system at place.
AWS provides auto scaling api and command line tools, pretty easy.
You dont have worry about the bandwidth issue etc, but I admit Angelo's answer too.
You can use elastic mem cache for caching your app. Use CDN if need to speed your app. RDS can manage upto 30000 IOPS, its a monster to it will do lot of work for you.
Feel free to ask me if you need any kind of help.
(Disclaimer: I am a senior devOps engineer working for an e-commerce company, use ruby on rails)
Congratulations and I hope your expectation pans out!!
This is such a difficult question to comprehensively answer given the available information. For example, is your site heavy on db reads, writes or both (and is your sharding/replication strategy in line with your db strain)? Is bandwidth an issue, etc? Obvious points would focus on making sure you have access to the appropriate hardware and that your recipies for whatever you use to provision/deploy your hardware is up to date and good to go. You can often throw hardware at a sudden spike in traffic until you can get to the root of whatever bottlenecks you discover (and yes, you will discover them at inconvenient times!)
Regarding scaling your app, you should at least:
1) Cache whatever you can. Pay attention to cache expiration, etc.
2) Be sure your DB has appropriate indexes set up (essentially, you should have an index on any field you're searching on.)
3) Watch your logs closely to identify potential long queries, N+1 queries, long view renders, etc.
4) Do things like what Shopify outlines in this post: http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O0gJDotV
5) Set up a good monitoring system (Monit, God, etc) for each layer of your stack - sudden spikes in traffic can quickly bottleneck your application in unexpected places and lead to more issues. The cascade can happen quickly.
6) Set up cron to automate all those little tasks you currently do manually...that you will probably forget about doing once you're dealing with traffic spikes.
7) Google scaling rails and you'll see tons of good info.
8) etc, etc, etc...
You can use some profiling tools (rubyperf, or something like NewRelic, etc) Whatever response you get from them is probably best to be considered as a rough baseline at best. Simple reason being that your profiling is dependent on your hardware stack which will certainly change depending on actual traffic patterns. Pretty easy to do if you have a site with one page of static content...incredibly difficult to do if you have a CMS site with a growing db and growing traffic.
Good luck!!!

Amazon Web Service Micro Instance - Server Crash

I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) When looking at my CloudWatch metrics, I have recently noticed CPU spikes, the website receives very little traffic at the moment, but becomes utterly unusable during these spikes. These spikes can last several hours and resetting the server does not eliminate the spikes.
2) Although seemingly unrelated, whenever I post a link of my website on Twitter, the server crashes (i.e.,Error Establishing a Database Connection). Once restarting Apache and MySQL, the website returns to normal functionality.
My only guess would be that the issue is somehow the result of deficiencies with the micro instance. Unfortunately, when I upgraded to the small instance, the site was actually slower due to fact that the micro instances can have two EC2 compute units.
Any suggestions?
If you want to stay in the free tier of AWS (micro instance), you should off load as much as possible away from your EC2 instance.
I would suggest you to upload the images directly to S3 instead of going through your web server (see some example for it here: http://aws.amazon.com/articles/1434).
S3 can also be used to serve most of your web pages (images, js, css...), instead of your weak web server. You can also add these files in S3 as origin to Amazon CloudFront (CDN) distribution to improve your application performance.
Another service that can help you in off loading the work is SQS (Simple Queue Service). Instead of working with online requests from users, you can send some requests (upload done, for example) as a message to SQS and have your reader process these messages on its own pace. This is good way to handel momentary load cause by several users working simultaneously with your service.
Another service is DynamoDB (managed NoSQL DB service). You can put on dynamoDB most of your current MySQL data and queries. Amazon DynamoDB also has a free tier that you can enjoy.
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micros have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue: As far as I understand the problem is that AWS will slow you down when you reach a predefined usage. This means that they allow for a small burst but after that things will become horribly slow.
You can test that by logging in and doing something. If you use the CPU for a couple of seconds then the whole box will become extremely slow. After that you'll have to wait without doing anything at all to get things back to "normal".
That was the main reason I went for VPS instead of AWS.

issues with deploying rails servers to multiple hosts

I've heard often that deploying a traditional monolithic Rails app (i.e. no internal Web API, no message queue, no Redis/memcached server) to multiple servers can produce a bunch of bugs that are very hard to debug but I'm having a hard time coming up with some concrete examples despite a few hours of googling
Some obvious issues that I can think of are:
Observers - likely will not work properly as the observation is only propagated on one server and not all of them (assuming there is no Message Queue)
Sessions - would probably need to store these in the database which would need it's own host
Caches - any sweepers would have issues propagating invalidations between servers.
Anyone else care to contribute? I'd really appreciate any articles others may have come across or just general wisdom :)
Observers are just code callbacks.
They run on each process, on each server.
Sessions have defaulted to the cookie store for the last few years.
So multiple servers are no problem.
If you don't have enough space in your cookie then I suggest you may be doing something wrong.
Cache invalidation is indeed a problem.
But it always is.
One solution is to break your cache out into a standalone service.
Sites like Facebook have giant farms of memcache
I think scaling and clustering is always a hard problem.
But this seems to be an old argument against rails.
If anything the last few years have seen rails shine in this respect.
With ec2, nosql, and server automation becoming quite a norm in the community.

Server runs out of RAM - How to make more efficient?

My website seems to go down for up to an hour almost every day. I don't know much about servers so would appreciate advice on what to do next.
The website (3dsbuzz.com) has a Wordpress section, Wiki, Gallery and a VBulletin forum each with their own custom code and plug-ins. So there are lots of potential places which could be inefficiently coded. It runs from a cloud server with 2GB of RAM. The MySQL database is 370MB. We get around 30,000 page views per day.
What would be the best way to reduce the amount of down time? Should I upgrade my server (I can't really afford to) or is 2GB reasonable? I have plenty of errors in my error log, but they don't always occur around the same times as the server going down, so I'm not sure how relevant it is.
A solution is to remove each conponent one by one until the server no longer crashes.
Then, you know where to look for and can investigate further.

Are you using AWSDBProxy? Is there a performance hit when scaling out?

It seems that the only tutorials out there talking about using Amazon's SimpleDB in a rails site are using AWSDBProxy... Personally, I find this counter-intuitive to scaling out, considering the server layout of a typical Rails site below (using AWSDBProxy):
Plugin here: http://agilewebdevelopment.com/plugins/aws_sdb_proxy
Image here: http://www.freeimagehosting.net/uploads/91be4e0617.png
As you can see, even if we add more mongrels, we have two problems.
We have a single point of failure far less stable than our load balancer
We have to force all our information through this one WEBrick server
The solution is, of course, to add more AWSDBProxies... but why not then just use the following code in say, a class, skipping the proxy all together?
service = AwsSdb::Service.new(Logger.new(nil),
CONFIG['aws_access_key_id'],
CONFIG['aws_secret_access_key'])
service.query(domain, query)
So what I'm getting at, is if you are using AWSDBProxy, what are you justifications for it? And if you are indeed using it, what is your performance like? If you have hard numbers, this would be even more appreciated!
I'm not using it, nor have I ever heard of it, but this is what I would think are reasonable reasons.
You're running your main app server on EC2, so the chance of Internet FAIL doesn't really affect you more than once.
You run one proxy on each of your app servers. So it's connection going down is no worse than it's connection(s) to the database going down.
Because it can be done. This is as good a reason as any in an open source project. Sometimes it takes building a thing before you know whether said thing is a good/bad idea.
You don't have the traffic levels to need a load balancer. Then your diagram squashes down to a line, if not a single machine.

Resources