Instagram API calls faster on Heroku than on local machine? - ruby-on-rails

I am working on a Rails app that pulls up to 100 Instagram posts at once with the media/search endpoint and displays them on a page. The AJAX call that loads the photos takes a very long time on localhost, but once deployed to Heroku, takes much less time (10s versus 1s). Can anyone explain why Heroku is faster? I might not need to worry as much about caching my results.
Thanks!!

One major reason will be Heroku's phsyical hosting location -- I believe Instagram hosts with Amazon's AWS service (this may have changed after the Facebook acquisition):
Here at Instagram, we run our infrastructure on Amazon Web Services,
running instances on their Elastic Compute Cloud (EC2)
Heroku basically hosts through Amazon's cloud too, meaning they are ostensibly running on the same network. This will obviously cut latency down to a minimum, as well as the fact that Heroku's services are optimized for efficiency -- high speed Internet etc
Cache
Your question is really "should I be creating a cache for Instagram data in my system?"
The answer is "yes" - it's my experience you should never rely on a third party entirely, as apart from obvious latency issues, you'll also have to contend with a multitude of other problems (API outages, client bandwidth etc)
I'd personally look at storing as much data as possible in my own system. This doesn't mean to keep all in your main DB - you could utilize a Redis instance to store the third-party data you need

Related

Ruby on Rails hosting - Engineyard vs Enterprise-Rails

I've been running a Rails app on 1 big dedicated server. Now for scaling I want to switch to a cloud service hoster and serve the app on 3 instances - App, DB and Redis.
I have really bad experience with Heroku performance wise and hence cost efficiency. So for me 2 Alternatives remain: Engineyard and Enterprise-Rails.
What I find important is that Engineyard doesn't offer an autoscaling option to handle peaks. On the other hand Enterprise-Rails doesn't have too much of documentation, most of it is handled by a support crew which is setting up everything.
What are other differences and what should I use for my website? I don't need much of administration work and I am not experienced with it. Basically I just want my Site to run optimally safe, stable and cost efficient without much personal work involved.
I am running a massive Rails app off AWS at this time and I'm really happy with it. Previously I had a number of dedicated boxes that were always causing problems - sooner or later one of them would crash for some reason, Raid failures, database problems whatnot.
At AWS I use RDS for database, elastic cache for caching, I keep all my code on a fat instance that acts as staging server and get a variable number of reserved instances to load the code via NFS.
I also use autoscaling - we've prepaid for a number of reserved instances and autoscaling helps starting up nodes when CPU usage goes above 60%, then removing them when it goes below 25%. autoscaling rules are based on cloudwatch alerts that can be set to monitor a particular group of instances, memcache servers, and so on, you even get e-mails and SMS notifications via SNS when certain scaling activities take place, say when more than 100 instances are spammed in less than 1 hour (massive traffic spike). The instances also get added right up to the load balancers by the way and you don't need to mess with the session store as you can use the sticky session feature which is quite nice.
Recently I also started using a 2nd launch group with spot instances, this complicated things a bit in terms of cloudwatch rules but I'm able to save a lot every month as spot prices are much lower. When the spot price (minimum) I bid is not enough, the set-up I have switches back to reserved instances.
Even more recently I've also started using CloudFront which got my app's page assets to load real fast (about 2 megs of CSS, JS, some icon sprites). Previously I was serving directly from instances via the load balancers.
This took about 20 hours to deploy, test and tune for maximum performance and availability.
One of the problems I have with AWS is that there's no support unless you're prepared to foot a bill. They claim some support is offered without a subscription but the only option in the support area is Billing. Ha. Fortunately it's all stable enough not to put me in a position where I'd have to pay for it.
Overall Rails fits in quite nice with AWS. I spend less than 2 hours per month doing maintenance, where I was spending over 30 previously. Most important for me is that I know that I can GTFO on a vacation for X months knowing nothing will cause any trouble - haven't had a monitoring alert more than a year.
Later edit: the app is a sports site with white labeling feature, lots of users, lots of administrators working on content in the back-end, database intensive as we show market pricing data that should update every few seconds. I had an average load time of about 3 seconds per page with dedicated servers that were doing about the same thing - database, memcache, storage, load balancing, web app. Now my average is under 1 second. Monthly bill is about 8 times lower now.
While Engine Yard doesn't offer auto-scaling (it is in the pipeline), we do have a fairly easy to use scaling feature that allows you to spin up multiple instances at once in times of need.
The advantages over something like Enterprise-Rails is the full documentation, the choice to deploy from the CLI or the dashboard,and our amazing support team. It's also easier to use Engine Yard and move from a personal machine or from another cloud setup than it is using a service such as AWS directly.

Scaling to support a massive amount of traffic in a short period of time

Until now, our site has had a modest amount of traffic. None of our developers are big ops guys, but we've stayed ahead of it and keep the site up and running pretty quick. That said, our dev team is stretched, we've accumulated some technical debt, and there's plenty of opportunity to optimize.
Without getting into specifics, we just found out that we'll be expecting a massive amount of traffic in the near future in a very short period time. On the order of several million hits in a few hours. Scaling is one thing, but this is several orders of magnitude greater than what we're seeing now.
We're a Rails app hosted on S3 using ELB, and Postgresql.
I wanted to field some recommendations for broad starting points for scaling and load testing given this situation.
Update: Sorry, EC2, late night :)
#LastZactionHero
Pretty interesting question, let me answer you in detail, I hope you are talking about some e-commerce applications, enterprise or B2B apps doenst see spikes as such. Since you already mentioned that you are hosted your rails app on s3. Let me make couple of things clear.
1)You cant host an rails app on s3. S3 is simple storage service. Where you can only store files.
2) I guess you have hosted your rails app on AWS ec2 with a elastic load balancer attached above the ec2 instances which is pretty good.
3)You have a self managed Postgresql deployed on a ec2 instance.
If you are running on AWS you are half way safe and you can easily scale up and scale down.
I can see one problem in your present model, that your db. AWS has got db as a service. Thats called Relation database service.Which supports Mysql Oracle and MS SQL server.
RDS comes with lot of features like auto back up of your database, high IOPS etc.
But it doesnt support your Postgresql. You need to have or manage a self managed ec2 instance and run postgresql database, but make sure its fail safe and you do have proper back and restore system at place.
AWS provides auto scaling api and command line tools, pretty easy.
You dont have worry about the bandwidth issue etc, but I admit Angelo's answer too.
You can use elastic mem cache for caching your app. Use CDN if need to speed your app. RDS can manage upto 30000 IOPS, its a monster to it will do lot of work for you.
Feel free to ask me if you need any kind of help.
(Disclaimer: I am a senior devOps engineer working for an e-commerce company, use ruby on rails)
Congratulations and I hope your expectation pans out!!
This is such a difficult question to comprehensively answer given the available information. For example, is your site heavy on db reads, writes or both (and is your sharding/replication strategy in line with your db strain)? Is bandwidth an issue, etc? Obvious points would focus on making sure you have access to the appropriate hardware and that your recipies for whatever you use to provision/deploy your hardware is up to date and good to go. You can often throw hardware at a sudden spike in traffic until you can get to the root of whatever bottlenecks you discover (and yes, you will discover them at inconvenient times!)
Regarding scaling your app, you should at least:
1) Cache whatever you can. Pay attention to cache expiration, etc.
2) Be sure your DB has appropriate indexes set up (essentially, you should have an index on any field you're searching on.)
3) Watch your logs closely to identify potential long queries, N+1 queries, long view renders, etc.
4) Do things like what Shopify outlines in this post: http://www.shopify.com/technology/7535298-what-does-your-webserver-do-when-a-user-hits-refresh#axzz2O0gJDotV
5) Set up a good monitoring system (Monit, God, etc) for each layer of your stack - sudden spikes in traffic can quickly bottleneck your application in unexpected places and lead to more issues. The cascade can happen quickly.
6) Set up cron to automate all those little tasks you currently do manually...that you will probably forget about doing once you're dealing with traffic spikes.
7) Google scaling rails and you'll see tons of good info.
8) etc, etc, etc...
You can use some profiling tools (rubyperf, or something like NewRelic, etc) Whatever response you get from them is probably best to be considered as a rough baseline at best. Simple reason being that your profiling is dependent on your hardware stack which will certainly change depending on actual traffic patterns. Pretty easy to do if you have a site with one page of static content...incredibly difficult to do if you have a CMS site with a growing db and growing traffic.
Good luck!!!

Amazon Web Service Micro Instance - Server Crash

I am currently using an AWS micro instance as a web server for a website that allows users to upload photos. Two questions:
1) When looking at my CloudWatch metrics, I have recently noticed CPU spikes, the website receives very little traffic at the moment, but becomes utterly unusable during these spikes. These spikes can last several hours and resetting the server does not eliminate the spikes.
2) Although seemingly unrelated, whenever I post a link of my website on Twitter, the server crashes (i.e.,Error Establishing a Database Connection). Once restarting Apache and MySQL, the website returns to normal functionality.
My only guess would be that the issue is somehow the result of deficiencies with the micro instance. Unfortunately, when I upgraded to the small instance, the site was actually slower due to fact that the micro instances can have two EC2 compute units.
Any suggestions?
If you want to stay in the free tier of AWS (micro instance), you should off load as much as possible away from your EC2 instance.
I would suggest you to upload the images directly to S3 instead of going through your web server (see some example for it here: http://aws.amazon.com/articles/1434).
S3 can also be used to serve most of your web pages (images, js, css...), instead of your weak web server. You can also add these files in S3 as origin to Amazon CloudFront (CDN) distribution to improve your application performance.
Another service that can help you in off loading the work is SQS (Simple Queue Service). Instead of working with online requests from users, you can send some requests (upload done, for example) as a message to SQS and have your reader process these messages on its own pace. This is good way to handel momentary load cause by several users working simultaneously with your service.
Another service is DynamoDB (managed NoSQL DB service). You can put on dynamoDB most of your current MySQL data and queries. Amazon DynamoDB also has a free tier that you can enjoy.
With the combination of the above, you can have your micro instance handling the few remaining dynamic pages until you need to scale your service with your growing success.
Wait… I'm sorry. Did you say you were running both Apache and MySQL Server on a micro instance?
First of all, that's never a good idea. Secondly, as documented, micros have low I/O and can only burst to 2 ECUs.
If you want to continue using a resource-constrained micro instance, you need to (a) put MySQL somewhere else, and (b) use something like Nginx instead of Apache as it requires far fewer resources to run. Otherwise, you should seriously consider sizing up to something larger.
I had the same issue: As far as I understand the problem is that AWS will slow you down when you reach a predefined usage. This means that they allow for a small burst but after that things will become horribly slow.
You can test that by logging in and doing something. If you use the CPU for a couple of seconds then the whole box will become extremely slow. After that you'll have to wait without doing anything at all to get things back to "normal".
That was the main reason I went for VPS instead of AWS.

Should I choose cloud?

I'm about to start development on a project with very uncertain load/traffic specifics. When it will be released there will certainly be very low load that can easily be handled by a single desktop quad code machine.
The problem is that there will be (after some invite-only period) a strong publicity for the product so I expect considerable traffic/load peaks.
I haven't read enough about cloud providers and I'm mostly leaning toward Amazon or Azure for the credibility these two companies have without checking them out as I should with others (ie. Rackspace that I suppose is also a cloud service provider).
What I want
I would like to create a normal Asp.net MVC web application that can be run on in-house single machine low-cost server. It would run web server along with database (relational and maybe also document) and fulltext search (not SQL FTS but rather high speed separate product like Lucene or Sphinx). But after initial invite-only period I'd like to move this app to the cloud to make it more traffic/load demand-friendly.
As much as I know Amazon offers a sort of virtual machine hosting which I understand you setup as a normal server but has possible flexible resources in terms of load power. I'm not sure if that can be accomplished on Azure as well.
Questions
What is your experience with application transition to cloud and which one did you choose and why?
What would you recommend I should think about when designing/developing the solution to make the transition as painless as possible.
Based on your experience is it better to move to the cloud (financial wise) or is it better to buy your own servers and load balance application yourself and maybe save money on the long run?
"Cloud" is such a vague term. Still, I think this is a very good question.
Basically, IaaS cloud hosting does not magically make your application scale. It's really a virtual private server with very short contract / cancellation periods.
For scalability, the magic lies not so much in the hosting, but in the horizontal scalability of the application code itself. This is related to all the distributed computing challenges. For example, adding more application servers is not always easy: you must be sure that you don't persist any user state in the server application (but rather in a database, static can be evil), caching can be problematic because local caches can make the situation worse if you're using a round-robin strategy, etc.
What is your experience with application transition to cloud and which one did you choose and why?
What would you recommend I should think about when designing/developing the solution to make the transition as painless as possible.
You don't really have to do anything different just to host on EC2 or Azure -- basically. But of course, it's not that easy when things grow.
For instance, EC2 instance storage is rather limited. Additional storage on EBS, however, does not provide comparable performance characteristics and can be a bit more laggy than a disk. The point here is that EBS does magically scale, and it's probably more PaaS than IaaS; but it's not a simple hard disk and it does, consequently, not behave like a hard drive. I don't know about Azure block storage. In general, expect additional abstraction layers to introduce problems of their own, no matter what they do.
Based on your experience is it better to move to the cloud (financial wise) or is it better to buy your own servers and load balance application yourself and maybe save money on the long run?
Typical cloud providers are more expensive than the usual 'round-the-corner VPS providers, but they are, to my experience, also much more reliable and professional. EC2 has a free tier (but it's quite small), Azure gives you a small instance for free for 3 months.
Doing the calculation right is rather tricky; for example, if you have to shut down your service for whatever reason, it's nice to be able to cancel now rather than pay another year - you might want to put this risk into your calculation. On the other hand, both EC2 and Azure will be considerably cheaper if you sign up for 6 or 12 months, rather than paying by the hour.
You might want to check out the free Azure plan, because it's nice to start fiddling around without any cost. A big advantage of cloud providers is that you can scale vertically very easily: buying a 16 core, 64GB RAM server machine is really expensive, but if there's so much traffic on your site, upgrading your plan won't be such a big issue.
As someone hasn't mention it yet...
AppHarbor has been amazing. You can push stuff in a matter of minutes. Deployment is a breeze. And setting up your project for it is easier as well. And it doesn't even require any major changes in your solution to fit in.
For the full-text search, you might consider something like Websolr.
A lot of this depends on what your app is doing (e.g., are there separable components that might benefit from running on different instances, vs. a simple CRUD application with a front end). One thing to consider is that in a cloud application you normally don't have a traditional relational database. As such, you have to choose either cloud or traditional hosting, or plan on coding your access layer twice. Azure does have relational databases (SQL Azure), although they're not identical to SQL Server 2008R2. You're going to have to research the pros/cons of a cloud setup for your specific situation.
As far as financial concerns, it's usually a lot cheaper to just get an account with a hosting company instead of a cloud service, since you pay by the month, instead of the hour (last time I checked an account with Azure running 24/7 for a month would cost about $40-$50, while you can get hosting for $15 a month). The savings with the cloud come in when you have to run several servers, and the cost of maintaining them surpasses the cost of the instance on the cloud platform.
So, sorry, there's no silver bullet answer for you. Read up on the different services available. Consider what your application needs, what prices will be, and go from there.
I have just migrated an MVC-based application from a dedicated server to Azure. When migrating the MSSQl-database, I first tried importing .bacpak files but some of the tables failed because of their size. I then used the SQL Database migratio wizard which worked fine for small tables but failed for tables with BLOB-fields. For these tables I had to use temporary intermediate tables. Then after a while after all the data was transferred setting up the Webapp was a breeze and we went in production. At first, everything seemed to work just fine, but after a couple of hours when the load got heavier, all kind of errors occurred. I went into the Azure portal and it was really easy to see the

Deploying my first application across AWS

I'm a web developer just now getting interested in sysadmin stuff. I've set up a server before on Linode.com (Ubuntu 10.04 LTS, nginx, Ruby on Rails, PostgreSQL), but there were some issues. Everything was on one machine, so whenever something went wrong with Linode or I got a lot of traffic, my site would go down.
Now I'm interested in setting up a personal blog, and deploying it across Amazon AWS. This is a good opportunity for me to learn to how to use multiple servers with load balancing, auto-scaling, failover, etc. The only problem is I'm not quite sure where to start.
I've read a litany of documentation from Amazon and blog posts elsewhere, but as a sysadmin newbie I have a few questions:
I get that EC2 instances are too volatile to store data on. So where should I store it? Amazon Elastic Block Store? Will the entire filesystem go there, as well as the database?
Do I need serious knowledge of load balancing and scaling? Or will the Amazon Elastic Load Balancer handle make things simple for me? How does their load balancer interact with nginx?
How much of this do you recommend doing through the AWS interface as opposed to through the command line?
Any non-obvious snags that might catch me?
Are there any tutorials for deploying a blog or simple Rails app on EC2? I don't need a production-quality setup here; my main goal is to learn.
Thanks for any answers you can provide!
I've set up my fair share of AWS deployments; here's basics:
Data store
If you have frequently accessed data, as you likely know, it is best to use a database. This is one of the hairier parts of AWS hosting. Your options are, roughly in increasing order of complexity/cost:
SimpleDB - Amazon's own database offering. They give you an HTTP api, which you use to read and write your data. There are some rails libraries for it, but on the whole, it isn't a graceful drop-in for rails.
Amazon RDS - Amazon will preconfigure a mysql-like database server for you. This requires you to boot up an DB server instance, so the pricing server isn't favorable for tiny sites. On the plus side, it allows you to scale your DB server more easily.
Roll your own - Plan around Amazon EC2 instances vanishing at any point; therefore, the local storage you get with EC2 instances can best be considered a big temp directory. Elastic Block Store is Amazon's solution to this; it effectively is a disk image your instances mount. EBS images live independently of EC2 instances, so if your server goes down, you can mount the EBS image on a new EC2 instance. You can essentially roll your own database cluster by booting a bunch of instances and configuring them to replicate off eachother. This works, but is not graceful, and should really only be attempted if you cannot solve your problem with less exotic methods.
Amazon pretty much enumerates these options, plus a few more which are not applicable to you at http://aws.amazon.com/running_databases/
Infrequently changed data should be stored in S3; there's plenty of ruby gems for accessing this easily. If your website is entirely static on the server side, you can even run your entire site off S3
Load Balancing
Amazon "Elastic Load Balancing" is quite effective at the typical web load balancing requirements. It is usually a no-brainer choice, unless you have exotic requirements. It will not scale your cluster for you, however. For auto-booting and shutting down of instances, you should look to Amazon's own auto-scaling solution
Caveats
Be sure to note which "Availability Zone" (aka datacenter) you're in. In some cases, you cannot share AWS resources across availability zones.
Tutorials
There are plenty of tutorials, but in my brief search, none that I found to be really great or up to date. However, check out https://github.com/wr0ngway/rubber , which is a ruby tool for deploying apps to EC2. It will get you most of the way there.

Resources