How To Manage AWS RDS Database Connections?

I'm fairly new when it comes to building and managing the back-end architecture of an application.
I'm hosting a Ruby on Rails application through AWS and one of the services I'm using is AWS RDS.
I've recently hit the limit on the number of database connections my DB instance allows, seemingly because Elastic Beanstalk deployments open connections to the DB when running migrations and don't appear to close them when they're done, and I don't know the best way to address or manage this.
For anyone who has experience using Amazon RDS with a PostgreSQL DB: what resources/services do I need to set up in order to manage my database connections correctly (so that I stay under the limit as much as possible)?
I have heard of PgBouncer for managing database connections, but I was wondering if there are other resources/services anyone can share so that I can make a more informed decision on what to use.

Had a similar issue myself a while back. You can also look into Rails' connection reaper (part of ActiveRecord's connection pool) to see if that suits your purposes, but it was PgBouncer that ended up fixing my issue.
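For reference, these are the pool-related knobs Rails exposes in database.yml. This is a minimal sketch with illustrative values, not recommendations; `reaping_frequency` drives the reaper mentioned above, and `idle_timeout` requires Rails 5.2+. The `prepared_statements: false` line only matters if you put PgBouncer in transaction-pooling mode between Rails and RDS.

```yaml
# config/database.yml -- values are illustrative
production:
  adapter: postgresql
  pool: 5                     # max connections per process; keep (pool x processes) under the RDS limit
  reaping_frequency: 10       # seconds between reaper passes that reclaim dead connections
  idle_timeout: 300           # seconds before an idle connection is dropped from the pool
  prepared_statements: false  # required when PgBouncer runs in transaction-pooling mode
```

If, as you suspect, deployments temporarily add extra connections during migrations, size the pool so that (pool x processes) leaves headroom under the RDS cap.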

Related

Multiple Projects, Multiple languages, Same authentication

So I have multiple projects that I want to share a common core set of tables in a Postgres database, mapping a single authentication scheme across all of them. Things like 'user', 'account', 'group', and other related user information are stored in these tables. The projects I currently have are a nodejs app (multiple devices) and a Ruby web app (planning on multiple devices later on), and we could have a Django or another node project in the future as well. Is there an efficient, cost-effective way to do this that would be scalable and reliable? I was thinking about using an s3 instance with a Postgres database hosted on it and pointing all the authentication from my multiple apps to that database, but I wanted to see if others had thought about this problem as well.
First, please don't attempt to host postgres in S3. I believe you may have meant EC2 with an EBS volume (EBS snapshots, not the volumes themselves, are what live in S3). From an ease-of-use standpoint (particularly when considering ongoing maintenance), hosting any postgres instance on Amazon's RDS product is truly a pleasure. Without going into all the details of that product, I'll simply state that you can set up high availability (failover), backups, upgrades, and monitoring with just a few clicks.
That being said, RDS is not the cheapest solution, but the cost is not exorbitant either, depending on your load and number of simultaneous connections.
If all this database is going to do is authenticate people and then disconnect, it may well be overkill and a waste of resources. However, if you're housing a fairly complex set of permissions and other user data, it'll likely be a fairly straightforward solution.
Depending on your budget and requirements, you may benefit from running pgPool on your app server somewhere to pool the connections, but I wouldn't start out using pgPool unless you need it.
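If you do centralize the auth tables on one Postgres instance (e.g. on RDS), each app just needs a second connection to it. In Rails, one way to do that, sketched here with made-up class and config names and assuming a reasonably recent Rails version, is an abstract base class for the shared tables:

```ruby
# app/models/auth_record.rb
# "shared_auth" is a hypothetical second entry in config/database.yml
# pointing at the shared RDS instance.
class AuthRecord < ActiveRecord::Base
  self.abstract_class = true
  establish_connection :shared_auth
end

# Models backed by the shared tables inherit that connection.
class User < AuthRecord
end
```

The node and Django apps would point their own database drivers at the same host; only the tables are shared, not the ORM.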

SSH tunneling to a remote DB from Heroku?

I'm thinking of deploying a small Rails app on Heroku. In an effort to save money, I'd like my app to use an external database (to which I have free access), rather than a Heroku-hosted database. The trouble is that the free database only accepts local connections. To access it from Heroku, I'd need to do so via an SSH tunnel.
Is it possible for a Heroku app to persist its data in an external DB accessed via SSH? If so, how?
(For bonus points, here's a second question: is this a good idea? On the one hand, this scheme would save me from paying for a Heroku database. On the other hand, it means having to encrypt all my database traffic. I imagine that this would massively slow down my web dynos, and reduce the number of requests they can serve. Would the money I save on the database get used up paying for more dynos? Am I likely to come out ahead by doing this?)
Yes, you can. It is possible to set up a tunnel from Heroku to an external database.
You don't want to do it for the reason the OP gives (to avoid paying for a Heroku database), because, as #sgrif mentions, it would be painfully slow and probably wouldn't really save anything.
But there are legitimate reasons for wanting to tunnel to an external database, for example if data is residing in a legacy system that you need to analyze.
Rather than simply repeat myself (it's long), here's a link to the recipe that worked for me: SSH tunneling from Heroku.
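In case that link rots: the general approach is plain SSH local port forwarding, which you can also do from Ruby. A minimal sketch using the net-ssh-gateway gem follows; the initializer path, environment variable names, and ports are all placeholders, and this is not necessarily what the linked recipe does:

```ruby
# config/initializers/db_tunnel.rb (hypothetical location)
require 'net/ssh/gateway'

# SSH into the machine that can reach the database locally...
gateway = Net::SSH::Gateway.new(ENV['TUNNEL_HOST'], ENV['TUNNEL_USER'],
                                keys: [ENV['TUNNEL_KEY_PATH']])

# ...and forward local port 15432 to Postgres (port 5432) on that machine.
gateway.open('127.0.0.1', 5432, 15432)

# database.yml then connects to host 127.0.0.1, port 15432.
```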
No, and even if it were an option it would be a really bad idea, as you'd be adding massive latency to every request, since you'd, for all intents and purposes, have to open a new tunnel for every request.
Your best option is likely to use Heroku's development or starter tiers. The free development tier will work if your database is less than 10,000 rows. Their $15/mo starter tier works for up to 1,000,000 rows.

Moving Heroku Shared Database

I recently reached the 5 MB database limit with Heroku; the costs rise dramatically after this point, so I'm looking to move the database elsewhere.
I am very new to using VPS and setting up servers from scratch, however, I have done this recently for another app.
I have a couple questions related to this:
Is it possible to create a database on a VPS and point my rails app on heroku to use that database?
If so, what would database.yml actually look like. What would be an example localhost with the database stored outside the app?
These may be elementary questions but my knowledge of servers and programming is very much self taught, so I admit, there may be huge loopholes in things that I "should" already understand.
Note: Other (simpler) suggestions for moving my database are welcomed. Thanks.
OK, for starters: yes, you can host a database external to Heroku and point your database.yml at that server. It's simply a case of setting the hostname to the right address and giving it the correct credentials.
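To answer the database.yml question directly, here is a minimal sketch; every value below is a placeholder:

```yaml
# config/database.yml
production:
  adapter: postgresql
  host: db.example.com          # your VPS's hostname or IP, instead of localhost
  port: 5432
  database: myapp_production
  username: myapp
  password: <%= ENV['MYAPP_DB_PASSWORD'] %>
  pool: 5
```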
However, you need to consider a couple of things:
1) Latency: unless you're hosting inside EC2 US-East, the latency between Heroku and your DB will cause you all sorts of performance issues.
2) Setting up a database server is not a simple task. You need to consider how secure it is, how it performs, keeping it up to date, keeping it backed up, and having to worry day and night about it being up. With Heroku you don't need to do any of this, as the database is fully managed.
Price-wise, are you aware of the new low-cost Postgres plans at Heroku? $15/mo will get you 20 GB (shared instance), and $50/mo will get you a terabyte (dedicated instance). To me, that is absurdly cheap, as I value my time much more and I know how many hours I would need to invest in running my own server to save maybe $10 a month.
It would be cheaper to use Amazon RDS, which is officially supported by Heroku and served from the same datacenter (Amazon US-East). If you do want to use a VPS, use an Amazon EC2 instance in US-East for maximum performance. This tutorial shows exactly how to do it with Django in detail. Even if you don't decide to use EC2, refer to that tutorial to see how to properly add external database information to your Heroku application so that Heroku doesn't try to overwrite it.
Still, Heroku's shared database is extremely cost-competitive, far more so than most VPSes, and with much less setup and maintenance.

Deploying my first application across AWS

I'm a web developer just now getting interested in sysadmin stuff. I've set up a server before on Linode.com (Ubuntu 10.04 LTS, nginx, Ruby on Rails, PostgreSQL), but there were some issues. Everything was on one machine, so whenever something went wrong with Linode or I got a lot of traffic, my site would go down.
Now I'm interested in setting up a personal blog, and deploying it across Amazon AWS. This is a good opportunity for me to learn to how to use multiple servers with load balancing, auto-scaling, failover, etc. The only problem is I'm not quite sure where to start.
I've read a litany of documentation from Amazon and blog posts elsewhere, but as a sysadmin newbie I have a few questions:
I get that EC2 instances are too volatile to store data on. So where should I store it? Amazon Elastic Block Store? Will the entire filesystem go there, as well as the database?
Do I need serious knowledge of load balancing and scaling? Or will the Amazon Elastic Load Balancer make things simple for me? How does their load balancer interact with nginx?
How much of this do you recommend doing through the AWS interface as opposed to through the command line?
Any non-obvious snags that might catch me?
Are there any tutorials for deploying a blog or simple Rails app on EC2? I don't need a production-quality setup here; my main goal is to learn.
Thanks for any answers you can provide!
I've set up my fair share of AWS deployments; here are the basics:
Data store
If you have frequently accessed data, as you likely know, it is best to use a database. This is one of the hairier parts of AWS hosting. Your options are, roughly in increasing order of complexity/cost:
SimpleDB - Amazon's own database offering. They give you an HTTP API, which you use to read and write your data. There are some Rails libraries for it, but on the whole, it isn't a graceful drop-in for Rails.
Amazon RDS - Amazon will preconfigure a MySQL database server for you. This requires you to boot up a DB server instance, so the pricing isn't favorable for tiny sites. On the plus side, it allows you to scale your DB server more easily.
Roll your own - Plan around Amazon EC2 instances vanishing at any point; the local storage you get with EC2 instances is therefore best considered a big temp directory. Elastic Block Store is Amazon's solution to this; it is effectively a disk image your instances mount. EBS images live independently of EC2 instances, so if your server goes down, you can mount the EBS image on a new EC2 instance. You can essentially roll your own database cluster by booting a bunch of instances and configuring them to replicate off each other. This works, but is not graceful, and should really only be attempted if you cannot solve your problem with less exotic methods.
Amazon pretty much enumerates these options, plus a few more which are not applicable to you at http://aws.amazon.com/running_databases/
Infrequently changed data should be stored in S3; there are plenty of Ruby gems for accessing it easily. If your website is entirely static on the server side, you can even run your entire site off S3.
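For example, with the aws-sdk-s3 gem (one of several options; the region, bucket, and key names here are made up):

```ruby
require 'aws-sdk-s3'

s3 = Aws::S3::Client.new(region: 'us-east-1')

# Upload a rarely-changing asset...
s3.put_object(bucket: 'my-blog-assets', key: 'images/header.png',
              body: File.read('header.png'))

# ...and fetch it back later.
resp = s3.get_object(bucket: 'my-blog-assets', key: 'images/header.png')
puts resp.body.read.bytesize
```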
Load Balancing
Amazon "Elastic Load Balancing" is quite effective at the typical web load balancing requirements. It is usually a no-brainer choice, unless you have exotic requirements. It will not scale your cluster for you, however. For auto-booting and shutting down of instances, you should look to Amazon's own auto-scaling solution
Caveats
Be sure to note which "Availability Zone" (aka datacenter) you're in. In some cases, you cannot share AWS resources across availability zones.
Tutorials
There are plenty of tutorials, but in my brief search, none that I found to be really great or up to date. However, check out https://github.com/wr0ngway/rubber , which is a ruby tool for deploying apps to EC2. It will get you most of the way there.

amazon simpledb with aws-sdb-proxy suitable for high traffic production app?

I am using Amazon SimpleDB with the aws_sdb gem and the AWS SDB proxy, as outlined in documentation from Amazon, with Ruby on Rails and a local AWS proxy that runs on WEBrick (providing a bridge with ActiveResource).
See http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1242
I am wondering whether the aws-sdb-proxy (WEBrick!) is suitable for a high traffic load, since WEBrick is supposed to be a development server. Does anyone have comments or experiences?
I've tried Rails with simple_record and I can tell you it's much slower compared to MySQL. You will also have to do quite a bit of work to adapt your code to it.
Therefore if you have any high traffic tables that update frequently, I'd say just pass on it. Use MySQL or a different solution. SimpleDB is good only to store metadata for whatever doesn't update very often, and if you get a lot of traffic to that you definitely should get some memcached servers in front of it.
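To illustrate the memcached suggestion: with Rails.cache backed by memcached (e.g. via the dalli gem), reads can be made read-through so SimpleDB is only hit on a cache miss. `SimpleRecordMetadata` is a made-up example class:

```ruby
# Read-through caching in front of a slow SimpleDB-backed lookup.
def find_metadata(item_id)
  Rails.cache.fetch("sdb/metadata/#{item_id}", expires_in: 10.minutes) do
    SimpleRecordMetadata.find(item_id)  # only runs on a cache miss
  end
end
```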
Check this out for some numbers (disregard the Dynamo part of it; I'm now on SDB and moving either back to RDS or to Dynamo tonight): Moving MySQL table to AWS DynamoDB - how to set it up?
