I have been working on my [first] startup for a month now, and while it's probably at least one more month away from an alpha release, I want to know how to deploy it the right way. The site will place a high initial load (network + CPU) on the server for each new user, so I am thinking of having a separate server/queue for this initial process, so that it doesn't slow down the site for existing users.
Based on my research so far, I am currently leaning towards nginx + haproxy + unicorn/thin + memcached + mysql, and deploying on Linode. However, I have no prior experience in any of the above; hence I am hoping to tap the community's experience.
Does the above architecture seem reasonable? Any suggestions/articles/books that you would recommend?
Is Linode a good choice? Heroku/EY seem too expensive for me (at least until I have enough revenue), but am I missing some other better option? MediaTemple?
Any good suggestions on the load balancing architecture? I am still reading up on this.
Is it better to have 2 separate Rails server instances on 2 separate Linodes, or 1 instance on a single Linode of twice the capacity (in terms of RAM/storage/bandwidth)? How many Linodes should I start with?
Which Linux distribution should I choose? (Linode offers 8 different ones - http://www.linode.com/faq.cfm) Are there any relative advantages/disadvantages between them for a Rails site?
I apologize if any of my questions are stupid or contradictory; please attribute it to my inexperience.
Architecture
You're on the right track. I personally prefer Passenger over thin/unicorn (having run nginx to thin backends for a long while) just for the convenience, but your proposed setup is fairly standard. If you're on Ruby 1.8.7, I'd recommend that you consider REE + Passenger for framework memory savings, though.
Hosting & Load Balancing
Linode is fantastic, and I use them for just about everything I can, but you will need to be aware of RAM limits. Each Rails process uses a nontrivial amount of RAM, and you'll want to avoid getting the machine into swap. Plan on running enough Rails instances per machine that your memory allocation is about 90% of the memory on the Linode. You'll likely want another Linode dedicated to your database, though you can start with them both on the same machine; just be prepared to split off MySQL as you grow. You can set up communications between Linodes in the same data center on private IPs, which don't count against your bandwidth quota.
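For example, if you go the unicorn route, the worker count is a single line in its config file. A minimal sketch, assuming (hypothetically) a 2GB Linode and roughly 150MB per Rails worker - measure your own app's per-worker memory before settling on a number:

```ruby
# config/unicorn.rb - a sketch; the worker count and memory figures are assumptions
worker_processes 12                        # 12 workers * ~150MB ≈ 1.8GB, ~90% of 2GB
preload_app true                           # load the app before forking; with REE,
                                           # copy-on-write lets workers share memory
listen '/tmp/unicorn.sock', :backlog => 64
timeout 30                                 # kill any worker stuck for more than 30s
```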
Your scaling strategy should be as horizontal as possible, so I'd recommend just getting a second Linode and adding it to your haproxy pool when you need more horsepower. Linode charges you $20 for 512MB more RAM, or you can get a whole other Linode (with its own CPU, RAM, HDD, and bandwidth quota) for that same $20 - it seems a no-brainer! In Rails' case, an instance is an instance is an instance, so it really doesn't matter whether it's on the same VM or not, as long as the time to connect to your database machine is more or less the same. You could be running 10 Linodes with 10 Rails processes apiece without much of an issue. Linode also offers IP failover, so that if your primary Linode (with haproxy) goes down, it can fail over automatically to a secondary Linode, which you would have haproxy running on, ready to act in the same capacity as the first.
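To make that concrete, adding the second Linode is a one-line change in haproxy. A minimal sketch of the relevant section, with made-up names, private IPs, and ports:

```
# haproxy.cfg (sketch - the names, IPs, and ports are hypothetical)
backend rails_pool
    balance roundrobin
    option httpchk GET /                       # mark a node down if it stops answering
    server linode1 192.168.133.10:8080 check   # existing Linode, private IP
    server linode2 192.168.133.11:8080 check   # the new $20 Linode: just add this line
```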
Distribution
Honestly, this is up to you! Many folks go with Ubuntu or Red Hat-family (CentOS/Fedora) distros - I like CentOS myself - but it's really just about what you feel most comfortable with. If you don't have a favorite distro, I would recommend trying Ubuntu or CentOS, as they tend to be quite friendly to beginners and have extremely robust community support.
You will probably want to pick a 32-bit distro unless you have a compelling reason to pick a 64-bit distro; 64-bit executables require more RAM than their 32-bit counterparts, and since RAM is likely to be your most precious resource, it makes sense to save it where you can.
Related
I use Rails for small applications, but I'm not at all an expert. I'm hosting them on a DigitalOcean server with 512MB of RAM, which seems to be insufficient.
I was wondering what the server requirements (in terms of RAM) are for a single Ruby on Rails app.
Also, how can I measure whether my server can support the number of applications running on it?
Many thanks.
It depends on how much traffic you need to handle. We have two machines (32GB of RAM each; usage figures below) with 32 unicorn workers to serve one app with loads of traffic, and we have one machine running lots of 2-worker apps that get very little traffic.
You also have to consider the database (which needs the most RAM by far in our case, due to the big caches we granted it). And on top of all that, *nix caches the filesystem in otherwise unused RAM.
Conclusion: It is very hard to tell without you telling us what sort of traffic you expect.
Our memory usage on one of the two servers for the big app: https://gist.github.com/2called-chaos/bc2710744374f6e4a8e9b2d8c45b91cf
The output is from a little Ruby script I made called unistat: https://gist.github.com/2called-chaos/50fc5412b34aea335fe9
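If you just want a quick number for your own server, you can sum the resident set size (RSS) of your worker processes straight from ps. A rough sketch, assuming a Linux box and unicorn workers - adjust the process name to match your setup:

```ruby
# Sum the RSS (in KB) of every process whose command name is "unicorn".
# "ps -C" is Linux-specific; on other systems, filter the full ps output instead.
rss_kb = `ps -C unicorn -o rss=`.split.map { |v| v.to_i }.inject(0) { |a, b| a + b }
puts "unicorn workers are using ~#{rss_kb / 1024} MB of RAM"
```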
I just moved to .Net programming and built a website based on ASP.NET MVC 5 framework.
I come from php programming and I have to admit that MVC has some good advantages.
However, when it comes to deploying the website on the internet, I'm a bit lost.
I decided to go with Azure, since it seems like too much trouble to deploy the Microsoft framework on a Linux server (and it does not seem optimized for that).
However, I don't understand the pricing policy of this cloud system at all.
http://azure.microsoft.com/en-us/pricing/details/websites/
What is a compute instance?
And what is this hourly rate?
Does it mean that if no one accesses your website for an hour, you won't be charged for that hour?
Is the memory they mention RAM?
If so, it seems very little compared to a normal server.
I'm looking for something fast enough.
Moreover, I developed my website with PostgreSQL, but I have the impression that I have to order a separate virtual machine to host my database.
I'm sorry if my questions are a bit vague, but this is all so different from a simple Apache server.
A compute instance on Azure is something that has a CPU reserved for you. That can mean it is not used at all and is just waiting for your commands.
Examples of compute instances are:
Virtual Machine
Web site
You can run a free website on Azure. You cannot use your own domain (at least, that is not supported by Azure), and free sites are stopped when not in use. This means the first request is slow, while the second and later requests are fast. When you get too many requests it will no longer fit in a free site, but a startup will fit.
If you are outside the free tier, Azure bills per hour (or even per minute) that you have the site (or virtual machine) active.
The RAM seems small, but if you have no UI running, you need a lot less RAM.
The advantage of Azure is that you can run on a small, cheap machine but upgrade very quickly, even for just a few hours.
I am working on a huge Grails Project, and more and more people are going to be working on it.
The problem is that I have been spending ages setting up people's machines (Java and so on).
I heard that there is something like a VM that you can set up on your machine once and then transfer to other people's computers.
Also, what about performance? Let's say I install a VM on other people's machines; won't that slow things down?
Yes, performance will be very bad, as a VM runs as a full-fledged machine with an OS on top of it. Only if all your systems have very high-end configurations, like quad-core i7s and a minimum of 8GB of RAM, will they be able to handle such a load.
Even then it would be a bad idea. While running Grails you already have multiple instances of Java running; in my case, with NetBeans, I have two resource-hungry Java instances using 700MB and 500MB respectively.
I have a problem: membase is being very slow in my environment.
I am running several production servers (Passenger) on Rails 2.3.10 and Ruby 1.8.7.
Those servers communicate with 2 membase machines in a cluster.
The membase machines each have 64GB of memory, a 100GB EBS volume attached, and 1GB of swap.
My problem is that membase is VERY slow in response time and is actually the slowest part of the whole application lifecycle right now.
My question is: why?
The Rails gem I am using is memcache-northscale.
The membase server is 1.7.1 (the latest).
The cluster is doing between 2K and 7K ops per second.
The average response time from membase (based on NewRelic) is 250ms, which is HUGE and unreasonable.
Does anybody know why is this happening?
What can I do in order to improve this time?
It's hard to immediately say with the data at hand, but I think I have a few things you may wish to dig into to narrow down where the issue may be.
First of all, do your stats with membase show a significant number of background fetches? This is in the Web UI statistics for "disk reads per second". If so, that's the likely culprit for the higher latencies.
You can read more about the statistics and sizing in the manual, particularly the sections on statistics and cluster design considerations.
Second, you're reporting 250ms on average. Is this a sliding average, or overall? Do you have 90th or 99th percentile latencies? A few outlying disk fetches can inflate the average, when most requests (for example, those served from RAM that don't need disk fetches) are actually quite speedy.
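If NewRelic only gives you the overall average, one way to see whether outliers are to blame is to time a batch of gets yourself from an app server and look at the percentiles. A rough sketch - it assumes memcache-northscale follows the classic memcache-client API (MemCache.new / get), and the server address and key are placeholders:

```ruby
require 'rubygems'
require 'memcache'   # provided by the memcache-northscale gem

cache = MemCache.new('10.0.0.1:11211')   # placeholder server address
samples = (1..1000).map do
  t0 = Time.now
  cache.get('some-known-key')            # placeholder key that exists in the cache
  (Time.now - t0) * 1000.0               # elapsed time in milliseconds
end.sort

puts "median: #{samples[samples.size / 2]} ms"
puts "99th:   #{samples[(samples.size * 0.99).to_i]} ms"
```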
Are your systems spread across availability zones? What kind of instances are you using? Are the clients and servers in the same Amazon AWS region? I suspect the answer to the first may be "yes", which from recent measurements means about 1.5ms of overhead when using xlarge instances. This can matter if you're doing a lot of fetches synchronously and in serial in a given method.
I expect it's all in one region, but it's worth double checking since those latencies sound like WAN latencies.
Finally, there is an updated Ruby gem, backwards compatible with Fauna, whose changes Couchbase, Inc. has been working to contribute back to Fauna upstream. If possible, you may want to try the gem referenced here:
http://www.couchbase.org/code/couchbase/ruby/2.0.0
You will also want to look at running Moxi on the client side. To access Membase, you need to go through a proxy (called Moxi). By default it's installed on the server, which means you might make a request to one of the servers that doesn't actually have the key; Moxi will go get it for you, but then you're doubling the network traffic.
Installing Moxi on the client-side will eliminate this extra network traffic: http://www.couchbase.org/wiki/display/membase/Moxi
Perry
We're running a Rails site at http://hansard.millbanksystems.com on a dedicated Accelerator. We currently have Apache set up with mod_proxy_balancer, proxying to four mongrels running the application.
Some requests are rather slow, and in order to prevent other requests from queuing up behind them, we're considering proxying options that will direct each request to an idle mongrel if there is one.
Options appear to include:
recompiling mod_proxy_balancer for Apache as described at http://labs.reevoo.com/
compiling nginx with the fair proxy balancer for Solaris
compiling haproxy for Open Solaris (although this may not work well with SMF)
Are these reasonable options? Have we missed anything obvious? We'd be very grateful for your advice.
Apache is a bit of a strange beast to use for your balancing. It's certainly capable but it's like using a tank to do the shopping.
Haproxy/Nginx are more specifically tailored for the job. You should get higher throughput and use fewer resources at the same time.
HAProxy offers a much richer set of features for load-balancing than mod_proxy_balancer, nginx, and pretty much any other software out there.
In particular for your situation, the log output is highly customisable so it should be much easier to identify when, where and why slow requests occur.
Also, there are a few different load distribution algorithms available, with nice automatic failover capabilities too.
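For the specific 'direct requests to an idle mongrel' requirement, the usual haproxy recipe is balance leastconn combined with maxconn 1 on each backend, so that excess requests queue inside haproxy instead of behind a busy mongrel. A sketch with hypothetical ports:

```
# haproxy.cfg (sketch - the ports are made up)
backend mongrels
    balance leastconn                         # pick the least-busy mongrel
    server m1 127.0.0.1:8000 check maxconn 1  # at most one in-flight request per
    server m2 127.0.0.1:8001 check maxconn 1  # mongrel; the rest queue in haproxy
    server m3 127.0.0.1:8002 check maxconn 1  # until a mongrel frees up
    server m4 127.0.0.1:8003 check maxconn 1
```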
37Signals have a post on Rails and HAProxy here (originally seen here).
If you want to avoid Apache, it is possible to deploy a Mongrel cluster behind an alternative web server such as nginx or lighttpd, with a load balancer of some variety such as Pound, or a hardware-based solution.
Pound (http://www.apsis.ch/pound/) worked well for me!
The only issue with haproxy and SMF is that you can't use its soft-restart feature to implement the 'refresh' action unless you write a wrapper script. I wrote about that in a bit more detail here.
However, IME haproxy has been absolutely bomb-proof on solaris, and I would recommend it highly. We ship anything from a few hundred GB to a couple of TB a day through a single haproxy instance on solaris 10 and so far (touch wood) in 2+ years of operation we've not had any problems with it.
Pound is an HTTP load balancer that I've used successfully in the past. It includes a dynamic scaling feature that may help with your specific problem:
DynScale (0|1): Enable or disable the dynamic rescaling code (default: 0). If enabled, Pound will periodically try to modify the back-end priorities in order to equalise the response times from the various back-ends. This value can be overridden for specific services.
Pound is small, well documented, and easy to configure.
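For reference, turning that feature on is a single directive in the Service block. A minimal sketch, with hypothetical addresses and ports:

```
# pound.cfg (sketch - addresses and ports are made up)
ListenHTTP
    Address 0.0.0.0
    Port    80
    Service
        DynScale 1          # re-balance back-end priorities by measured response time
        BackEnd
            Address 127.0.0.1
            Port    8000
        End
        BackEnd
            Address 127.0.0.1
            Port    8001
        End
    End
End
```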
I've used mod_proxy_balancer + mongrel_cluster successfully (on a small-traffic website).