Proxy choices: mod_proxy_balancer, nginx + proxy balancer, haproxy? - ruby-on-rails

We're running a Rails site at http://hansard.millbanksystems.com, on a dedicated Accelerator. We currently have Apache set up with mod_proxy_balancer, proxying to four mongrels running the application.
Some requests are rather slow and in order to prevent the situation where other requests get queued up behind them, we're considering options for proxying that will direct requests to an idle mongrel if there is one.
Options appear to include:
recompiling mod_proxy_balancer for Apache as described at http://labs.reevoo.com/
compiling nginx with the fair proxy balancer for Solaris
compiling haproxy for Open Solaris (although this may not work well with SMF)
Are these reasonable options? Have we missed anything obvious? We'd be very grateful for your advice.

Apache is a bit of a strange beast to use for your balancing. It's certainly capable but it's like using a tank to do the shopping.
Haproxy/Nginx are more specifically tailored for the job. You should get higher throughput and use fewer resources at the same time.

HAProxy offers a much richer set of features for load-balancing than mod_proxy_balancer, nginx, and pretty much any other software out there.
In particular for your situation, the log output is highly customisable so it should be much easier to identify when, where and why slow requests occur.
Also, there are a few different load distribution algorithms available, with nice automatic failover capabilities too.
37Signals have a post on Rails and HAProxy here (originally seen here).
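To illustrate why the choice of distribution algorithm matters for the problem in the question, here is a minimal simulation sketch. It is not HAProxy's or mod_proxy_balancer's actual code; the function name, the workload, and the assumption that each mongrel handles one request at a time are all made up for the illustration.

```python
def simulate(workload, n_backends, policy):
    """Return the total time requests spend queued behind other requests.

    workload: list of (arrival_time, service_time) tuples, sorted by arrival.
    Assumption: every backend serves exactly one request at a time.
    """
    free_at = [0.0] * n_backends  # time at which each backend becomes idle
    total_wait = 0.0
    for i, (arrival, service) in enumerate(workload):
        if policy == "round_robin":
            backend = i % n_backends  # blind rotation, ignores backend state
        elif policy == "least_busy":
            # pick the backend that is idle, or will be idle soonest
            backend = min(range(n_backends), key=lambda k: free_at[k])
        else:
            raise ValueError(f"unknown policy: {policy}")
        start = max(arrival, free_at[backend])
        total_wait += start - arrival
        free_at[backend] = start + service
    return total_wait


# Hypothetical workload: one 10 s request, then a fast request every 0.5 s.
demo_workload = [(0.0, 10.0)] + [(0.5 * k, 0.1) for k in range(1, 9)]

if __name__ == "__main__":
    for policy in ("round_robin", "least_busy"):
        print(policy, round(simulate(demo_workload, 4, policy), 1))
```

With this made-up workload and four backends, blind rotation keeps sending every fourth request to the mongrel stuck on the slow one, so queueing time piles up, while the idle-first policy never makes a request wait.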

If you want to avoid Apache, it is possible to deploy a Mongrel cluster with an alternative web server, such as nginx or lighttpd, and a load balancer of some variety, such as Pound or a hardware-based solution.
Pound (http://www.apsis.ch/pound/) worked well for me!

The only issue with haproxy and SMF is that you can't use its soft-restart feature to implement the 'refresh' action unless you write a wrapper script. I wrote about that in a bit more detail here.
However, IME haproxy has been absolutely bomb-proof on Solaris, and I would recommend it highly. We ship anything from a few hundred GB to a couple of TB a day through a single haproxy instance on Solaris 10 and so far (touch wood) in 2+ years of operation we've not had any problems with it.

Pound is an HTTP load balancer that I've used successfully in the past. It includes a dynamic scaling feature that may help with your specific problem:
DynScale (0|1): Enable or disable the dynamic rescaling code (default: 0). If enabled Pound will periodically try to modify the back-end priorities in order to equalise the response times from the various back-ends. This value can be overridden for specific services.
Pound is small, well documented, and easy to configure.
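To give a feel for what "modifying the back-end priorities to equalise response times" could look like, here is a rough sketch. This is not Pound's implementation: the function name, the 1 to 9 priority range, and the inverse-proportional rule are all assumptions chosen for the example.

```python
def rescale_priorities(avg_response_times, min_priority=1, max_priority=9):
    """Give faster back-ends a higher priority, proportional to 1 / response time.

    avg_response_times: dict of back-end name -> average response time in seconds.
    The priority range is an assumption, not taken from Pound's source.
    """
    speeds = {name: 1.0 / t for name, t in avg_response_times.items()}
    fastest = max(speeds.values())
    return {
        name: max(min_priority, round(max_priority * speed / fastest))
        for name, speed in speeds.items()
    }


if __name__ == "__main__":
    # Hypothetical measurements for three back-ends.
    print(rescale_priorities({"mongrel1": 0.1, "mongrel2": 0.3, "mongrel3": 1.0}))
```

The idea is simply that a back-end that currently answers slowly gets a smaller share of new requests until its response times recover.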

I've used mod_proxy_balancer + mongrel_cluster successfully (small traffic website).

Related

Is Traefik on Docker significantly slower with HTTPS (vs HTTP)?

I've deployed a local instance of https://librespeed.org/ in order to test my LAN speeds. After changing some old cables, the speeds were good (~800 Mbps symmetric).
I wanted to leave the service running and give it a URL, so I created a docker-compose.yml and gave it some labels in order to expose it through Traefik (as my other services).
To my surprise, after this change the speed was dramatically reduced (~450 Mbps, almost a 50% decrease).
At first I blamed Traefik, but then I just disabled HTTPS and the speeds were back to ~800 Mbps.
What I've checked:
All other settings and stack are exactly the same.
TLS handshake seems to be happening only once, so this does not explain the difference.
The cipher being used is TLS_AES_128_GCM_SHA256 (128-bit keys, TLS 1.3). I didn't change any of Traefik's default cipher settings, so this is probably Traefik's default.
The browser used to test was Firefox 84.0.2 (64-bit).
What I'd like to know:
Is this a common performance downgrade?
Is Traefik really slow encrypting traffic?
Does dockerization impact AES encryption in some way (perhaps blocking some hardware access)?
Thanks in advance
Edit: the noble people of reddit made me realize that my old CPU does not have hardware AES acceleration, so that answers most of my concerns. I think this question is still relevant anyway, at least to alert other people that this can happen.
The noble people of reddit made me realize that my old CPU does not have hardware AES acceleration, so that explains the performance downgrade. I still don't know if this would happen anyway because of docker, but I hope it does not.
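For anyone who wants to check their own machine, here is a small sketch that looks for the AES flag in /proc/cpuinfo. It assumes a Linux host; the function names are made up for the example, and other operating systems need a different tool.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        # x86 kernels label the line "flags", ARM kernels label it "Features"
        if key.strip() in ("flags", "Features"):
            flags.update(value.split())
    return flags


def has_aes_acceleration(cpuinfo_text):
    """True if the CPU advertises AES instructions (the 'aes' flag)."""
    return "aes" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        print("AES instructions advertised:", has_aes_acceleration(path.read_text()))
    else:
        print("No /proc/cpuinfo here; use an OS-specific tool instead.")
```

Since containers run directly on the host's CPU and kernel, running the same check inside a container should normally report the same flags as on the host.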

Detecting end-user connection speed problems in Apache for Windows

Our company provides web-based management software (servicedesk, helpdesk, timesheet, etc) for our clients.
One of them has been causing us a great headache for some months, complaining about the connection speed with our servers.
In our individual tests, the connection and response speeds are always great.
Some information about this specific client:
They have about 300 PCs on their local network, all using the same bandwidth/server for internet access.
They don't allow us to ping their server, so we can't run a trace route.
They claim every other site (google, blogs, news, etc) always responds fast. We know for a fact they have no intention to mislead us and know this to be true.
They might have up to 100 PCs simultaneously logged in to our software at any given time. They need to increase that number to 300, so this is a major issue.
They have been helpful and collaborative on this issue, which we have been trying to resolve for a long time.
Some information about our server and software :
We have been able to serve more than 400 users at a single time without major speed losses for other clients.
We have gone to extensive lengths to make good use of data caching and opcode caching in the software itself, and we did notice the improvement (from fast to faster).
There are no database, CPU or memory bottlenecks or leaks. Other clients are able to access the server just fine.
We have little to no knowledge of how to analyze problems specific to one end user (Apache running under Windows Server), and this is where I could use a lot of help.
Anything that might be related to Apache configuration would also be helpful.
All signs point to an internal problem in this specific client's network, and we are willing to put effort into solving that too if that is the case, but we do not have professionals trained to deal with network problems (they do, however, and their main argument is that 'all other sites are fast, only yours is slow').
You might want to have a look at the tools from Google's "page speed family": http://code.google.com/speed/page-speed/docs/overview.html
Your customer could run the page speed extension for you; that may help you find out what the problem is: http://code.google.com/speed/page-speed/docs/extension.html

Localhost is taking abnormally long time to load any page

The logs don't show anything different, and the computer is four times faster than the last one. Anyone know any common reasons why making a request to localhost would take a very long time?
I am using Mongrel.
Hard to give a solution based on the little information you give, so try to narrow it down. I would say that these three causes seem the most likely:
the database is slow. You can check whether your queries take a long time (check the logs). Perhaps you are using a slow connector (e.g. the default Ruby MySQL library), or your indexes haven't made it to your new machine.
Mongrel is slow. Check by starting the app with WEBrick instead and seeing if that's any better
your computer is slow. Perhaps it's running something else that's taking up CPU or memory. See your performance monitor (application to use for this differs per OS).
Could be a conflict between IPv4 and IPv6. If you're running Apache you have to take special steps to make it work nicely with IPv6 (my information here might be out of date.) I've found that an IPv6-enabled client would try to talk IPv6 to the server, and Apache would not receive the request. After it timed out the client would retry on IPv4.
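One way to test the IPv4/IPv6 theory is to time a connection attempt to every address that localhost resolves to. The sketch below does that; the function names are made up, and port 3000 in the usage lines is only an assumption about where the development server is listening.

```python
import socket
import time


def family_name(family):
    """Human-readable label for an address family."""
    return {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}.get(family, str(family))


def time_connects(host, port, timeout=3.0):
    """Try every address `host` resolves to and time each connection attempt."""
    results = []
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    ):
        started = time.monotonic()
        try:
            with socket.socket(family, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
            outcome = "connected"
        except OSError as exc:
            outcome = f"failed ({exc})"
        elapsed = time.monotonic() - started
        results.append((family_name(family), sockaddr[0], elapsed, outcome))
    return results


if __name__ == "__main__":
    # Port 3000 is an assumption; use whatever port your server listens on.
    for family, address, elapsed, outcome in time_connects("localhost", 3000):
        print(f"{family:5} {address:15} {elapsed:6.3f}s {outcome}")
```

If the IPv6 attempt hangs until the timeout while the IPv4 attempt connects immediately, that matches the symptom described above, and pointing the client at 127.0.0.1 instead of localhost would be a quick way to confirm it.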

Preferred Placement of a Network Collector in a Switched Environment

I'm not a network specialist, so my apologies if I've used some of the domain terminology incorrectly. For web metrics/analytics, we currently use both client-side (JS page tags) and server-side (log files) data. Neither gives us "delivery" information (e.g., connection speeds), hence the interest in Network Collectors. We are in a switched environment, so I don't think installing the N/C as if it were a web server, i.e., on a switch port, will allow it to see the web server traffic.
After some research, I've learned how to place the N/C by configuring a monitoring port. What concerns me about this is that the m/p appears to work by duplicating the traffic within the switch.
Is there a better solution for N/C placement in this type of network environment?
Don't worry Doug, switches nowadays won't falter under this sort of load. The way you have explained is quite OK.
Of course, you could buy a more expensive switch with "NetFlow" sort of support... and have the switch collect the data for you....

Best practice for rate limiting users of a REST API?

I am putting together a REST API and as I'm unsure how it will scale or what the demand for it will be, I'd like to be able to rate limit uses of it as well as to be able to temporarily refuse requests when the box is over capacity or if there is some kind of slashdotted scenario.
I'd also like to be able to gracefully bring the service down temporarily (while giving clients results that indicate the main service is offline for a bit) when/if I need to scale the service by adding more capacity.
Are there any best practices for this kind of thing? Implementation is Rails with mysql.
This is all done by the outer web server, the one that listens to the world (I recommend nginx or lighttpd).
Regarding rate limits, nginx can limit each IP to, e.g., 50 requests per minute; everything over that gets a 503 page, which you can customize.
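To make the per-IP limit concrete, here is a minimal fixed-window counter sketch. It is only an illustration of the idea, not how nginx implements its limiter; the class name and the in-memory dictionary are assumptions for the example.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` requests per client IP in each time window."""

    def __init__(self, limit=50, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        # ip -> [index of the window being counted, requests seen in it]
        self.counters = defaultdict(lambda: [0, 0])

    def status_for(self, ip, now):
        """Return the HTTP status to send: 200 if allowed, 503 if over the limit."""
        window_index = int(now // self.window)
        entry = self.counters[ip]
        if entry[0] != window_index:  # a new window started, reset the count
            entry[0], entry[1] = window_index, 0
        entry[1] += 1
        return 200 if entry[1] <= self.limit else 503


if __name__ == "__main__":
    import time

    limiter = FixedWindowLimiter(limit=50, window_seconds=60)
    print(limiter.status_for("203.0.113.7", time.time()))
```

Doing this in the front-end server rather than in the Rails app means rejected requests never reach the app servers at all.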
Regarding planned temporary downtime, in the Rails world this is done via a special maintenance.html page. There is usually some kind of automation that creates or symlinks that file when the Rails app servers go down. I'd recommend relying not on the presence of the file, but on the actual availability of the app server.
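As a sketch of what "relying on actual availability" could mean, the snippet below probes the app servers with a plain TCP connect and falls back to a 503 with a Retry-After header when none answers. The function names, the backend list, and the retry interval are all hypothetical; a real balancer does this with its own health checks.

```python
import socket


def backend_is_up(host, port, timeout=0.5):
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def choose_response(backends, retry_after=120):
    """Pick the first reachable backend, or describe a 503 maintenance reply."""
    for host, port in backends:
        if backend_is_up(host, port):
            return ("proxy", (host, port))
    return ("maintenance", {"status": 503, "headers": {"Retry-After": str(retry_after)}})


if __name__ == "__main__":
    # Hypothetical app server addresses.
    print(choose_response([("127.0.0.1", 8000), ("127.0.0.1", 8001)]))
```

The Retry-After header gives well-behaved API clients a hint about when to try again, which is useful for the "offline for a bit" case in the question.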
But really, you can start and stop services without losing any connections at all. For example, you can run a separate instance of the app server on a different UNIX socket or IP port and have the balancer (nginx/lighty/haproxy) use that new instance too. Then you shut down the old instance and all clients are served by the new one only. No connection is lost. Of course this scenario is not always possible; it depends on the type of change you introduced in the new version.
haproxy is a balancer-only solution. It can balance requests to the app servers in your farm extremely efficiently.
For quite a big service you end up with something like:
api.domain resolving round-robin to N balancers
each balancer proxies requests to M web servers for static content and P app servers for dynamic content. Oh well, your REST API doesn't have static files, does it?
For quite a small service (under 2K rps) all balancing is done inside one or two web servers.
Good answers already - if you don't want to implement the limiter yourself, there are also solutions like 3scale (http://www.3scale.net) which does rate limiting, analytics etc. for APIs. It works using a plugin (see here for the ruby api plugin) which hooks into the 3scale architecture. You can also use it via varnish and have varnish act as a rate limiting proxy.
I'd recommend implementing the rate limits outside of your application, since otherwise the high traffic will still have the effect of killing your app. One good solution is to implement it as part of your Apache proxy, with something like mod_evasive.

Resources