I am running Ubuntu (64Bit) with Apache 2.2.17, Passenger 3.0.11, Ruby 1.9.3 and Rails 3.2.6
When accessing a page (index.html) on my site, the request takes ages to complete, somewhere around 30 seconds in extreme cases.
The server has plenty of memory available (top shows more than 4 GB free), the ten Apache processes each show 0% CPU in top, the load is almost zero, and there are hardly any DB accesses because I cache most things with memcached.
The log files of Apache as well as Rails do not show any errors; on the contrary, the render times shown in the Ruby on Rails log file are excellent (<100 ms).
So where to go from here?
Is only the first request slow, or are all requests slow? Passenger instances shut down after a given idle interval, so intermittent requests (requests with a sufficient time span between them) will allow the instances to shut down, only to be restarted at the next request.
Passenger does the auto-shutdown BY DESIGN. In a shared environment there might be other users' apps; if your app is idle for a while, its resources can be transferred to other people's apps.
If you are on a tight budget and you have multiple apps hosted on the same server, then passenger is a great solution.
If you have only ONE app on your server which you control, then reconfigure Passenger to NOT shut down (if that is indeed your problem).
You can do "passenger-status" to see how many passengers are currently running and available for taking requests.
The configuration directives that ensure Passenger processes stay up are PassengerMinInstances and PassengerPoolIdleTime.
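As a minimal sketch (Apache config; the values shown are illustrative, not recommendations), the two directives could be set like this:

    # Keep at least 3 application processes alive at all times
    PassengerMinInstances 3
    # Never shut processes down for being idle (0 disables the idle timeout)
    PassengerPoolIdleTime 0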
Are you accessing it through a 'fake domain name' (added to your /etc/hosts file)?
If so, do
service avahi-daemon stop
At least that's what worked for me on Ubuntu 10.10 :)
For some reason a DNS lookup is made on each and every request to the server, and when the domain doesn't exist, it times out ...
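For illustration, a "fake domain" entry of that kind in /etc/hosts usually looks like this (the hostname is made up):

    # /etc/hosts - point a made-up development hostname at this machine
    127.0.0.1   myapp.local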
The performance issue has been keeping me busy for all these days. I believe I have nailed it down to an Apache configuration setting: KeepAliveTimeout was set to a very high value (90). I can't think why it was set that high; it must have been a typo.
My understanding of KeepAliveTimeout is that the Apache process stays locked to the client for 90 seconds even if the client isn't issuing any further requests. So when traffic picks up (which it did on that day when performance dropped significantly, with page visits more than tripling), all Apache processes end up busy waiting on the KeepAliveTimeout while blocking new requests coming in. This would also explain why the system was not showing much load at all: it was just sitting there waiting. I reduced the value to 10; if traffic picks up further I'll probably drop it to 5.
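For reference, a minimal sketch of the relevant Apache settings (the values are what worked here and are illustrative, not universal recommendations):

    # httpd.conf / apache2.conf
    KeepAlive On
    # Seconds an idle keep-alive connection holds a worker; was 90, now 10
    KeepAliveTimeout 10
    MaxKeepAliveRequests 100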
I'm planning a Rails app that has a very content-rich, interactive page that many users will connect to.
Development went well, and small-scale testing on the dev servers went without a hitch.
Problems started when we began alpha testing with selected groups of people. The server would suddenly grind to a halt, and Nginx would stop because its queue was full. I was at a loss for a while, but after looking around I came to the conclusion that live ActionCable was completely eating up my memory. This gets especially bad when a user reloads the page that subscribes to ActionCable multiple times, causing additional processes to become active and completely stopping the server; the only cure is an nginx restart.
I currently run a 2-core, 1 GB memory SSD VPS for alpha testing, with at most perhaps 20 concurrent users. Should I be running into performance problems with such load, or should tuning the code, redis, or passenger fix this?
I know it's hard to say anything definitive without more specifics, but could a ballpark estimate be made with this information?
After some googling and testing of Nginx settings, adding this directive to the nginx settings for Passenger seems to have dramatically improved the performance issue.
location /special_websocket_endpoint {
    passenger_app_group_name foo_websocket;
    passenger_force_max_concurrent_requests_per_process 0;
}
More info here:
https://www.phusionpassenger.com/library/config/nginx/tuning_sse_and_websockets/
20 concurrent users, even with multiple tabs per user, is still fewer than about 100 concurrent websocket connections; that is not a lot.
The first thing I'd look for is leaks: cases where a websocket connection or other resource (open files etc.) does not get freed when the actual user disconnects. Make sure you're running fresh versions of Rails/Passenger, as there was a bug in Rails causing similar behaviour (see https://blog.phusion.nl/2016/07/07/actioncable-under-stress-p1/ for details).
Also, while ActionCable + Passenger inside nginx allows you to run everything inside a single process, it is not a good idea when you expect some load.
When you run a plain nginx and separate Rails servers for regular requests and for the cable, at least the other parts of the app will keep working to some degree under such conditions.
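For illustration, such a separation could look roughly like this in nginx (the upstream names and ports are made up; the regular app server and the ActionCable server run as separate processes):

    upstream rails_app { server 127.0.0.1:3000; }   # regular requests
    upstream cable_app { server 127.0.0.1:28080; }  # dedicated ActionCable server

    server {
        listen 80;

        location /cable {
            proxy_pass http://cable_app;
            proxy_http_version 1.1;
            # required headers for the websocket upgrade
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "Upgrade";
        }

        location / {
            proxy_pass http://rails_app;
        }
    }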
I've found the following at Docs: Scaling Puppet:
Are you using the default webserver?
WEBrick, the default web server used to enable Puppet’s web services connectivity, is essentially a reference implementation, and becomes unreliable beyond about ten managed nodes. In any sort of production environment serving many nodes, you should switch to a more efficient web server implementation such as Passenger or Mongrel.
Where does the number 10 come from in "ten managed nodes"?
I have a little over 20 nodes and I might soon have a little over 30. Should I change to Passenger or not?
You should change to Passenger when you start having problems with WEBrick (or a little before). When that happens for you will depend on your workload.
The biggest problem with WEBrick is that it's single-threaded and blocking; once it's started working on a request, it cannot handle any other requests until it's done with the first one. Thus, what will make the difference to you is how much of the time Puppet spends processing requests.
Each time a client asks for its catalog, that's a request. Each separate file retrieved via puppet:/// URLs is also a request. If you're using Puppet lightly, each catalog won't take too long to generate, you won't be distributing many files on any given Puppet run, and each client won't be taking more than four to six seconds of server time every hour. If each client takes four seconds of server time per hour, 10 clients have a 5% chance of collisions[0], that is, of at least one client having to wait while another's request is processed. For 20 or 30 clients, those chances are 19% and 39%, respectively. As long as each request is short, you might be able to live with some contention, but the odds of collisions increase pretty quickly, so if you've got more than, say, 50 hosts (75% collision chance) you really ought to be using Passenger, unless active performance measuring shows that you're doing okay.
If, however, you're working your Puppet master harder (taking longer to generate catalogs, serving lots of files, serving large files, or whatever), you need to switch to Passenger sooner. I inherited a set of about thirty hosts with a WEBrick Puppet master where things were doing okay, but when I started deploying new systems, all of the Puppet traffic caused by a fresh deployment (including a couple of gigabyte files[1]) was preventing other hosts from getting their updates, so that's when I was forced to switch to Passenger.
In short, you'll probably be okay with 30 nodes if you're not doing anything too intense with Puppet, but at that point you need to be monitoring the performance of at least your Puppet master and preferably your clients' update status, too, so you'll know when you start running beyond the capabilities of WEBrick.
[0] This is a standard birthday-paradox calculation: if n is the number of clients and s is the average number of seconds of server time each client uses per hour, then there are m = 3600/s available time slots and the chance of at least one collision during an hour is 1 - m!/(m^n * (m - n)!). For s = 4 and n = 10, m = 900 and this works out to roughly 5%, matching the figure above.
[1] Puppet isn't really a good avenue for distributing files of this size in any case. I eventually switched to putting them on an NFS share that all of the hosts had access to.
For 20-30 nodes, there shouldn't be any problem. Note that passenger provides some additional features. It may be faster serving the nodes, but I am not sure how much improvement you will get if you have only 30 nodes.
You should change to Passenger if you are using more than a hundred nodes. I started seeing problems when the number of nodes requesting service from the puppet master reached about 200. In my case, with the default web server, about 5% of the nodes (at random) couldn't receive the catalog during the hourly run.
Up until now, I've always built my VPS dedicated to running just a single application in multiple instances, mostly with Unicorn. That way I could set up the whole environment to fit perfectly for that one specific application and be happy with it.
But now I need to build a VPS, that will host multiple small Ruby applications. Some of them will be Rails and some Sinatra. They will have basically zero traffic (under 100 visits per day), which means I don't even need multiple instances of a single app.
I don't really have experience with other servers than unicorn + nginx, but what I think I need would look something like this.
request to app1, gets loaded into memory and serves the request
request to app2, gets loaded into memory and serves the request
request to app3, there is not enough free memory
app1 gets killed before the app3 is booted to serve the request
I know this isn't exactly a perfect scenario, but imagine having 10 or 20 small apps on a single server, where each app gets 5 hits a day. They don't exactly need to be up and running at all times.
As far as I know, Heroku does this with their free tier, where Dynos get killed after some idle time, and then they get loaded back up when a request comes in. That's basically what I need to do on my own server.
I would recommend using Apache + Passenger. By default Passenger loads an application only when it is needed, i.e. the first request will take a bit longer (roughly as long as it takes to load your framework).
If the application is idle for some predefined time, it will be removed from memory.
Setup is very easy, and adding a new application is just a matter of adding one line to your Apache configuration.
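For illustration, a typical Passenger + Apache vhost for one of these apps might look like this (the ServerName and paths are made up):

    <VirtualHost *:80>
        ServerName app1.example.com
        # Passenger detects the app from its public/ directory
        DocumentRoot /var/www/app1/public
        <Directory /var/www/app1/public>
            AllowOverride all
            Options -MultiViews
        </Directory>
    </VirtualHost>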
I've set up my website with Rails 3 and Passenger (via nginx) and, although it's only being used by one person, the web server essentially has to wake the Rails instance up to render the page. This happens only when the website has not been accessed for a while (hence it's sleeping), but I'm a little paranoid that it could still lag like this when the website is operating at a production level (not to be confused with development/production mode; the sleeping website is running in production mode when I'm checking it out).
Any ideas? Or is this just a sleep-and-wake-up thing that happens when nobody's using the website?
There is an easy fix: just edit your nginx.conf and set passenger_min_instances to a value larger than zero. This way Passenger always keeps at least one instance alive, which will prevent the "lag" you describe. Read more about it in the Passenger Nginx documentation.
Take a look at passenger_pool_idle_time. It states the maximum number of seconds that an application instance may be idle. That is, if an application instance hasn't received any traffic after the given number of seconds, it will be shut down in order to conserve memory.
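A minimal sketch of the two directives in nginx.conf (the values are illustrative):

    # inside the http block
    passenger_min_instances 1;    # always keep at least one app process alive
    passenger_pool_idle_time 0;   # 0 = never shut idle processes down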
I have a simple Rails app deployed on a 500 MB Slicehost VPS. I'm the only one who uses the app. When I run it on my laptop, it's fast enough. But the deployed version is insanely slow. It takes 6 to 10 seconds to load the login screen.
I would like to find out why it's so slow. Is it my code? (Don't think so because it's much faster locally, but maybe.) Is it Slicehost's server being overloaded? Is it the Internet?
Can someone suggest a technique or set of steps I can take to help narrow down the cause of this problem?
Update:
Sorry, forgot to mention: I'm running it under CentOS 5 using Phusion Passenger (AKA mod_rails or mod_rack).
If it is only slow the first time you load it, that is probably because Passenger killed the process due to inactivity. I don't remember all the details, but I do recall reading about people who used cron jobs to keep at least one process alive and avoid the lag that occurs when Passenger needs to reload the environment.
Edit: more details here
Specifically, the pool idle time defaults to 2 minutes, which means that after two minutes of idling Passenger has to reload the environment to serve the next request.
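For illustration, a keep-alive cron job of the kind mentioned above could look like this (the URL is a placeholder; the interval just needs to be shorter than the idle timeout):

    # crontab entry: hit the app every minute so Passenger never reaches its idle timeout
    * * * * * curl -s -o /dev/null http://example.com/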
First, find out if there's a particularly slow response from the server. Use Firefox and the Firebug plugin to see how long each component (including JavaScript and graphics) takes to download. Assuming the main page itself is what is taking all the time, you can start profiling the application. You'll need to find a good profiler, and as I don't actually work in Ruby on Rails, I can't suggest any: google "profile ruby on rails" for some options.
As YenTheFirst points out, the server software and config you're using may contribute to a slowdown, but A) Slicehost doesn't choose that, you do, as Slicehost just provides very raw server "slices" that you can treat as dedicated machines; and B) you're unlikely to see a script that runs instantly suddenly take 6 seconds just because it's running as CGI. Something else must be going on. Check how much RAM you're using: have you gone into swap? Is the login slow only the first time it's hit, indicating some startup issue, or is it always that slow? Is static content served slowly? That would tend to mean some network issue (either on the Slicehost side, or your local network) is slowing things down, assuming you're not in swap.
When you say "fast enough" you're being vague: does the laptop version take 1 second to the Slicehost 6? That wouldn't be entirely surprising, if the laptop is decent: after all, the reason slices are cheap is because they're a fraction of a full server. You're using probably 1/32 of an 8 core machine at Slicehost, as opposed to both cores of a modern laptop. The Slicehost cores are quick, but your laptop could be a screamer compared to 1/4 of core. :)
Try to pinpoint where the slowness lies.
1/ Is the application slow, or the infrastructure (network + web server)?
Put a static file on your web server and access it through your browser.
2/ If that is fast, it is probably a problem with the application + server configuration:
database access is slow
try a page with a simple loop: is it slow?
3/ If it is slow, it is probably your infrastructure. You can check:
bad network connection: do a packet capture (with Wireshark for example) and look for retransmissions, duplicate packets, etc.
DNS resolution is slow?
server is misconfigured?
etc.
What is Slicehost using to serve it?
Fast options are things like Mongrel, or Apache's mod_rails (also called Phusion Passenger or something like that).
These are dedicated servers (or plugins to servers) which run an instance of your rails app.
If your host isn't using that, then it's probably defaulting to CGI. Rails comes with a simple CGI script that will serve the page, but it reloads the app for every page.
(edit: I suspect that this is the most likely case, that your app is running off of the CGI in /webapp_directory/public/dispatch.cgi, which would explain the slowness. This tends to be a default deployment on many hosts, since it doesn't require extra configuration on their part, but it doesn't give good performance)
If your host supports "Fast CGI", rails supports that too. Fast CGI will open a CGI session, and keep it open for multiple pages, so you get much better performance, but it's not nearly as good as Mongrel or mod_rails.
Secondly, is it in 'production' or 'development' mode? The easy way to tell is to go to a page in your app that gives an error. If it shows you a stack trace, it's in development mode, which is slower than production mode. Mongrel and mod_rails have startup options to determine whether to run the app in production or development mode.
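For mod_rails/Passenger specifically, the environment can be set with a single directive in the Apache config; a minimal sketch (Passenger defaults to production, so this is only needed to override it explicitly):

    # Apache config: run the Rails app in production mode
    RailsEnv production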
Finally, if your database is slow for whatever reason, that will be a big bottleneck as well. If you do have a good deployment (Mongrel/mod_rails/etc.) in production mode, try looking into that.
Do you have a lot of data in your DB? I would double-check that you have indexed all the appropriate columns, because this can make a huge difference. On your local dev system you probably have a lot more memory than on your 500 MB slice, which would make the DB run a lot slower if you have big, unindexed tables. You can also enable the slow query log in MySQL to pinpoint columns without indexes.
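For illustration, the slow query log can be enabled with something like this in my.cnf (the file path and threshold are examples; option names vary slightly between MySQL versions):

    [mysqld]
    slow_query_log = 1
    slow_query_log_file = /var/log/mysql/slow.log
    long_query_time = 1                  # log queries slower than 1 second
    log_queries_not_using_indexes = 1    # also log unindexed queries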
Other than that, yes- passenger will need to spool up a process for you if you have not been using the site recently. If this is the case, you should see a significant speed increase on second, and especially third and later page loads.
You might want to run a local virtual machine with 500 MB. Are you doing a lot of client-server interaction? Delays over the WAN are significant.
You might want to check out RPM (there's a free "lite" version too) and/or New Relic's Tune Up.
Your CPU time is guaranteed by Slicehost using the Xen virtualization system, so it's not that. Don't have the other answers for you, sorry! Might try 'top' on a console while you're trying to access the page.
If you are using FireFox and doing localhost testing (or maybe even on LAN) you may want to try editing the network.dns.disableIPv6 setting.
Type about:config in the address bar and filter for network.dns.disableIPv6 and double-click to set to true.
This bug has been reported mainly on Vista, but on some other OSes as well.
You could try running 'top' when you SSH in to see which process is heavy. If you also have problems logging in, you could try the statistics in the Slicehost manager.
If you discover it is MySQL's fault, consider decreasing the number of servers it can spawn.
512 MB seems decent for a Rails application; you might want to check whether you misconfigured something too.