While running a load test, I found Passenger throwing the error below when a lot of concurrent requests first hit the server. On the client side this shows up as a 502 error code. However, after the first 1000-2000 requests or so, it works fine.
2013/07/23 11:22:46 [error] 14131#0: *50226 connect() to /tmp/passenger.1.0.14107/generation-0/request failed (11: Resource temporarily unavailable) while connecting to upstream, client: 10.251.18.167, server: 10.*, request: "GET /home HTTP/1.0", upstream: "passenger:/tmp/passenger.1.0.14107/generation-0/request:", host: hostname
Server Details.
Passenger 4.0.10
ruby 1.9.3/2.0
Server Ec2 m1.xlarge
64-bit 4core 15gb
Ubuntu 12:24 LTS
It's a web server that serves dynamic web pages for a Rails application.
Can somebody suggest what the issue might be?
A "temporarily unavailable" error in that context means the socket backlog is full. That can happen if your app cannot handle your requests fast enough. What happens is that the queue grows and grows until it's full, and then you start getting those errors. In the meantime, your users' response times grow and grow until they get an error. This is probably an application-level problem, so it's best to start there. Try figuring out why your app is slow and on which requests it is slow, and fix that. Or maybe you need to scale to more servers.
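To make the "queue grows until it's full" point concrete, here is a toy simulation. It is only a sketch: it does not reproduce how Passenger or the kernel manage the backlog, and the function name, the tick model, and all the numbers are made up for illustration.

```python
def simulate_backlog(arrivals_per_tick, served_per_tick, backlog_size, ticks):
    """Toy model of a bounded request queue (hypothetical, not Passenger's code).

    Returns (rejected, depth_history): how many requests were turned away
    because the queue was full, and the queue depth at the end of each tick.
    """
    queued = 0
    rejected = 0
    depth_history = []
    for _ in range(ticks):
        # New requests arrive; only as many as fit in the backlog are admitted.
        free_slots = backlog_size - queued
        admitted = min(arrivals_per_tick, free_slots)
        rejected += arrivals_per_tick - admitted
        queued += admitted
        # The app then works off as many queued requests as it can this tick.
        queued -= min(served_per_tick, queued)
        depth_history.append(queued)
    return rejected, depth_history


if __name__ == "__main__":
    # Assumed numbers: 10 requests arrive per tick, the app handles 8.
    rejected, depths = simulate_backlog(10, 8, 20, 15)
    print("queue depth per tick:", depths)
    print("rejected:", rejected)
```

With arrivals above the service rate, the depth climbs every tick until the backlog is full, and from then on some requests are rejected on every tick. If the app keeps up (say 8 arrivals against 10 served), nothing is rejected.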
I have "mariadb" set to 127.0.0.1 in my /etc/hosts file and sidekiq occasionally throws errors such as:
Mysql2::Error::ConnectionError: Unknown MySQL server host 'mariadb' (16)
The VM is not under significant load or anything like that.
Later edit: seems other gems have trouble resolving hosts too:
WARN -- : Unable to record event with remote Sentry server (Errno::EBUSY - Failed to open TCP connection to XXXX.ingest.sentry.io:443 (Device or resource busy - getaddrinfo)):
Anyone have any idea why that may happen?
I figured this out a couple of weeks ago but wanted to be sure before posting an answer.
I still can't figure out the mechanism behind this issue, but it was caused by fail2ban.
I had it running in a container polling the httpd logs and blocking the tremendous amount of bots scraping my sites.
I also increased the maximum number of file handles and inotify watches:
fs.file-max = 131070
fs.inotify.max_user_watches = 65536
As soon as I got rid of fail2ban and increased the inotify limits, the errors disappeared.
Obviously fail2ban goes on the "do not touch" list because of this, and we've rolled out a 404/403/500 handler at the application layer that pushes unknown IPs to Cloudflare.
Although this is probably an edge case I'm leaving this here in hope it helps someone at some point.
I have deployed a Rails app with nginx and the puma web server, and sometimes I get the following error.
2018/12/13 12:07:04 [info] 25621#0: *156784 client timed out (110: Connection timed out) while waiting for request, client: 10.66.20.55, server: 0.0.0.0:80
Can you please tell me what this error means? Is the puma server busy, or is nginx busy?
As you can see, it is not an error, just [info], so don't worry about it.
It looks like the client didn't send any request within keepalive_timeout, so the connection was closed.
You should worry about [error] entries, because those occur when the application really is not accessible.
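If you want to filter a log for the entries that matter, here is a minimal sketch. It assumes the usual nginx error log prefix ("date time [level] ...") and the severity order documented for error_log; the function names are my own.

```python
import re

# Severity order used by nginx's error_log directive, least to most severe.
LEVELS = ["debug", "info", "notice", "warn", "error", "crit", "alert", "emerg"]

# Matches the "YYYY/MM/DD HH:MM:SS [level]" prefix of an nginx error log line.
LINE_RE = re.compile(r"^\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} \[(\w+)\]")


def log_level(line):
    """Return the level of an nginx error log line, or None if it does not parse."""
    match = LINE_RE.match(line)
    return match.group(1) if match else None


def needs_attention(line, threshold="error"):
    """True if the line's level is at or above the threshold (default: error)."""
    level = log_level(line)
    if level not in LEVELS:
        return False
    return LEVELS.index(level) >= LEVELS.index(threshold)


if __name__ == "__main__":
    info_line = ("2018/12/13 12:07:04 [info] 25621#0: *156784 client timed out "
                 "(110: Connection timed out) while waiting for request")
    print(log_level(info_line), needs_attention(info_line))
```

The [info] line from the question falls below the threshold, so it would be skipped, while [error] and [crit] entries would be kept.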
So I have a rails application that I built and deployed via AWS Elastic Beanstalk a few months ago. The project was put on hold so I terminated the environment, expecting to be able to re-deploy when we returned to this project.
Despite my app still running just fine in my local dev environment, I cannot get it to deploy. The error from my eb-activity.log:
PG::ConnectionBad: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
The database is a standalone AWS RDS instance that I can successfully test the connection to, so I know it's running. I have added the requisite environment variables and configured my database.yml accordingly. To be clear, this is an application that used to work. I hadn't made any changes between the time I terminated the environment and when I went to re-deploy.
The root problem seems to be that nginx isn't being configured properly, as trying to access the server returns:
502 Bad Gateway
nginx/1.12.1
and when I check the nginx error.log, it's filled with errors like this:
2018/09/19 14:12:35 [crit] 3069#0: *653 connect() to unix:///var/run/puma/my_app.sock failed (2: No such file or directory) while connecting to upstream, client: 172.31.47.147, server: _, request: "GET / HTTP/1.1", upstream: "http://unix:///var/run/puma/my_app.sock:/", host: "172.31.47.147"
Naturally, I googled my error, and found this stackoverflow post.
I've tried adding these suggested lines from the top-rated answer to my puma.rb
bind "unix:///var/run/puma/my_app.sock"
pidfile "/var/run/puma/my_app.sock"
Which caused no change at all.
I made sure to try the other suggestions, including having a direct look at the nginx configuration file. I did find that there's no upstream set up in the config. As best as I can see, the nginx aspect of the deployment pipeline is automated by Elastic Beanstalk so clearly something else I've set must be incorrect.
I've found that under no circumstances can I get the app to deploy using eb deploy; I can only make changes by creating a new environment each time. I've recreated the app countless times, experimenting with different settings, versions of gems and packages, different ruby versions, etc. All in all, I still can't change the error. I can't even get a new error! Just the same PG::ConnectionBad or 502 Bad Gateway, depending on whether I look from the console or the browser.
From my googling I've gotten the impression that this is related to puma in some way, but puma is a bit of a black box for me.
I'm feeling pretty lost here, I'd really appreciate any guidance you'd be willing to share. Feel free to ask for more info from any log or file, I'm happy to provide more detail. Thanks in advance!
Could be an RDS security group. Is it configured to reach your ELB?
You could also try cloning the db, to make sure it's not some weird database issue with the old one, and try connecting to that.
So this won't be a very helpful answer, as I never did resolve the problem. I didn't want to just leave this thread hanging though.
I ended up just creating a new rails environment, re-adding all the gems and porting my controllers/views/models/routes. Once I did that I was able to deploy without issue.
I can confirm that the issue wasn't with the security groups or the database itself. The fresh rails app was able to access the RDS instance without issue.
Thank you all for your comments and attempts to help, it is much appreciated!
We have a Ruby on Rails application running on a VPS. Last night nginx went down and responded with "502 Bad Gateway". The nginx error log contained lots of the following messages:
2013/10/02 00:01:47 [error] 1136#0: *1 connect() to unix:/app_directory/shared/sockets/unicorn.sock failed (111: Connection refused) while connecting to upstream, client: 5.10.83.46, server: www.website.com, request: "GET /resource/206 HTTP/1.1", upstream: "http://unix:/app_directory/shared/sockets/unicorn.sock:/resource/206", host: "www.website.com"
These errors started suddenly; the previous error messages were from 5 days earlier.
So the problem was in the unicorn server. I then opened the unicorn error log and found only some info messages that weren't connected to the problem. The production log was useless too.
I tried to restart the server via service nginx restart, but it didn't help. There were also no leftover unicorn processes.
The problem was solved when I redeployed the application. That is strange, because I had deployed the same version of the application 10 hours before the server went down.
I'm looking for any suggestions on how to prevent such 'magic' cases in the future. I appreciate any help you can provide!
Looks like your unicorn server wasn't running when nginx tried to access it.
This can be caused by a VPS restart, some exception in the unicorn process, or the unicorn process being killed due to low free memory. (IMHO a VPS restart is the most likely reason.)
Check whether unicorn is running with
ps aux | grep unicorn
Also you can check server uptime with
uptime
Then you can:
add script that would start unicorn on VPS boot
add it as service
run some monitoring process (like monit)
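As a rough idea of what such a monitoring check could do, here is a minimal sketch that probes the Unix socket nginx is pointed at. The function name is hypothetical, and a real monitor such as monit would be configured in its own way; this only shows the underlying check.

```python
import errno
import socket


def probe_unix_socket(path, timeout=1.0):
    """Try to connect to a Unix domain socket and report what happened.

    Returns "ok" if something accepted the connection, otherwise the
    symbolic errno name (for example "ENOENT" or "ECONNREFUSED").
    """
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        return "ok"
    except OSError as exc:
        # Fall back to the exception text when there is no errno (e.g. a timeout).
        return errno.errorcode.get(exc.errno, str(exc))
    finally:
        client.close()


if __name__ == "__main__":
    # Path taken from the log in the question; adjust it to your deployment.
    print(probe_unix_socket("/app_directory/shared/sockets/unicorn.sock"))
```

On Linux, "ENOENT" corresponds to the "(2: No such file or directory)" form of the nginx error (the socket file is missing), and "ECONNREFUSED" to "(111: Connection refused)" (the file exists but nothing is listening on it). Anything other than "ok" would be a reason to restart unicorn or raise an alert.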
At work we're running some high traffic sites in rails. We often get a problem with the following being spammed in the nginx error log:
2011/05/24 11:20:08 [error] 90248#0: *468577825 connect() to unix:/app_path/production/shared/system/unicorn.sock failed (61: Connection refused) while connecting to upstream
Our setup is nginx on the frontend server (load balancing), and unicorn on our 4 app servers. Each unicorn is running with 8 workers. The setup is very similar to the one GitHub uses.
Most of our content is cached, and when a request hits nginx it looks for the page in memcached and serves it if it can find it; otherwise the request goes to Rails.
I can solve the above issue - SOMETIMES - by doing a pkill of the unicorn processes on the servers followed by a:
cap production unicorn:check (removing all the pid's)
cap production unicorn:start
Do you guys have any clue how I can debug this issue? We don't have any significantly high load on our database server when these problems occur.
Something killed your unicorn process on one of the servers, or it timed out. Or you have an old app server in your upstream app_server { } block that is no longer valid. Nginx will retry it from time to time. The default is to re-try another upstream if it gets a connection error, so hopefully your clients didn't notice anything.
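One way to narrow down which socket is failing, and with which error, is to tally the connect() failures in the nginx error log. This is a rough sketch: the regex only covers the "connect() to <target> failed (<errno>: <message>)" form seen in the log line quoted in the question, and the function name is my own.

```python
import re
from collections import Counter

# Captures the target, errno, and message from nginx lines such as:
#   connect() to unix:/path/unicorn.sock failed (61: Connection refused) while ...
CONNECT_FAIL_RE = re.compile(r"connect\(\) to (\S+) failed \((\d+): ([^)]+)\)")


def tally_connect_failures(lines):
    """Count upstream connect() failures per (target, errno, message)."""
    counts = Counter()
    for line in lines:
        match = CONNECT_FAIL_RE.search(line)
        if match:
            target, code, message = match.groups()
            counts[(target, int(code), message)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "2011/05/24 11:20:08 [error] 90248#0: *468577825 connect() to "
        "unix:/app_path/production/shared/system/unicorn.sock failed "
        "(61: Connection refused) while connecting to upstream",
    ]
    for key, count in tally_connect_failures(sample).most_common():
        print(count, key)
```

Running it over the error log on each server should show whether the failures cluster on one socket (a dead or stale unicorn) or are spread across all of them.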
I don't think this is an nginx issue in my case; restarting nginx didn't help. It seems to be gunicorn. A quick and dirty way to avoid this is to recycle the gunicorn instances when the system is not being used, say at 1 AM if that is an acceptable maintenance window. I run gunicorn as a service that comes back up if killed, so a pkill script takes care of the recycle/respawn:
start on runlevel [2345]
stop on runlevel [06]
respawn
respawn limit 10 5
exec /var/web/proj/server.sh
I am starting to wonder if this is at all related to memory allocation. I have MongoDB running on the same system and it reserves all the memory for itself but it is supposed to yield if other applications require more memory.
Another thing worth trying is getting rid of eventlet or other dependent modules when running gunicorn. uWSGI can also be used as an alternative to gunicorn.