AWS ALB returns 502 and request doesn't reach targets - amazon-elb

We have an ALB load balancing two targets, but we are quite often getting 502 responses from the ALB, which causes the UI to throw an error. Our application is UI-based, runs on Node.js, and uses socket.io.
Here is a sample entry from the ALB access log:
https 2019-06-10T09:29:46.987095Z app/DES-G1-USE4-ext-elb/7c8fddfc050d66f6 184.51.199.55:55418 10.72.72.155:8888 0.000 8.697 -1 502 - 876 485 "GET https://designer-use4.genesyscloud.com:443/socket.io/?EIO=3&transport=polling&t=Mj0kC0w HTTP/1.1" "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:65.0) Gecko/20100101 Firefox/65.0" ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 arn:aws:elasticloadbalancing:us-east-1:361190373704:targetgroup/DES-G1-USE4-tg-desg/4cc8feabb9ee8f89 "Root=1-5cfe2302-b19c6d5059c2fc6096e048e0" "-" "session-reused" 0 2019-06-10T09:29:38.289000Z "forward" "-" "-"
Here, -1 502 - means that the request was forwarded to the backend by the ELB, but the target did not respond and the connection between the ELB and the target was closed somehow (as per https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-access-logs.html).
In our application log, there is no trace of this request ever reaching our application.
Please help us debug this issue from the ALB side.

Make sure that your application responds with a 200-399 status code on the / path. The behavior you are describing often occurs because the load balancer health check cannot verify that your application is running correctly. Also, make sure that your server can respond to that path over HTTP (HTTP/2 is not yet supported).

Potential causes of this:
Firewall: make sure the security groups of your application server and the ELB allow them to connect to each other on the designated port.
Health check: the load balancer will kick out an application server that crashes or fails its health check a few times. Make sure that the application server is attached to the target group and showing as healthy in the dashboard (a minimal health check handler is sketched below).
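As a rough illustration (not from the original post), a minimal Node.js health check handler could look like the following; the port (8888) and the / path are assumptions and should match whatever the target group's health check is actually configured to hit:

// health-check sketch: assumes the target group health check does HTTP GET / on port 8888
const http = require('http');
const server = http.createServer((req, res) => {
  if (req.method === 'GET' && req.url === '/') {
    // The ALB health check expects a 200-399 status code on the configured path.
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end('OK');
    return;
  }
  // Application routes (socket.io polling, etc.) would be handled elsewhere.
  res.writeHead(404);
  res.end();
});
server.listen(8888);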

Related

Why is my LB responding with Bad Gateway?

I have no web server running on my EC2 machine, but I still get 502 Bad Gateway from the load balancer in front of it.
Why do I get a Bad Gateway error from the load balancer, but only a timeout (and no Bad Gateway error) when there is no load balancer in front of the EC2 machine?
The load balancer regularly performs health checks on its target machines, i.e. it sends an HTTP or TCP request (depending on how you have configured it). This way it knows which machines in its target pool are healthy and can take requests, and which cannot. It is supposed to balance the load between multiple machines, after all.
When your EC2 machine does not have a running web server, its health check fails and it's seen as unavailable by the load balancer. Since apparently there's no other healthy machine in the pool, the load balancer cannot forward any requests to anything, and thus answers with a 502 Bad Gateway status.
The difference from simply timing out when you access your EC2 machine directly is that with a load balancer, there is still something that can accept HTTP requests and return appropriate HTTP error codes. When you have no web server whatsoever, the connection cannot be accepted by anything and can only time out.
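If you want to confirm what the load balancer itself thinks of your targets, here is a small sketch using the AWS SDK for JavaScript v3; the region and target group ARN are placeholders you would replace with your own:

// target-health sketch: lists each target's health state as seen by the load balancer
const {
  ElasticLoadBalancingV2Client,
  DescribeTargetHealthCommand,
} = require('@aws-sdk/client-elastic-load-balancing-v2');
const client = new ElasticLoadBalancingV2Client({ region: 'us-east-1' });
async function main() {
  const { TargetHealthDescriptions } = await client.send(
    new DescribeTargetHealthCommand({
      // Placeholder ARN: substitute your own target group.
      TargetGroupArn: 'arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/example/0123456789abcdef',
    })
  );
  for (const t of TargetHealthDescriptions) {
    // State is "healthy", "unhealthy", "draining", etc.; Reason explains failures.
    console.log(t.Target.Id, t.TargetHealth.State, t.TargetHealth.Reason || '');
  }
}
main().catch(console.error);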

ALB logs showing elb_status_code is 502

The ALB is giving me 502 errors randomly during a Playwright test. When I debugged and checked the ALB logs in Athena, I found that elb_status_code is 502.
The access log entry also shows that request_processing_time is 0.0, target_processing_time is 0.005, and response_processing_time is -1.
As per this AWS documentation:
https://aws.amazon.com/premiumsupport/knowledge-center/elb-alb-troubleshoot-502-errors/
If the elb_status_code is "502" and the target_status_code is "-", then your load balancer is the source of the HTTP 502 errors
But I couldn't work out how to fix it. Can someone please help me with it?
I had the same issue; it was caused by keep-alive timeouts (the server used shorter timeouts than the load balancer).
Take a look at this post:
AWS Load Balancer 502
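As a minimal sketch of that fix for a Node.js server: assuming the ALB idle timeout is left at its default of 60 seconds (the exact numbers below are assumptions), the server's keep-alive timeout must be longer than the load balancer's idle timeout:

// keep-alive sketch: keep idle connections open longer than the ALB idle timeout (60 s assumed)
const http = require('http');
const server = http.createServer((req, res) => {
  res.writeHead(200, { 'Content-Type': 'text/plain' });
  res.end('hello');
});
// Close idle connections only after the ALB would have (60 s), so the server
// never drops a connection the load balancer is about to reuse.
server.keepAliveTimeout = 65 * 1000;
// In Node.js, headersTimeout should be greater than keepAliveTimeout.
server.headersTimeout = 66 * 1000;
server.listen(8888);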

Would Rails' production.log log an HTTP 408 request?

I'm running some Rails servers under Unicorn, which sits behind nginx.
I can see a handful of POST requests with a status of 408 in nginx's access.log.
123.45.67.890 - - [17/Mar/2016:01:23:45 +1100] "POST /collections/ABC/items/DEFGH HTTP/1.1" 408 0 "http://catalog.paradisec.org.au/collections/ABC/items/DEFGH/edit" "MY FAVOURITE BROWSER"
But I can't see anything from the same time in unicorn.log (which doesn't log an awful lot of stuff) or production.log (which logs a fair amount of stuff).
Should Rails be logging anything in an HTTP 408 scenario?
production.log only logs requests that actually reach Rails. It seems to me that the 408 is generated by nginx, which can happen if your Unicorn workers are busy. So in this case the request reached neither Unicorn nor Rails.

Guidance needed - how to track nginx-docker-rails performance issue

My Rails application is deployed on Amazon Elastic Beanstalk using Docker. Web requests flow into an nginx web server, which forwards them to the Thin Rails server running inside Docker. Somewhere along the way there's a bottleneck: roughly once every 50 requests, nginx reports a serving time around 40x higher than the time the Thin server reports.
Here's an example:
NGINX (7490ms):
146.185.56.206 - silo [18/Mar/2015:13:35:55 +0000] "GET /needs/117.json?iPhone HTTP/1.1" 200 2114 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36" 7.490 7.490 .
THIN (rails server): 171ms
2015-03-18 13:35:48.349 [INFO] method=GET path=/needs/117.json format=json controller=needs action=show status=200 duration=171.96 view=109.73 db=29.20 time=2015-03-18 13:35:47
Can anyone offer some guidance on how to troubleshoot such a situation? How do I find the source of the response time difference? I guess it could be nginx, Docker, Thin, or Linux.
It sounds like one of the Thin processes is under load doing a heavy task and nginx is still sending requests to the busy one. If requests were queuing up in Thin, a request would take a short time to process once it ran, but a long time to reach the top of the queue. So first, check the other requests before or around that request.
Second, if you are serving through an upstream (http://nginx.org/en/docs/http/ngx_http_upstream_module.html), you can log variables such as $upstream_response_time.
Third, you could try to reproduce a similar setup in dev/QA and run a stress test; a rough sketch follows below. If you manage to reproduce it consistently, you could look at the number of requests on each queue (e.g. http://comments.gmane.org/gmane.comp.lang.ruby.unicorn.general/848).
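For the stress-test step, here is a very rough Node.js sketch (not from the original answer; the URL, request count, and threshold are placeholders) that repeatedly hits an endpoint and flags the occasional slow response:

// stress sketch: requires Node.js 18+ (global fetch); URL and thresholds are placeholders
const TARGET = 'http://localhost:8080/needs/117.json';
const REQUESTS = 200;
const SLOW_MS = 1000;
async function main() {
  for (let i = 0; i < REQUESTS; i++) {
    const start = Date.now();
    const res = await fetch(TARGET);
    await res.arrayBuffer(); // drain the body so the timing covers the full response
    const ms = Date.now() - start;
    if (ms > SLOW_MS) {
      console.log(`request ${i}: status ${res.status} took ${ms} ms (slow)`);
    }
  }
}
main().catch(console.error);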

AWS ELB keeps showing private IP in Location header on redirect

We are running a simple Rails 4.0 app (on Ubuntu 14.04) with config.force_ssl = true. The SSL certificate is uploaded to our ELB, and both ports 443 and 80 on the ELB forward to port 80 on the servers.
We keep failing our PCI scan because, with HTTP/1.0, the private IP of the ELB shows up in the Location header. Any ideas on how to fix this? I have researched this extensively for days and am really stuck now.
$ telnet app.ourwebsite.com 80
GET / HTTP/1.0
HTTP/1.1 301 Moved Permanently
...
Location: https://172.31.26.236/
Server: nginx/1.6.2 + Phusion Passenger 4.0.57
Set-Cookie: AWSELB=...
The reason it's failing PCI compliance is that the test being used, Nessus plugin 10759, uses HTTP/1.0 and doesn't send a Host header, and the AWS ELB crafts its own Host header when it connects to your instance, setting it (for whatever reason) to the internal IP of the load balancer.
Here's a capture of such a request using Apache's mod_forensics. Note the host header, which is set to the internal IP of the load balancer:
+11143:551306af:1|GET / HTTP/1.1|host:10.3.2.57|X-Forwarded-For:REDACTED|X-Forwarded-Port:443|X-Forwarded-Proto:https|Connection:keep-alive
This is how the request was made (on *nix using bash):
(echo -ne "GET / HTTP/1.0\r\n\r\n"; sleep 2) | openssl s_client -connect 1.2.3.4:443
Where 1.2.3.4 is the IP address of your ELB. You must craft the request like this because the ELB won't accept \n\n directly typed into openssl.
Note that the only thing sent here is the GET request line, with no headers at all.
My site was failing a PCI scan, which led me here, so I felt this needed further explanation.
It sounds like your ELB is receiving HTTPS and then forwarding HTTP. So your Rails server doesn't need to worry about SSL, since the only thing it communicates with is the ELB.
(This is how we have it set up also.)
So you can remove config.force_ssl = true from your Rails server and make your ELB require SSL.
