I am running a Spring Cloud Gateway (version 3.0.3) project with Reactor Netty (version 2.5.7). My system has an 8-core processor and 16 GB of RAM. I have a route which calls the HttpBin API. When I run a load test with a constant throughput of 25 TPS, I am getting around 0.75% 502 (Bad Gateway) errors.
What might be the issue?
I tried increasing Netty's worker thread count to 200, but there was no change in throughput; I am still getting around 0.75% 502 errors.
Update: After upgrading Spring Cloud Gateway to version 4.0.0, the 502 error count is reduced. It is now around 0.15%.
I have no web server running on my EC2 machine, but I still get a 502 Bad Gateway from the load balancer in front of it.
Why do I get a Bad Gateway error from the load balancer, but only a timeout (and no Bad Gateway error) when there is no load balancer in front of the EC2 machine?
The load balancer regularly does health checks on its target machines, i.e. it sends an HTTP or TCP request (as you have configured it). This way it knows which machines in its target pool are healthy and can take requests, and which can't. It's supposed to balance the load between multiple machines, after all.
When your EC2 machine does not have a running web server, its health check fails and it's seen as unavailable by the load balancer. Since apparently there's no other healthy machine in the pool, the load balancer cannot forward any requests to anything, and thus answers with a 502 Bad Gateway status.
The difference from just timing out when you try to access your EC2 machine directly is that with a load balancer, there's still something listening that can accept and handle HTTP requests and return appropriate HTTP error codes. When you simply have no web server whatsoever, the connection cannot be accepted by anything and thus can only time out.
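As a quick illustration of that difference, here is a minimal sketch (the load balancer hostname and the EC2 address are hypothetical placeholders) that probes both: the load balancer returns an actual HTTP 502 response, while the bare machine with nothing listening produces no HTTP response at all.

    # Minimal sketch: contrast a load balancer answering 502 on behalf of a dead
    # backend with a direct hit on a machine that has no web server listening.
    # The hostname and IP below are hypothetical placeholders.
    import requests

    def probe(url):
        try:
            resp = requests.get(url, timeout=5)
            print(f"{url} -> HTTP {resp.status_code}")       # e.g. 502 Bad Gateway from the LB
        except requests.exceptions.Timeout:
            print(f"{url} -> timed out (nothing accepted the connection)")
        except requests.exceptions.ConnectionError as exc:
            print(f"{url} -> no HTTP response at all: {exc}")

    probe("http://my-load-balancer.example.com/")  # hypothetical LB DNS name -> 502 response
    probe("http://203.0.113.10/")                  # hypothetical EC2 IP with no web server -> timeout / refused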
I have a few NodeJS servers on GCP Cloud Run, and when there are a lot of requests, Google adds more instances. I've noticed that when they are added, we temporarily get a lot of 503 errors, but the 503 errors could simply be because instances are being added.
Things I have tried to do to fix this:
Reduced the concurrency from 1000 to 500, then to 300. We are still getting 503 errors, though in some cases fewer.
Used health checks to ping an endpoint to make sure Express is running before traffic is sent (roughly the readiness idea sketched below). This also might have helped, but we are still getting quite a few 503 errors.
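The readiness-check idea, sketched here in Python purely for illustration (the real service is Node, and the port and /healthz path are placeholders): the endpoint keeps answering 503 until startup has finished, so the platform should not route traffic to an instance that cannot serve yet.

    # Illustration only (the real service is NodeJS): a readiness endpoint that
    # answers 503 until the app has finished its slow startup, so a platform
    # health check will not route traffic to an instance that cannot serve yet.
    # The port and the /healthz path are placeholders.
    import threading
    import time
    from http.server import BaseHTTPRequestHandler, HTTPServer

    ready = threading.Event()

    def slow_startup():
        time.sleep(15)      # stands in for the real 10-20 second initialization
        ready.set()

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path == "/healthz":
                self.send_response(200 if ready.is_set() else 503)
                self.end_headers()
            else:
                self.send_response(200)
                self.end_headers()
                self.wfile.write(b"hello")

    threading.Thread(target=slow_startup, daemon=True).start()
    HTTPServer(("", 8080), Handler).serve_forever()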
I expect that we should be able to fix this so that we don't get any 503 errors, but we are still getting quite a few.
What else should I try?
Update #1:
It takes about 10 to 20 seconds for an instance to start.
Update #2:
I noticed that the 503 errors happen in groups, and that the failing requests take more than a few milliseconds, which means the instances are already running when the 503 errors occur. That suggests Cloud Run is likely adding instances because of the 503 errors, rather than the 503 errors happening because instances are being added.
What can cause a 503 error all of a sudden for an instance that is already running?
I have a dockerized Go application running on two GCP instances. Everything works fine when using them with their individual external IPs, but when put through the load balancer, they are either slow to answer or it returns a 502 server error. The health checks seem to be OK, so I really don't understand.
In the logs, the error thrown is
failed_to_connect_to_backend
I've already seen other answers to this question, but none of them seems to cover my case. I cannot modify the way the application is served, so it doesn't seem to be a timeout thing.
To troubleshoot a 502 response from the load balancer due to "failed_to_connect_to_backend", I would check the following:
1) Usually, the "failed_to_connect_to_backend" error message indicates that the load balancer is failing to connect to backends, so investigating the URL map rules is a good place to start. I would suggest reviewing your load balancer's URL map to make sure that host rules, the path matcher, and path rules are correctly defined and comply with the descriptions in this article.
2) Also check whether the backend instances are exhausting their resources. If a backend server is overwhelmed, it will refuse incoming requests, potentially causing the load balancer to give up on it and return the 502 error you're experiencing. For Apache you could use this link, and for nginx this link. Also check how many established connections are present at any one time, using netstat under the watch command.
3) I would also recommend testing again by sending the HTTP(S) request directly to the instance, requesting the same URL that is reporting the 502. You might run this test from another VM instance in your VPC network.
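A concrete sketch of step 3 (the internal IP, port, and path below are placeholders; substitute the URL that fails through the load balancer):

    # Send the failing request directly to the backend instance, bypassing the
    # load balancer. The IP, port, and path below are placeholders.
    import requests

    backend = "http://10.128.0.5:8080"        # hypothetical internal IP of the backend instance
    path = "/the/url/that/returns/502"        # placeholder: the URL that fails via the LB

    try:
        resp = requests.get(backend + path, timeout=10)
        print(resp.status_code, resp.headers.get("content-type"))
    except requests.exceptions.RequestException as exc:
        # If this also fails, the problem is on the instance itself,
        # not in the load balancer configuration.
        print("direct request failed:", exc)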
Check whether your backend blocks Google's Cloud CDN IP addresses or not. Those addresses can be found here: https://cloud.google.com/compute/docs/faq#find_ip_range
This happened to me more than once. I was using Apache on my servers, and the issue was not one of CPU, but of configuration.
I am using Apache mpm_event in combination with PHP-FPM, and there are many settings that limit the maximum number of requests that Apache and FPM will allow.
In my case I increased MaxRequestWorkers in the Apache MPM config from the default of 150 to 600, and pm.max_children in the PHP-FPM config to 80 (I don't remember what the default was here).
This worked as expected; I hope this helps you extrapolate to your own stack.
Just encountered 502 errors myself when accessing a Prometheus pod running on my GKE Standard cluster (exposed through IAP).
The issue was that the configured external HTTP(S) load balancer's health check was coming back unhealthy, despite the Prometheus pod running as expected. After digging into the issue, I found that the GCP auto-generated health check was faulty: it was checking the URL / instead of /-/ready. When I deleted the Prometheus Kubernetes Ingress resource (which auto-generates GCP's load balancer and health check) and recreated it, the issue was resolved (after a few minutes of resource propagation).
I'm trying to set up a multi-user Jupyter environment. For this I've set up JupyterHub with RemoteCSVAuthenticator and DockerSpawner.
Authentication seems to work fine, and when I log in a Docker container is started. But after logging in I only get a 502 error message:
502 : Bad Gateway
The error was:
Failed to check authorization (upstream problem)
The JupyterHub logfile shows no errors. The Docker container is the plain jupyterhub/singleuser image.
Can anyone tell me where to start?
After trying to dig deeper into the problem, I found that if I access the Jupyter process inside the Docker container directly (e.g. http://172.17.0.36:8888/), it always returns a 404 - page not found error. I don't think this is normal. Maybe this is the reason the configurable-http-proxy throws the "Bad Gateway" error.
Finally I found the problem. Since our company requires a proxy, I had set $http_proxy and $https_proxy inside the Docker container. This made the jupyterhub/singleuser server running inside Docker unable to open a connection to the host. My solution was to set up a local proxy on my host that forwards local connections directly to the host, while everything else goes through the company's proxy.
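For reference, the environment variables in question are set through the spawner configuration. A sketch of a jupyterhub_config.py (not the exact setup from this post; the proxy URL and hub address are placeholders) that passes the corporate proxy to the single-user containers while keeping hub traffic off the proxy via no_proxy:

    # jupyterhub_config.py -- a sketch, assuming a corporate proxy; the proxy URL
    # and hub address below are placeholders. The idea is to exclude the hub's own
    # address from proxying so the single-user server can reach it directly.
    c = get_config()  # provided by JupyterHub when this file is loaded

    c.JupyterHub.spawner_class = 'dockerspawner.DockerSpawner'
    c.JupyterHub.hub_ip = '172.17.0.1'   # placeholder: the address containers use to reach the hub

    c.DockerSpawner.environment = {
        'http_proxy':  'http://proxy.corp.example:3128',   # placeholder corporate proxy
        'https_proxy': 'http://proxy.corp.example:3128',
        'no_proxy':    'localhost,127.0.0.1,172.17.0.1',   # keep hub traffic off the proxy
    }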