We're running WordPress on Lightsail (with a CDN/distribution), and occasionally requests take longer than 30 seconds and we get 504 errors.
It seems our Lightsail distribution times out after 30 seconds. I know you can configure this in CloudFront, but I can't find a setting for it in the Lightsail UI or CLI. Does anyone know if it's possible?
Thanks,
Max
Related
I've deployed my RoR app on an AWS EC2 instance using Nginx and Puma. I have a page in the app that runs lots of queries in loops (I know that's bad, but we'll be improving it in some time).
The problem is that this page gives a 502 Gateway Timeout error and the Puma server then crashes. I looked at the processes on the server: the ruby process runs at 100% CPU for a few seconds, and after that Puma crashes.
I'm unsure why this is happening, as the same page with the same data loads on my local PC in 6-7 seconds.
Is this some limit from AWS on processes?
Is this something on the Puma side?
Without further information, it's not possible to say exactly what's causing the problem.
As an "educated guess", I'd say it could be an out-of-memory issue.
I found the issue after multiple hours of debugging. It was a rare edge case that put the server in an infinite loop, which caused memory to overflow.
I used top -i to watch the memory usage increase.
Thank you all for the suggestions and responses.
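For anyone chasing a similar runaway-memory problem, the same "watch memory climb" check that top -i gave from the outside can also be done from inside a process. The sketch below is in Python rather than Ruby and is only an illustration: `leaky_step` is a hypothetical stand-in for the buggy code path, not anything from the app above, and `tracemalloc` only sees allocations made by Python itself.

```python
import tracemalloc


def leaky_step(store, n=1000):
    # Hypothetical stand-in for the buggy code path: it keeps
    # appending 1 KiB buffers and never releases them.
    store.extend(bytearray(1024) for _ in range(n))


def measure_growth(step, rounds=5):
    """Call step() `rounds` times and return traced memory (bytes) after each call."""
    tracemalloc.start()
    samples = []
    try:
        for _ in range(rounds):
            step()
            current, _peak = tracemalloc.get_traced_memory()
            samples.append(current)
    finally:
        tracemalloc.stop()
    return samples


store = []
samples = measure_growth(lambda: leaky_step(store))

# If every sample is larger than the previous one, memory is never released
# between rounds, which is the pattern an infinite loop that accumulates data shows.
growing = all(later > earlier for earlier, later in zip(samples, samples[1:]))
```

A steadily rising series of samples points at the code path being exercised; a flat series suggests the growth is coming from somewhere else.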
To preface this, I know an HTTP request that's longer than 1 minute is bad design and I'm going to look into Cloud Tasks but I do want to figure out why this issue is happening.
So as specified in the title, I have a simple API request to a Cloud Run service (fully managed) that takes longer than 1 minute which does some DB operations and generates PDFs and uploads them to GCS. When I make this request from the client (browser), it consistently gives me back a 502 response after 1 minute of waiting (presumably coming from the HTTP Load Balancer):
However when I look at the logs the request is successfully completed (in about 4 to 5 min):
I'm also getting one of these "errors" for each PDF that's being generated and uploaded to GCS, but from what I read these shouldn't really be the issue?:
To verify that it's not just some timeout issue with the application code or the browser, I put a 5 min sleep on a random API call on a local build and everything worked fine and dandy.
I have set the request timeout on Cloud Run to the maximum (15min), the max concurrency to the default 80, amount of CPU and RAM to 2 and 2GB respectively and the timeout on the Fastify (node.js) server to 15 min as well. Furthermore I went through the logs and couldn't spot an error indicating that the instance was out of memory or any other error around the time that I'm receiving the 502 error. Finally, I also followed the advice to use strace to have a more in depth look at system calls, just in case something's going very wrong there but from what I saw, everything looked fine.
In the end my suspicion is that there's some weird race condition in routing between the container and gateway/load balancer but I know next to nothing about Knative (on which Cloud Run is built) so again, it's just a hunch.
If anyone has any more ideas on why this is happening, please let me know!
We have just put our system into production and we have a lot of users on it. Our servers keep failing and we are not sure why. It seems to start with one server, then the cluster elects a new master, and a few minutes later all the servers in the cluster go down. I have it set up to send all the reads to the read databases and to leave the writes to the master. I have looked through the logs and cannot find a root cause. Let me know which logs I should upload and/or where I should look. Today alone we have had to restart the servers 4 times; that fixes it for a bit, but it's not a cure for the issue.
All databases have 16 GB of RAM, 8 CPUs, and SSDs. I have them set up with the following settings in neo4j.properties:
neostore.nodestore.db.mapped_memory=1024M
neostore.relationshipstore.db.mapped_memory=2048M
neostore.propertystore.db.mapped_memory=6144M
neostore.propertystore.db.strings.mapped_memory=512M
neostore.propertystore.db.arrays.mapped_memory=512M
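As a quick sanity check on the numbers above, the mapped-memory settings can simply be added up and compared with the RAM on each machine. This is only arithmetic on the values in the post; it treats "16GB" as 16 × 1024 MB, and it assumes (as I understand these settings, but the post does not confirm it) that mapped memory is allocated separately from the JVM heap, whose size is not given.

```python
# Values copied from the neo4j.properties lines above, in MB.
mapped_memory_mb = {
    "neostore.nodestore.db.mapped_memory": 1024,
    "neostore.relationshipstore.db.mapped_memory": 2048,
    "neostore.propertystore.db.mapped_memory": 6144,
    "neostore.propertystore.db.strings.mapped_memory": 512,
    "neostore.propertystore.db.arrays.mapped_memory": 512,
}

# Assumption: "16GB ram" is taken as 16 * 1024 MB.
TOTAL_RAM_MB = 16 * 1024

mapped_total_mb = sum(mapped_memory_mb.values())
remaining_mb = TOTAL_RAM_MB - mapped_total_mb

print(f"mapped memory total: {mapped_total_mb} MB")
print(f"left for JVM heap, OS and everything else: {remaining_mb} MB")
```

So the mapped memory adds up to 10 GB of the 16 GB, and the heap, the OS, and anything else on the box have to fit in the remaining 6 GB.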
We are using New Relic to monitor the servers, and we do not see the hardware getting above 50% CPU or 40% memory, so we are pretty sure that is not it.
Any help is appreciated :)
I followed these links for configuration:
https://devcenter.heroku.com/articles/rails-unicorn
http://www.neilmiddleton.com/getting-more-from-your-heroku-dynos/
My config/unicorn.rb:
worker_processes 2
timeout 60
With this config, it still gives a timeout error after 30sec.
The Heroku router will time out all requests at 30 seconds. You cannot reconfigure this.
See https://devcenter.heroku.com/articles/request-timeout
It is considered a good idea to set the application level timeouts to a lower value than the hard 30 second limit so that you don't leave dynos processing requests that the router has already timed out.
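To illustrate the idea of an application-level timeout that sits below the router's limit, here is a minimal sketch in Python (not Unicorn configuration). The 30-second figure comes from the answer above; `APP_TIMEOUT_S = 25`, `DeadlineExceeded`, and `process_items` are names and values made up for the example. The point is that the work checks its own deadline and stops, instead of carrying on after the router has already given up.

```python
import time

ROUTER_TIMEOUT_S = 30  # hard router limit mentioned above
APP_TIMEOUT_S = 25     # assumed value: anything comfortably below the router limit


class DeadlineExceeded(Exception):
    """Raised when the request has used up its own time budget."""


def process_items(items, handle, timeout_s=APP_TIMEOUT_S, clock=time.monotonic):
    """Handle items one by one, giving up once the app-level deadline passes."""
    deadline = clock() + timeout_s
    results = []
    for item in items:
        # Check the budget before starting each unit of work, so the process
        # stops on its own instead of working on an abandoned request.
        if clock() >= deadline:
            raise DeadlineExceeded(f"gave up after {timeout_s}s")
        results.append(handle(item))
    return results
```

A caller would catch `DeadlineExceeded` and return an error response early, which frees the process for the next request. Note that this cooperative check only fires between items; a single very slow step would still need its own timeout.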
If you have requests that are regularly taking longer than 30 seconds you may need to push some of the work involved onto a background worker process.
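The background-worker pattern can be sketched roughly as follows. This is a generic Python illustration, not Heroku or Rails code: on Heroku the worker would be a separate process fed by a real job queue, whereas here an in-process thread and `queue.Queue` stand in for both, and `submit`, `worker`, `start_worker`, and the `jobs` dict are hypothetical names. The web request only enqueues the job and returns an ID straight away; the client checks the job's status later.

```python
import queue
import threading
import uuid

jobs = {}                  # job_id -> {"status": ..., "result": ...}
job_queue = queue.Queue()  # stand-in for a real job queue


def submit(payload):
    """Called from the web request: enqueue the work and return immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    job_queue.put((job_id, payload))
    return job_id


def worker(slow_task):
    """Runs outside the request cycle and processes jobs one at a time."""
    while True:
        job_id, payload = job_queue.get()
        jobs[job_id]["status"] = "running"
        try:
            jobs[job_id]["result"] = slow_task(payload)
            jobs[job_id]["status"] = "done"
        except Exception as exc:
            jobs[job_id]["result"] = repr(exc)
            jobs[job_id]["status"] = "failed"
        finally:
            job_queue.task_done()


def start_worker(slow_task):
    """Start the worker as a daemon thread so it does not block shutdown."""
    thread = threading.Thread(target=worker, args=(slow_task,), daemon=True)
    thread.start()
    return thread
```

The request that calls `submit` finishes in milliseconds regardless of how long the task takes, so it never gets near the 30-second limit; a second, cheap endpoint can then report `jobs[job_id]["status"]`.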
I am running a Sinatra application on a server. It often generates a timeout error:
504 Gateway Time-out The server didn't respond in time.
At other times it works well. I could not find the cause of this. Can anyone please help me?
This happened because of high load. I created 2 more instances of the application (behind HAProxy) and increased the timeout value. Now it works fine.