500 on Google Cloud Run: The request failed because the instance could not start successfully - google-cloud-run

I'm load testing an ExpressJS app hosted on Google Cloud Run. When traffic spikes, there is a period where I see many 500 errors in Stackdriver with the message "The request failed because the instance could not start successfully.", which effectively leads to server downtime.
Seeing that this error occurs more frequently as the app scales up, I'm thinking this is caused by the Cloud Run load balancer assigning traffic prematurely to new instances, before these instances are ready to accept requests.
As I continue to run the load test, the instances are continuously and repeatedly killed and restarted, so there is no mechanism for recovery while the load is on.
I don't see any error logs from my NodeJS application, suggesting none of the failed requests actually reached my app.
What can I do to avoid these errors?
How does Cloud Run determine that a port is ready to accept requests?
Is it something I misconfigured in my ExpressJS app or can I somehow delay Cloud Run a bit before sending requests to a new instance?

This turned out to be caused by a combination of Cloud Run's auto-scaling maximum instance limit and Cloud SQL's connection limit.
I was running a small Cloud SQL Postgres instance (3.75 GB / 1 vCPU) which comes with a default connection limit of 100. (https://cloud.google.com/sql/docs/quotas)
By default, Cloud Run assigns a maximum instance count of 1000 for auto-scaling. During the load test, the sudden spike in request count pushed the auto-scaling to create hundreds of instances, which quickly exhausted the Cloud SQL connection limit of 100.
This exact scenario is documented for Cloud SQL: https://cloud.google.com/sql/docs/postgres/connect-run#connection_limits_3 (It would be nice if this were also documented for Cloud Run; it did not immediately occur to me to look at the Cloud SQL documentation when this issue occurred.)
The solution is a combination of limiting the maximum instance count on Cloud Run to a number that is tolerable, and adjusting resource allocation / maximum connection limit on Cloud SQL. The exact configuration would obviously depend on the expected level of load.
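As a rough illustration of that sizing, the sketch below estimates the largest Cloud Run instance count that stays within a Cloud SQL connection limit. The function name, the per-instance pool size, and the number of reserved connections are assumptions made for the example; they are not values taken from Cloud Run or Cloud SQL.

```python
def max_safe_instances(connection_limit, pool_size_per_instance, reserved_connections=0):
    """Estimate how many app instances fit under a database connection limit.

    connection_limit: maximum connections the database accepts (100 in the case above).
    pool_size_per_instance: connections each app instance may open (assumed value).
    reserved_connections: connections kept free for admin tools or migrations (assumed value).
    """
    if pool_size_per_instance <= 0:
        raise ValueError("pool_size_per_instance must be positive")
    usable = connection_limit - reserved_connections
    if usable <= 0:
        return 0
    # Integer division: a partially served instance would still exceed the limit.
    return usable // pool_size_per_instance


if __name__ == "__main__":
    # Hypothetical numbers: 100 connections, 5 per instance, 10 held in reserve.
    print(max_safe_instances(100, 5, 10))
```

With these hypothetical numbers the estimate is 18 instances, far below the default maximum of 1000, which is why an uncapped service can exhaust a 100-connection database during a spike.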

Related

Is it ok to let a Cloud Run process continue after a HTTP response?

I am scheduling a task in Cloud Run which injects data from a Firestore export into BigQuery using the bq command line tool. This takes a while.
I discovered that the Cloud Scheduler attempt deadline doesn't match the Cloud Run maximum timeout: 30 minutes vs. 60 minutes.
I don't need to display a failure in Cloud Scheduler as I can use regular error monitoring, so I plan to respond to the scheduler request before the cloud run process has finished.
Is it ok to do this, or will the Cloud Run process potentially be killed by the auto-scaling mechanism after the HTTP response has been sent?
The container life cycle is described in the documentation. In summary, by default CPU is allocated to a Cloud Run instance only while it is processing a request; outside of requests it is throttled.
You can disable CPU throttling to keep the CPU on after the response has been sent (and you will pay accordingly). If the instance receives no other request, the autoscaler will shut it down after 15 minutes of idle time.
To prevent that, you can set min instances to a value greater than 0, so that a minimum number of instances is kept up and running at all times (and you will also pay for that).
Or perhaps Cloud Run is not the right service for this job. Maybe Batch?

Delay on requests from Google API Gateway to Cloud Run

I'm currently seeing delays of 2-3 seconds on my first requests coming into our APIs.
We've set the min instances to 1 to prevent cold starts, but this delay is still occurring.
If I check the metrics I don't see any startup latencies in the specified timeframe so I have no insights in what is causing these delays. Tracing gives the following:
The only thing I can change is switching to "CPU is always allocated", but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in the answer:
As per the doc:
Idle instances
As traffic fluctuates, Cloud Run attempts to reduce the
chance of cold starts by keeping some idle instances around to handle
spikes in traffic. For example, when a container instance has finished
handling requests, it might remain idle for a period of time in case
another request needs to be handled.
But Cloud Run will terminate unused containers after some
time if no requests need to be handled. This means a cold start can
still occur. Container instances are scaled as needed, and it will
initialize the execution environment completely. While you can keep
idle instances permanently available using the min-instance setting,
this incurs cost even when the service is not actively serving
requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup
There are a few tricks you can do to
optimize your service for container startup times. The goal here is to
minimize the latency that delays a container instance from serving
requests. But first, let’s review the Cloud Run container startup
routine.
1. Starting the service
   a. Starting the container
   b. Running the entrypoint command to start your server
   c. Checking for the open service port
You want to tune your service to minimize the time needed for step 1a.
Let’s walk through 3 ways to optimize your service for Cloud Run
response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
As mentioned in the documentation:
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
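The advice to finish all asynchronous operations before delivering the response can be sketched like this. It is a framework-free illustration using asyncio; `write_audit_entry` and `audit_log` are hypothetical names standing in for any side work (logging, metrics, cache writes) that a handler might otherwise leave running in the background.

```python
import asyncio

audit_log = []  # illustration only: records side work that has completed


async def write_audit_entry(message):
    """Hypothetical side task; the sleep simulates a short I/O call."""
    await asyncio.sleep(0.01)
    audit_log.append(message)


async def handle_request(name):
    """Run side work concurrently, but finish it before returning the response."""
    side_work = asyncio.create_task(write_audit_entry("greeted " + name))
    body = "Hello, " + name
    # Await the task before responding. With CPU allocated only during request
    # processing, work still pending after the response may be suspended.
    await side_work
    return body


if __name__ == "__main__":
    print(asyncio.run(handle_request("world")))
```

Returning `body` without `await side_work` is the pattern the quoted documentation warns about: the response would go out while the audit write is still pending.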
As mentioned in the thread, the reason could be that the operations you perform happen after the response is sent.
According to the docs, CPU is allocated only during request processing by default, so the only thing you have to change is to enable CPU allocation for background activities.
You can refer to the documentation for more information related to the steps to optimize Cloud Run response times.
You can also have a look at the blog post on using Google API Gateway with Cloud Run.

Cloud Run: 500 Server Error with no log output

We are investigating an issue on a deployed Cloud Run service, where requests made to the service occasionally fail with a StatusCodeError: 500, while no log of said requests appears on Cloud Run.
Served requests usually produce two log lines detailing the request, route and exit code (POST 200 on https://service-name.a.run.app/route/...)
One with log name projects/XXX/logs/run.googleapis.com/stdout is produced by our application to log the serving of every request
One with log name projects/XXX/logs/run.googleapis.com/requests is automatically produced by cloud run on every request
When the incident occurs, none of those are logged. The client (running in a gke pod in the same project) has the only log of the failing requests, with the following message:
StatusCodeError: 500 - "\n<html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>500 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered an error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>\n"
Rough timeline of the last incident:
14:41 - Service is serving requests as expected, producing both log lines each time
14:44 to 14:56 - Cloud run logs are empty, every request made to the service (~30) gets the 500 error message
14:56 - Cloud Run terminates the currently running container instance (as happens after some inactivity, for instance), which is correctly logged by the application ([INFO] Handling signal: term)
14:58 - Cloud run instantiates a new container instance and starts serving incoming requests (which are logged normally)
The absence of logs during the incident makes it hard to investigate its cause, and at this stage we would be grateful for any kind of lead.
Our service has another known issue that may or may not be related. The service is designed to avoid multiple replicas, as a single one should be able to handle the load and serve concurrent requests (Cloud Run concurrency = 80), but it has a relatively long cold start time (~30s). This leads to 429 errors when a spike of requests comes while no replica is available (because Cloud Run hard caps concurrency to 1 during cold start). This issue was somewhat mitigated by allowing some replication (currently maxScale = 3), since each replica can put a request on hold during the cold start, but it will require some work on the client side to handle correctly (simple retries after the cold start).
I have found this Public Issue Tracker (PIT) entry that describes the aforementioned behavior. It seems to happen because a part of Cloud Run thinks that there are already provisioned instances handling the traffic when there aren't. This issue is currently being worked on internally, but there's no ETA for a fix at the moment.
The current workaround is to set the maximum number of instances to at least 4.
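The client-side handling mentioned in the question (simple retries around the cold start) could look roughly like the sketch below. It is transport-agnostic: `send` is a hypothetical callable that performs one request and returns a status code and body. The set of retryable statuses, the attempt count, and the delays are assumptions to tune, not values recommended by Cloud Run.

```python
import time

RETRYABLE_STATUS = {429, 500, 503}  # assumed set of statuses worth retrying


def call_with_retries(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send: hypothetical callable returning a (status_code, body) tuple.
    sleep: injectable so that tests do not have to wait for real delays.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status, body = send()
    for attempt in range(1, max_attempts):
        if status not in RETRYABLE_STATUS:
            break
        # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body


if __name__ == "__main__":
    # Simulated service that fails twice and then succeeds.
    responses = iter([(429, ""), (500, ""), (200, "ok")])
    print(call_with_retries(lambda: next(responses), sleep=lambda seconds: None))
```

With a cold start of around 30 seconds, the total backoff (1 + 2 + 4 + 8 seconds with the defaults here) would need to be raised to cover the whole startup window.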

Restart KSQL-Server when some queries are running

I'm trying to find documentation on what happens when KSQL-Server restarts while some queries are running. What will happen?
Does it behave similarly to Kafka Streams, so the consumer offset is not committed and at-least-once is guaranteed?
I can observe that the queries are stored in the command topic and are executed again when ksql-server restarts.
I'm trying to find documentation on what happens when KSQL-Server restarts while some queries are running. What will happen?
If you only have a single KSQL server, then stopping that server will of course stop all the queries. Once the server is running again, all queries will continue from the points they stopped processing. No data is lost.
If you have multiple KSQL servers running, then stopping one (or some) of them will cause the remaining servers to take over any query processing tasks from the stopped servers. Once the stopped servers have been restarted the query processing workload will be shared again across all servers.
Does it behave similarly to Kafka Streams, so the consumer offset is not committed and at-least-once is guaranteed?
Yes.
But (even better): whether the processing guarantees are at-least-once or exactly-once depends solely on the KSQL server's configuration. It does not, of course, depend on whether or when the server is restarted, crashes, etc.

In what cases does Google Cloud Run respond with "The request failed because the HTTP connection to the instance had an error."?

We've been running Google Cloud Run for a little over a month now and noticed that we periodically have cloud run instances that simply fail with:
The request failed because the HTTP connection to the instance had an error.
This message is nearly always* preceded by the following message (those are the only messages in the log):
This request caused a new container instance to be started and may thus take longer and use more CPU than a typical request.
* I cannot find, nor recall, a case where that isn't true, but I have not done an exhaustive search.
A few things that may be of importance:
Our concurrency level is set to 1 because our requests can take up to the maximum amount of memory available, 2GB.
We have received errors that we've exceeded the maximum memory, but we've dialed back our usage to obviate that issue.
This message appears to occur shortly after 30 seconds (e.g., 32, 35) and our timeout is set to 75 seconds.
In my case, this error was always thrown 120 seconds after receiving the request. I found that the cause was Node 12's default request timeout of 120 seconds. So if you are using a Node server, you can either change the default timeout or update Node to version 13, where the default timeout was removed: https://github.com/nodejs/node/pull/27558.
If your logs didn't catch anything useful, most probably the instance crashes because you run heavy CPU tasks. A mention about this can be found on the Google Issue Tracker:
A common cause for 503 errors on Cloud Run would be when requests use
a lot of CPU and as the container is out of resources it is unable to
process some requests
For me, the issue was resolved by upgrading Node in the Dockerfile, from "FROM node:13.10.1 AS build" to "FROM node:14.10.1 AS build".
