Is it ok to let a Cloud Run process continue after a HTTP response? - google-cloud-run

I am scheduling a task in Cloud Run which injects data from a Firestore export into BiqQuery using the bq command line tool. This takes a while.
I discovered that the Cloud Scheduler Attempt Deadline doesn't match Cloud Run Maximum tasks timeout. It is 30mins vs 60mins.
I don't need to display a failure in Cloud Scheduler as I can use regular error monitoring, so I plan to respond to the scheduler request before the cloud run process has finished.
Is it ok to do this, or will the Cloud Run process potentially be killed by the auto-scaling mechanism after the HTTP response has been sent?

You can read the life cycle container here in the documentation. In summary, by default the CPU(s) is only allowed to the Cloud Run service during the request processing, else it is throttled.
You can set the throttling to false to let the CPU ON on the Cloud Run instance after the response has been sent (and you will pay accordingly). If no other request is received on the instance, the autoscaler will kick it after 15 minutes (idle activities).
To prevent that, you can set a min instance > 0, like that, a minimal number of instance are kept up and running every time (and you will also pay for it).
Or..... Cloud Run is not the right service for you, maybe Batch?

Related

Properly handle timeout on CloudRun

We use Google Cloud Run to wrap an analysis developed in R behind a web API. For this, we have a small Fastify app that launches an R script and uploads the results to Google Cloud Storage. The process' stdout and stderr are written to a file and are also uploaded at the end of the analysis.
However, we sometimes run into issues when a process takes longer to execute than expected. In these cases, we fail to upload anything and it's difficult to debug, because stdout and stderr are "lost" on the instance. The only thing we see in the Cloud Run logs is this message
The request has been terminated because it has reached the maximum request timeout
Is there a recommended way to handle a request timeout?
In App Engine there used to be a descriptive error: DeadllineExceededError for Python and DeadlineExceededException for Java.
We currently evaluate the following approach
Explicitly set Cloud Run's request timeout
Provide the same value as an environment variable, so it's available to the container
When receiving a request, we start a timer that calls a "cleanup" function just before the timeout is exceeded
The cleanup function stops the running analysis and uploads the current stdout and stderr files to Cloud Storage
This feels a little complicated so any feedback very appreciated.
Since the default timeout is 5 minutes and can extend up to 60 minutes, I would simply start by increasing this to 10 minutes. Then observe over the course of a month how that affects your service.
Aside from that fix, I would start investigating why your process is taking longer than expected and if it's perhaps due to a forever-growing result set.
If there's no result set scalability concern, then bumping the default timeout up from 5-minutes seems to be the most reasonable and simple fix. It would only be a problem until your script has to deal with more data in the future for some reason.

Delay on requests from Google API Gateway to Cloud Run

I'm currently seeing delays of 2-3 seconds on my first requests coming into our APIs.
We've set the min instances to 1 to prevent cold start but this a delay is still occurring.
If I check the metrics I don't see any startup latencies in the specified timeframe so I have no insights in what is causing these delays. Tracing gives the following:
The only thing I can change, is switching to "CPU is always allocated" but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in the Answer :
As per doc :
Idle instances As traffic fluctuates, Cloud Run attempts to reduce the
chance of cold starts by keeping some idle instances around to handle
spikes in traffic. For example, when a container instance has finished
handling requests, it might remain idle for a period of time in case
another request needs to be handled.
Cloud Run But, Cloud Run will terminate unused containers after some
time if no requests need to be handled. This means a cold start can
still occur. Container instances are scaled as needed, and it will
initialize the execution environment completely. While you can keep
idle instances permanently available using the min-instance setting,
this incurs cost even when the service is not actively serving
requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup There are a few tricks you can do to
optimize your service for container startup times. The goal here is to
minimize the latency that delays a container instance from serving
requests. But first, let’s review the Cloud Run container startup
routine.
When Starting the service
Starting the container
Running the entrypoint command to start your server
Checking for the open service port
You want to tune your service to minimize the time needed for step 1a.
Let’s walk through 3 ways to optimize your service for Cloud Run
response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
As mentioned in the Documentation :
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
As mentioned in the Thread reason could be that all the operations you performed have happened after the response is sent.
According to the docs the CPU is allocated only during the request processing by default so the only thing you have to change is to enable CPU allocation for background activities.
You can refer to the documentation for more information related to the steps to optimize Cloud Run response times.
You can also have a look on the blog related to use of Google API Gateway with Cloud Run.

500 on Google Cloud Run: The request failed because the instance could not start successfully

I'm doing load testing on an ExpressJS app hosted on Google Cloud Run, upon spike increase in traffic, there is a period where I see many 500 errors in Stackdriver with the message "The request failed because the instance could not start successfully." - which effectively leads to server downtime.
Seeing that this error occurs more frequently as the app scales up, I'm thinking this is caused by the Cloud Run load balancer assigning traffic prematurely to new instances, before these instances are ready to accept requests.
As I continue to run the load test, the instances are continuously and repeatedly killed and restarted, so there is no mechanism for recovery while the load is on.
I don't see any error logs from my NodeJS application, suggesting none of the failed requests actually reached my app.
What can I do to avoid these errors?
How does Cloud Run determine that a port is ready to accept requests?
Is it something I misconfigured in my ExpressJS app or can I somehow delay Cloud Run a bit before sending requests to a new instance?
This turned out to be caused by a combination of Cloud Run auto-scaling maximum instance limit and Cloud SQL's connection limit.
I was running a small Cloud SQL Postgres instance (3.75 GB / 1 vCPU) which comes with a default connection limit of 100. (https://cloud.google.com/sql/docs/quotas)
By default, Cloud Run assigns a maximum instance count of 1000 for auto-scaling. During the load test, the sudden spike in request count pushed the auto-scaling to create hundreds of instances, which quickly exhausted the Cloud SQL connection limit of 100.
This exact scenario is documented for Cloud SQL: https://cloud.google.com/sql/docs/postgres/connect-run#connection_limits_3 (it would be nice if this is also documented on Cloud Run, it did not immediately occur to me to look for documentation on Cloud SQL when this issue occurred)
The solution is a combination of limiting the maximum instance count on Cloud Run to a number that is tolerable, and adjusting resource allocation / maximum connection limit on Cloud SQL. The exact configuration would obviously depend on the expected level of load.

Google Cloud Run and golang goroutines

I'm considering Google Cloud Run for some cron-like operations I need to perform. They will get triggered by an HTTP invocation. The invocation will return (likely with a 202) and continue running in the background via a golang goroutine.
But, I'm concerned that Google Cloud Run containers are destroyed when they're not handling HTTP requests. I could be part-way through my processing and get reaped.
Is there a way to tell GCR to keep the container alive until I'm finished?
Cloud Run will scale your CPU down to nearly zero when it's not handling any requests, because you’re only paying when a request is being processed. (It's documented here).
Therefore, applications starting goroutines in the background are not suitable for Cloud Run. If you do this, your goroutines will most likely starve for CPU time shares and your program may start behaving very weirdly (as it would be running on a very very slow CPU, if anything at all).
The miniscule amount of an inactive Cloud Run application gets is probably only good for garbage collection, which go runtime will be doing for you.
If you want to wait for your goroutine to finish during the context of the request, you should block the request from returning, by using something like a blocking-receive from a chan, or sync.WaitGroup#Done().
The fairly new Always On CPU feature of Cloud Run solves this. Here is a link to the details: https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation

cloud run is closing the container even if my script is still running

I want to run a long-running job on cloud run. this task may execute more than 30 minutes and it mostly sends out API requests.
cloud run stops executing after about 20 minutes and from the metrics, it looks like it did not identify that my task is still in the running state. so it probably thinks it is in idling and closing the container. I guess I can run calls to the server while job run to keep the container alive, but is there a way to signal from to container to cloud run that job is still active and not to close the container?
I can tell it is closing the container since the logs just stop. and then, the next call I make to the cloud run endpoint, I can see the "listening" log again from the NodeJS express.
I want to run a long-running job on cloud run.
This is a red herring.
On Cloud Run, there’s no guarantee that the same container will be used. It’s a best effort.
While you don’t process requests, your CPU will be throttled to nearly 0, so what you’re trying to do right now (running a background task and trying to keep container alive by sending it requests) is not a great idea. Most likely your app model is not fit a for Cloud Run, I recommend other compute products that would let you run long-running processes as well.
According to the documentation, Cloud Run will time out after 15 minutes, and that limit can't be increased. Therefore, Cloud Run is not a very good solution for long running tasks. If you have work that needs to run for a long amount of time, consider delegating the work to Compute Engine or some other product that doesn't have time limits.
Yes, You can use.You can create an timer that call your own api after 5 minutes, so no timeout after 15 minutes.Whenever timer executes it will create a dummy request on your server.
Other option you can increase request timeout of container to 1 hour from 5 min, if your backend request gets complete in 1 hour

Resources