How do "Graceful Shutdowns" work when max instances = 1? - google-cloud-run

https://cloud.google.com/run/docs/reference/container-contract
The container instance then receives a
SIGTERM signal indicating the start of a 10 second period before being
shut down (with a SIGKILL signal). During this period, the container
instance is allocated CPU and billed. If the container instance does
not catch the SIGTERM signal, it is immediately shut down.
A: If I have maximum instances set to one, what happens when a new request arrives at the Cloud Run proxy after my container process catches a SIGTERM and is in the "10 second shutdown period"?
I assume that the HTTP request would wait until the shutdown completes, and then Cloud Run would boot a fresh container to process it?
Is this guaranteed?
I ask because my container assumes it is the only process mutating a network
resource; two containers running at once would create a race condition (as would processing one more HTTP request after the SIGTERM event).
https://cloud.google.com/blog/topics/developers-practitioners/graceful-shutdowns-cloud-run-deep-dive
However, you might sometimes receive this signal before your container
will be shut down due to underlying infrastructure reasons and your
container might still have in-flight connections. The graceful
termination is therefore not always guaranteed.
B: How common would this be? Do containers always get a SIGTERM?
Could I just wait until http_requests_outstanding = 0 AND SIGTERM_has_been_triggered to run my shutdown code (at which point no further HTTP requests will be forwarded to the instance)?

If you receive the SIGTERM, your container instance has already been taken out of routable traffic. If a new request comes in, a new instance is created and the request is routed to it.
The SIGTERM is sent most of the time, i.e. when the autoscaler chooses to offload the instance. Sometimes, however (and this is expected), the underlying infrastructure doesn't choose to shut down, it simply is shut down (an outage, a physical server failure, a CPU disruption, all those kinds of real-world issues). In that case the SIGTERM might not be sent.
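Question B's idea of deferring cleanup until both conditions hold can be expressed roughly as follows. This is a minimal sketch, assuming a Flask app running as the container's main process; release_network_resource is a hypothetical cleanup function, and a real service would guard the request counter with a lock:

import signal
import threading
from flask import Flask

app = Flask(__name__)

inflight = 0                          # requests currently being handled
sigterm_received = threading.Event()
cleanup_done = threading.Event()

def release_network_resource():
    # Hypothetical shutdown work: release the externally mutated resource.
    pass

def maybe_cleanup():
    # Run the shutdown work once, only after SIGTERM has arrived and no
    # request is outstanding.
    if sigterm_received.is_set() and inflight == 0 and not cleanup_done.is_set():
        cleanup_done.set()
        release_network_resource()

def handle_sigterm(signum, frame):
    # At this point Cloud Run has already stopped routing traffic here.
    sigterm_received.set()
    maybe_cleanup()

signal.signal(signal.SIGTERM, handle_sigterm)

@app.before_request
def request_started():
    global inflight
    inflight += 1

@app.teardown_request
def request_finished(exc=None):
    global inflight
    inflight -= 1
    maybe_cleanup()

Note that gunicorn or another process manager may intercept SIGTERM itself, so where you install the handler depends on how the container is started.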

Related

Delay on requests from Google API Gateway to Cloud Run

I'm currently seeing delays of 2-3 seconds on my first requests coming into our APIs.
We've set min instances to 1 to prevent cold starts, but the delay is still occurring.
If I check the metrics, I don't see any startup latencies in the specified timeframe, so I have no insight into what is causing these delays. Tracing gives the following:
The only thing I can change is switching to "CPU is always allocated", but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in the answer, and as per the doc:
Idle instances
As traffic fluctuates, Cloud Run attempts to reduce the chance of cold starts by keeping some idle instances around to handle spikes in traffic. For example, when a container instance has finished handling requests, it might remain idle for a period of time in case another request needs to be handled.
But Cloud Run will terminate unused containers after some time if no requests need to be handled, which means a cold start can still occur. Container instances are scaled as needed, and each new instance initializes its execution environment completely. While you can keep idle instances permanently available using the min-instance setting, this incurs cost even when the service is not actively serving requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup
There are a few tricks you can do to optimize your service for container startup times. The goal here is to minimize the latency that delays a container instance from serving requests. But first, let's review the Cloud Run container startup routine:
1. Starting the service
   a. Starting the container
   b. Running the entrypoint command to start your server
   c. Checking for the open service port
You want to tune your service to minimize the time needed for step 1a.
Let's walk through 3 ways to optimize your service for Cloud Run response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables (see the sketch below)
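For tip 3, here is a minimal sketch of what "use global variables" means in practice; the Firestore client is just an illustrative expensive object, not something from the question:

from google.cloud import firestore   # illustrative heavy dependency

db = None                             # module-level (global) cached client

def get_db():
    # Created lazily, once per container instance, then reused by every
    # request that the warm instance serves afterwards.
    global db
    if db is None:
        db = firestore.Client()
    return db

Because the object survives between requests on a warm instance, only the first request after a cold start pays the initialization cost.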
As mentioned in the documentation:
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
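As a sketch of what "make sure all asynchronous operations finish before you deliver your response" can look like; Flask, the route, and the side-work function are assumptions, not the asker's code:

import time
import concurrent.futures
from flask import Flask

app = Flask(__name__)
executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def write_audit_log():
    # Hypothetical side work that must not outlive the request.
    time.sleep(0.5)

@app.route("/report")
def report():
    side_effect = executor.submit(write_audit_log)  # start the side work
    body = "report contents"                        # build the actual response
    side_effect.result()                            # wait BEFORE returning, so the work
    return body                                     # is not suspended once CPU is throttled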
As mentioned in the thread, the reason could be that the operations you perform happen after the response has been sent.
According to the docs, the CPU is allocated only during request processing by default, so the only thing you have to change is to enable CPU allocation for background activities ("CPU is always allocated").
You can refer to the documentation for more information related to the steps to optimize Cloud Run response times.
You can also have a look at the blog on using Google API Gateway with Cloud Run.

Restart KSQL-Server when some queries are running

I've tried to find documentation about what happens when queries are running and the KSQL server restarts.
Does it behave similarly to Kafka Streams, so that the consumer offset is not committed and at-least-once processing is guaranteed?
I can observe that the queries are stored in the command topic and are executed again when the KSQL server restarts.
I've tried to find documentation about what happens when queries are running and the KSQL server restarts.
If you only have a single KSQL server, then stopping that server will of course stop all the queries. Once the server is running again, all queries will continue from the points they stopped processing. No data is lost.
If you have multiple KSQL servers running, then stopping one (or some) of them will cause the remaining servers to take over any query processing tasks from the stopped servers. Once the stopped servers have been restarted the query processing workload will be shared again across all servers.
Does it behave similarly to Kafka Streams, so that the consumer offset is not committed and at-least-once processing is guaranteed?
Yes.
But (even better): whether the processing guarantee is at-least-once or exactly-once depends solely on the KSQL server's configuration. It does not, of course, depend on whether or when the server is restarted, crashes, etc.
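For example, assuming you configure the server through its properties file (the file name and location depend on your installation), the guarantee is controlled by the Kafka Streams setting passed through with the ksql.streams. prefix:

# in the KSQL server properties file; the default is at_least_once
ksql.streams.processing.guarantee=exactly_once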

How to reliably clean up dask scheduler/worker

I'm starting up a dask cluster in an automated way by ssh-ing into a bunch of machines and running dask-worker. I noticed that I sometimes run into problems when processes from a previous experiment are still running. What's the best way to clean up after dask? killall dask-worker dask-scheduler doesn't seem to do the trick, possibly because dask somehow starts up new processes in their place.
If you start a worker with dask-worker, you will notice in ps that it starts more than one process, because there is a "nanny" responsible for restarting the worker in case it somehow crashes. Also, there may be "semaphore" processes around for communicating between the two, depending on which form of process spawning you are using.
The correct way to stop all of these is to send a SIGINT (i.e., a keyboard interrupt) to the parent process. A KILL signal might not give it the chance to stop and clean up the child process(es). If some situation (e.g., an ssh hangup) caused a more radical termination, or a session didn't send any stop signal at all, then you will probably have to grep the output of ps for dask-like processes and kill them all.
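Here is a sketch of that "grep ps and signal them" fallback, using psutil (an assumed dependency, not mentioned in the question) and SIGINT rather than SIGKILL so the nanny gets a chance to shut its children down cleanly:

import signal
import psutil

for proc in psutil.process_iter(["cmdline"]):
    cmdline = " ".join(proc.info["cmdline"] or [])
    if "dask-worker" in cmdline or "dask-scheduler" in cmdline:
        try:
            proc.send_signal(signal.SIGINT)  # equivalent of a keyboard interrupt
        except psutil.NoSuchProcess:
            pass  # the process already exited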

Override request timeout in pyramid/gunicorn

Pyramid (v1.5) application served by gunicorn (v19.1.1) behind nginx on a heroic BeagleBone Black "server".
One specific request requires significant I/O and processor time on the server (exporting data from the database, formatting it to xls, and serving it),
which results in a gunicorn worker timeout and a 'Bad gateway' error returned by nginx.
Is there a practical way to handle this per request instead of increasing the global request timeout for all requests?
It is just this one specific request so I'm looking for the quickest and dirtiest solution instead of implementing a correct, asynchronous client notification protocol.
From the docs:
timeout
-t INT, --timeout INT
Default: 30
Workers silent for more than this many seconds are killed and restarted.
Generally set to thirty seconds. Only set this noticeably higher if you're sure of the repercussions for sync workers. For non-sync workers it just means that the worker process is still communicating and is not tied to the length of time required to handle a single request.
graceful_timeout
--graceful-timeout INT
Default: 30
Timeout for graceful worker restarts.
Generally set to thirty seconds. This is the maximum time a worker can keep handling a request after it receives the restart signal. If the time is up, the worker is force-killed.
keepalive
--keep-alive INT
Default: 2
The number of seconds to wait for requests on a Keep-Alive connection.
Generally set in the 1-5 second range.
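gunicorn's timeout is a per-worker setting, not a per-request one, so the quick-and-dirty route is to raise it globally (high enough for the export) and keep enough workers around to serve the normal, fast requests. A sketch of a gunicorn config file with illustrative values:

# gunicorn.conf.py -- illustrative values, not from the question
timeout = 300            # allow a sync worker to stay busy/silent for up to 5 minutes
graceful_timeout = 30    # how long a worker may keep working after a restart signal
workers = 2              # keep another worker free for normal, fast requests

Start the server with something like gunicorn -c gunicorn.conf.py myapp:app (module and app names are illustrative), or pass --timeout 300 wherever you currently configure gunicorn.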

Erlang OTP supervisor gen_tcp - {error,eaddrinuse}

I'm failing to see how adding a supervisor to a crashing gen_tcp:listen process would actually restart that worker, since crashing renders the port I want to listen on unusable for a brief moment. When a crash occurs and I try to manually restart my application, I receive {error,eaddrinuse}. I haven't implemented a supervisor for this worker yet because I fail to see how it would work.
How do I restart a gen_tcp:listen?
In most cases, as the listen socket is linked to the controlling process (the process that created it), a termination of this process will close the socket nicely and allow you to listen again on the same port.
For all other cases, you should pass {reuseaddr, true} option to gen_tcp:listen/2. Indeed, the listen socket of your application remains active for a brief moment after a crash, and this option allows you to reuse the address during this period.
Is the process managing the gen_tcp socket a gen_server? If so, it will make your life easier.
If it is a gen_server, add process_flag(trap_exit, true) to your init function. This makes it so that when your process "crashes", it calls the terminate/2 callback function prior to actually exiting the process. Using this method, you can manually close your listening socket in the terminate function, thereby avoiding the irritating port cleanup delay.
If you are not using a gen_server, the same principle still applies, but you must be much more explicit about catching your errors.
