I have a web application. The cold start time of the backend service is about 10 second which is very high. I was not able to reduce the cold start time. As a second solution, I am wondering if can requests that makes cloud run service scale up handled by already running instances. After the new scaled containers ready, new requests will be handled by scaled up containers. Does Google Cloud support that?
You have a brand new feature for that. It's Health Probe you can put on your service to detect when the instance is ready to serve traffic, or unhealthy and no new request will be routed to it.
Have a try on it, it should solve your issue!
As a second solution, I am wondering if can requests that makes cloud run service scale up handled by already running instances.
I think what you really want is min-instances. This means you always will have an instance that is ready to serve requests.
Otherwise, I don't think there is any solution that would solve the problem that you have. If new requests come in, you are going to need to scale up either way, and there is nothing around the 10 second cold start. So implement min-instances with a base-line that is appropriate for your traffic.
Related
I have a simple Spring Boot cloud run service. CPU always allocated, max concurrency 80, min instances 1.
I often see new instances being created (screenshot linked), and as far as I can tell, I have less than 2 requests per second. Memory and CPU seem fine and below 60%.
I don't know the reason why new instances are being created. I've linked to a screenshot of the dashboard. I didn't see anything in the logs that would indicate heavy usage /load.
As a side effect, every time a new instance is started, I notice a latency in the requests. Can't be sure, but it looks like the requests are routed to the newly created instance(which has a few seconds of cold start)
Any ideas appreciated. Thank you.
I'm currently seeing delays of 2-3 seconds on my first requests coming into our APIs.
We've set the min instances to 1 to prevent cold start but this a delay is still occurring.
If I check the metrics I don't see any startup latencies in the specified timeframe so I have no insights in what is causing these delays. Tracing gives the following:
The only thing I can change, is switching to "CPU is always allocated" but this isn't helping in any way.
Can somebody give more information on this?
As mentioned in the Answer :
As per doc :
Idle instances As traffic fluctuates, Cloud Run attempts to reduce the
chance of cold starts by keeping some idle instances around to handle
spikes in traffic. For example, when a container instance has finished
handling requests, it might remain idle for a period of time in case
another request needs to be handled.
Cloud Run But, Cloud Run will terminate unused containers after some
time if no requests need to be handled. This means a cold start can
still occur. Container instances are scaled as needed, and it will
initialize the execution environment completely. While you can keep
idle instances permanently available using the min-instance setting,
this incurs cost even when the service is not actively serving
requests.
So, let’s say you want to minimize both cost and response time latency
during a possible cold start. You don’t want to set a minimum number
of idle instances, but you also know any additional computation needed
upon container startup before it can start listening to requests means
longer load times and latency.
Cloud Run container startup There are a few tricks you can do to
optimize your service for container startup times. The goal here is to
minimize the latency that delays a container instance from serving
requests. But first, let’s review the Cloud Run container startup
routine.
When Starting the service
Starting the container
Running the entrypoint command to start your server
Checking for the open service port
You want to tune your service to minimize the time needed for step 1a.
Let’s walk through 3 ways to optimize your service for Cloud Run
response times.
1. Create a leaner service
2. Use a leaner base image
3. Use global variables
As mentioned in the Documentation :
Background activity is anything that happens after your HTTP response
has been delivered. To determine whether there is background activity
in your service that is not readily apparent, check your logs for
anything that is logged after the entry for the HTTP request.
Avoid background activities if CPU is allocated only during request processing
If you need to set your service to allocate CPU only during request
processing, when the Cloud Run service finishes handling a
request, the container instance's access to CPU will be disabled or
severely limited. You should not start background threads or routines
that run outside the scope of the request handlers if you use this
type of CPU allocation. Review your code to make sure all asynchronous
operations finish before you deliver your response.
Running background threads with this kind of CPU allocation can create
unpredictable behavior because any subsequent request to the same
container instance resumes any suspended background activity.
As mentioned in the Thread reason could be that all the operations you performed have happened after the response is sent.
According to the docs the CPU is allocated only during the request processing by default so the only thing you have to change is to enable CPU allocation for background activities.
You can refer to the documentation for more information related to the steps to optimize Cloud Run response times.
You can also have a look on the blog related to use of Google API Gateway with Cloud Run.
I'm considering Google Cloud Run for some cron-like operations I need to perform. They will get triggered by an HTTP invocation. The invocation will return (likely with a 202) and continue running in the background via a golang goroutine.
But, I'm concerned that Google Cloud Run containers are destroyed when they're not handling HTTP requests. I could be part-way through my processing and get reaped.
Is there a way to tell GCR to keep the container alive until I'm finished?
Cloud Run will scale your CPU down to nearly zero when it's not handling any requests, because you’re only paying when a request is being processed. (It's documented here).
Therefore, applications starting goroutines in the background are not suitable for Cloud Run. If you do this, your goroutines will most likely starve for CPU time shares and your program may start behaving very weirdly (as it would be running on a very very slow CPU, if anything at all).
The miniscule amount of an inactive Cloud Run application gets is probably only good for garbage collection, which go runtime will be doing for you.
If you want to wait for your goroutine to finish during the context of the request, you should block the request from returning, by using something like a blocking-receive from a chan, or sync.WaitGroup#Done().
The fairly new Always On CPU feature of Cloud Run solves this. Here is a link to the details: https://cloud.google.com/blog/products/serverless/cloud-run-gets-always-on-cpu-allocation
Our backend is containerised with docker for use with minikube, I was wondering if as an iOS developer I can take advantage of this by running the backend locally on my laptop rather than having to communicate with a staging cloud based environment which can often be flaky.
Am I misunderstanding how this technology works, or would this be a viable and useful case for docker in iOS development, speeding up request and response times and allowing more control over the state of the backend I am building against?
Thanks for any clarity on this idea
What you’re explaining is possible and is something I do in my day job regularly so as to possibility, yes you can do this.
The question of whether this raises any benefit is broad and depends on every individuals needs. If you are finding that your cloud instance is extremely slow at the moment and you don’t have capacity to improve its performance, a locally run docker instance could very well help with this.
One thing to keep in mind though is that any changes you make to a local instance/server in order to make the app work as expected will need to be reflected into your production instance as soon as your app goes live to the public otherwise you will see undesired behaviour due to the app relying on non-existent server configs.
We have a number of rabbitmq consumers running. And I want to make sure that every one of those processes is working.
What are the best industry applied approaches for that?
We are considering using prometheus for that, is that the right direction?
You can go the Prometheus route. Use this link to get started, which would fairly quickly set you up with the ability to monitor RabbitMQ w/ Prometheus.
You can use AlertManager to setup alerts on failed processes and top that with automations to make sure you have service continuity.
HTH.