How can I see how long my Cloud Run deployed revision took to spin up? - google-cloud-run

I deployed a Vue.js and a Kotlin server app. Cloud Run does promise to put a service to sleep if no request to it arise for a specific time. I did not opened my app for a day now. As I opened it - it was available almost immediatly. Since I know how long it takes to spin up when started locally I kinda don't trust the promise that Cloud Run really had put the app to sleep and span it up so crazy fast.
I'd love to know a way how I can really see how long it took for the spinup - also for startup improvement for the backend service.

After having the service inactive for some time, record the time when you request the service URL and request it.
Then go to the logs for the Cloud Run service, and use this filter to see the logs for the service:
resource.type="cloud_run_revision"
resource.labels.service_name="$SERVICE_NAME"
Look for the log entry with the normal app output after your request, check its time and compare it with the recorded time.

You can't know when the instance will be evicted or if it is kept in memory. It could happen quickly, or take hours or days before eviction. it's "serverless".
About the starting time, when I test, I deploy a new revision and I have a try on it. In the logging service, the first log entry of the new revision provides me the cold start duration. (Usually 300+ ms, compare to usual 20 - 50 ms with warm start).
The billing instance time is the sum of all the containers running times. A container is considered as "running" when it process request(s).

Related

Properly handle timeout on CloudRun

We use Google Cloud Run to wrap an analysis developed in R behind a web API. For this, we have a small Fastify app that launches an R script and uploads the results to Google Cloud Storage. The process' stdout and stderr are written to a file and are also uploaded at the end of the analysis.
However, we sometimes run into issues when a process takes longer to execute than expected. In these cases, we fail to upload anything and it's difficult to debug, because stdout and stderr are "lost" on the instance. The only thing we see in the Cloud Run logs is this message
The request has been terminated because it has reached the maximum request timeout
Is there a recommended way to handle a request timeout?
In App Engine there used to be a descriptive error: DeadllineExceededError for Python and DeadlineExceededException for Java.
We currently evaluate the following approach
Explicitly set Cloud Run's request timeout
Provide the same value as an environment variable, so it's available to the container
When receiving a request, we start a timer that calls a "cleanup" function just before the timeout is exceeded
The cleanup function stops the running analysis and uploads the current stdout and stderr files to Cloud Storage
This feels a little complicated so any feedback very appreciated.
Since the default timeout is 5 minutes and can extend up to 60 minutes, I would simply start by increasing this to 10 minutes. Then observe over the course of a month how that affects your service.
Aside from that fix, I would start investigating why your process is taking longer than expected and if it's perhaps due to a forever-growing result set.
If there's no result set scalability concern, then bumping the default timeout up from 5-minutes seems to be the most reasonable and simple fix. It would only be a problem until your script has to deal with more data in the future for some reason.

Strange postgres slow query every day around the same time on Heroku

I have a production app on heroku with postgres, serving a Rails API. Every day between 7-8am, there will be one or more long-running requests (about 20s or more) and it only occurs then. According to logs the time spent is in the database
There are no scheduled jobs
Backups are not scheduled anywhere close to that time
Traffic is at its lowest at that time
Memory is stable
There are other requests between the daily restart and that, so the dynos are not "cold"
It is not always the same endpoint, and it doesn't occur any other time
I'm not sure if it means anything but timezone is Singapore GMT+8, so 7-8am is right before midnight.
Has anyone else experienced this or has ideas to troubleshoot?
EDIT to add details:
You can see here (Scout APM) that it precedes the high throughput, basically before the start of the work day, so it's not due to load.
In fact there is pretty much no load at all. Neither is it the first request since server restart (at 4am). The slow request at 7:46am here was repeated at night (same url, same query string) which finished in under a second

cloud run is closing the container even if my script is still running

I want to run a long-running job on cloud run. this task may execute more than 30 minutes and it mostly sends out API requests.
cloud run stops executing after about 20 minutes and from the metrics, it looks like it did not identify that my task is still in the running state. so it probably thinks it is in idling and closing the container. I guess I can run calls to the server while job run to keep the container alive, but is there a way to signal from to container to cloud run that job is still active and not to close the container?
I can tell it is closing the container since the logs just stop. and then, the next call I make to the cloud run endpoint, I can see the "listening" log again from the NodeJS express.
I want to run a long-running job on cloud run.
This is a red herring.
On Cloud Run, there’s no guarantee that the same container will be used. It’s a best effort.
While you don’t process requests, your CPU will be throttled to nearly 0, so what you’re trying to do right now (running a background task and trying to keep container alive by sending it requests) is not a great idea. Most likely your app model is not fit a for Cloud Run, I recommend other compute products that would let you run long-running processes as well.
According to the documentation, Cloud Run will time out after 15 minutes, and that limit can't be increased. Therefore, Cloud Run is not a very good solution for long running tasks. If you have work that needs to run for a long amount of time, consider delegating the work to Compute Engine or some other product that doesn't have time limits.
Yes, You can use.You can create an timer that call your own api after 5 minutes, so no timeout after 15 minutes.Whenever timer executes it will create a dummy request on your server.
Other option you can increase request timeout of container to 1 hour from 5 min, if your backend request gets complete in 1 hour

How to get downtime on Icinga?

I'm working on a project and we are using Icinga to monitor some services. However, we need to get a downtime from some services, but I can't find it.
For an example:
My service is UP, running for 5 mins.
Suddenly, service is down.
After 10 minutes, service is up again.
Okay, how can I get the 10 minutes of down service? I mean,I know that I can get two times (last time it was up, and when it came back up), but can I get this information somewhere else?
Thanks.
You can look at the service history, either on the GUI (icingaweb2 or another one), or with a request against:
The core API ;
Livestatus.

How can I get Jelastic to sleep?

Yesterday I got a trial account on webhosting.net's Jelastic v2.2.2 and configured an environment with a minimum of 0 cloudlets (max 8, i.e., all dynamic, no reserved). Then I deployed a Grails war which was using 3 cloudlets after it started up (around 350 MB). It worked great, and I was very impressed.
However, I did not access my app overnight, and the billing history shows it kept using 3 dynamic cloudlets every hour, even with 0 requests (i.e., 0 MB paid traffic) for 14 hours. Is there some way I can get my Jelastic environment to sleep (i.e., hibernation) after some period with no requests (e.g., after an hour or two)? Then, when it gets a request, I'd like it to automatically wake up (i.e., allocate some cloudlets and restore memory from disk). I see how to stop and restart it manually, but I would like it to work automatically, for any requester.
edit: I found the following documentation, but does it not work for Tomcat/Grails?
Hibernation
Jelastic’s hibernation feature delivers even better utilization of cluster resources. Optimal use of resources is achieved by suspending non-active containers and returning released resources back to the cluster.
Because they are in sleep mode, hibernated containers do not consume resources (only disk space). As a result you save money while your containers are in hibernate mode. If applications are needed again the platform returns them to a running state again in just a few seconds.
It takes a little time to awaken your environment from sleep, so it's not suitable to work how you describe for production use - you would effectively lose visitors because it would seem like your service is offline due to the delays for that first access.
For that reason the 'sleep' function is only active for trial accounts, and the inactivity time before sleep is set by the hosting provider (so you should contact them directly for help on that point).
Of course you should also remember that accesses from search engine spiders etc. may keep your environment awake.

Resources