Problem
I have an application that has been running on a Cloud Run instance for 5 months now.
The application has a startup time of about 3 minutes, and once startup is over it does not need much RAM.
Here are two snapshots of docker stats when I run the app locally:
When the app is idle:
When the app is receiving 10 requests per second (which is way beyond our current use case):
There aren't any problems when I run the app locally; however, problems arise when I deploy it on Cloud Run. I keep receiving "OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k" messages, followed by a restart of the app. This is a problem because, as I said, the app takes up to 3 minutes to restart, during which requests take a long time to be handled.
I already fixed the cold start issue by using a minimum instance count of 1 AND a Google Cloud Scheduler job that queries the service every minute.
Examples
Here are examples of what I see in the logs.
In the second example the warnings appeared once again just after the application restarted, which caused a second restart in a row; this happens quite often.
Also note that these warnings/restarts do not necessarily happen when users are connected to the app; they can occur when the only activity comes from the Google Cloud Scheduler.
I tried increasing the allocated resources to 4 CPUs and 4 GB of RAM (which is huge overkill), and yet the problem remains.
Update 02/21
As of 01/01/21 we have stopped seeing this behavior from our Cloud Run service (maybe due to an update, I don't know). I did contact GCP support, but they just told me to raise an issue on the OpenBLAS GitHub repo; since I can no longer reproduce the behavior, I did not do so. I'll leave the question open, as nothing I did really worked.
OpenBLAS performs high-performance compute optimizations and needs to know the CPU's capabilities in order to tune itself as well as possible.
However, when you run a container on Cloud Run, it runs inside the gVisor sandbox, which increases the security and isolation of all the containers running on the same serverless platform.
This sandbox intercepts low-level kernel calls and discards the abnormal/dangerous ones. I guess this is why OpenBLAS can't determine the L2 cache size. In your local environment you don't have this sandbox, so OpenBLAS can access the CPU information directly.
Why does it restart? It could be a problem with OpenBLAS, or a problem with Cloud Run (a suspicious kernel call causing the instance to be killed and restarted).
I don't have an immediate solution because I don't know OpenBLAS well. I had similar behavior with TensorFlow Serving, and TensorFlow offers a build compiled without any CPU optimizations: less efficient, but more portable and resilient to different environment constraints. If a similar build exists for OpenBLAS, it would be worth testing.
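If the service happens to be Python/NumPy based (an assumption, not stated in the question), one low-risk experiment along those lines is to rein in OpenBLAS's self-tuning by capping its thread count before the library is loaded. This is only a sketch: OPENBLAS_NUM_THREADS is a real OpenBLAS setting, but whether it avoids the restarts on Cloud Run is untested here.

    # Hedged sketch: cap OpenBLAS threading before NumPy (and thus OpenBLAS) loads.
    # OPENBLAS_NUM_THREADS is a standard OpenBLAS environment variable; using it
    # here assumes the service is Python/NumPy based, which the question never says.
    import os

    # Must happen before the first `import numpy`, otherwise OpenBLAS has already
    # probed the CPU and started its thread pool.
    os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")

    import numpy as np

    if __name__ == "__main__":
        a = np.random.rand(500, 500)
        b = np.random.rand(500, 500)
        print((a @ b).sum())  # BLAS-backed matmul, now forced single-threaded

The same variable can also be set on the container image or directly on the Cloud Run service configuration instead of in code.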
Related
I'm using Locust to load test my web servers. I'm running Locust in distributed mode. The worker nodes are written in Java and use the Locust/Java port, locust4j. The master node and the worker nodes are containerized, and our orchestrator is Kubernetes. When I want to spin up more workers, I do it from there.
The problem I'm running into is that no matter how many users or worker nodes I add, I can't seem to generate more than ~8000 RPM. This is confirmed by the Locust web frontend, as well as by the metrics I'm collecting from my web server.
Does anyone have any ideas why this is happening?
I've attached an image of the timings I've collected. The snapshots are from running the load test for 60 seconds; I'm timing it with a stopwatch.
The usual culprit in these kinds of situations is that your servers can't handle more than that. In my experience, the behavior you'll see client side as the servers get overwhelmed is a slow but steady increase in response times. This is one big reason why Locust includes those in the metrics it shows you.
Based on what I'm seeing in your screenshots, this is most likely the case for you. You have some very low minimum times, but your average, median, and 90%iles are a lot higher than your minimums, and your maximums are very significantly higher than those. Without seeing your charts I can't know for sure, but that's a big red flag.
For more things to look out for, check out this question in the FAQ (especially see the list of server stats to investigate):
https://github.com/locustio/locust/wiki/FAQ#increase-my-request-raterps
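As a quick way to apply that advice, here is a minimal Python locustfile sketch (the host and endpoint are placeholders, and it deliberately ignores the locust4j/Kubernetes setup). Pointing a single plain-Python worker at the same server and watching whether the median and 90th-percentile response times climb as users ramp up will tell you whether the server, rather than the workers, is the bottleneck.

    # Minimal locustfile sketch with a placeholder endpoint; run it against the same
    # server from one machine and watch the Median / 90%ile columns as users ramp up.
    from locust import HttpUser, task, between

    class WebsiteUser(HttpUser):
        wait_time = between(0.5, 1.5)  # pause per simulated user between tasks

        @task
        def index(self):
            # Replace "/" with a representative endpoint of the system under test.
            self.client.get("/")

    # Example invocation:
    #   locust -f locustfile.py --host https://your-server.example.com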
I've got a project with 4 components, and every component has hosting set up on Google Cloud Run, with separate deployments for testing and for production. I'm also using Google Cloud Build to handle the build & deployment of the components.
Due to the lack of good webhook events from the source system, I'm currently forced to trigger a rebuild of all components in a project every time there is a new change. For this project that means 8 different images to build and deploy, as testing and production use different build-time settings as well.
I've managed to optimize Cloud Build to handle the 8 concurrent builds pretty nicely, but they all finish around the same time, and then all 8 are pushed to Cloud Run. Cloud Run often seems not to like this at all and starts throwing errors at me that I've been unable to resolve.
The first and more serious issue is that often about 4-6 of the 8 deployments go through as expected, while the remaining ones are either significantly delayed or just fail: typically the first few go through fine, then a few come through with significant delays, and the final 1-2 just fail. This seems to be caused by a "reconciliation request quota" being exhausted in the region (in this case europe-north1), as this is the error I can see at the top of the Cloud Run services view:
Additionally, and mostly just annoyingly, the Cloud Run dashboard itself does not seem to handle having 8 services deployed: just sitting on the dashboard view listing the services regularly throws another error related to read quotas:
I've tried contacting Google via their recommended "Send feedback" button but have received no reply in over a week (who knows exactly when I sent it, because they don't seem to confirm receipt).
One option I could try to improve the situation is to deploy the "testing" and "production" variants in different regions; however, that would be less than optimal, and it seems like this should just be a matter of configuring some limit somewhere. Are there other options I should consider? Or should I just set up some synchronization so that not all deployments are fired at once?
Avoiding the need to build and deploy all components at once is not really an option in this case, since they share some code as well, and when that changes it would still be necessary to support rebuilding everything.
This is an issue with Cloud Run. Developers are expected to be able to deploy many services in parallel.
The bug should be fixed within a few days or couple of weeks.
[update] Bug should now be fixed.
Make sure to use the --async flag if you want to deploy in parallel: gcloud run deploy $SERVICE --image $IMAGE --async
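As a sketch of what that can look like when all eight deployments are kicked off from one place (the service names, image URLs, and project below are placeholders, not taken from the question), each gcloud call with --async returns as soon as the request is accepted instead of blocking until the revision is ready:

    # Hypothetical helper: fire several Cloud Run deployments without waiting for
    # each rollout to finish, by passing --async to gcloud run deploy.
    import subprocess

    REGION = "europe-north1"  # region mentioned in the question; adjust as needed
    DEPLOYMENTS = [           # placeholder (service, image) pairs
        ("component-a-testing", "gcr.io/my-project/component-a:test"),
        ("component-a-production", "gcr.io/my-project/component-a:prod"),
        # ...remaining services...
    ]

    for service, image in DEPLOYMENTS:
        subprocess.run(
            [
                "gcloud", "run", "deploy", service,
                "--image", image,
                "--region", REGION,
                "--platform", "managed",
                "--async",  # return immediately; don't wait for the rollout
            ],
            check=True,
        )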
We have a small script that scrapes a webpage (~17 entries) and writes them to a Firestore collection. For this, we deployed a service on Google Cloud Run.
The execution of this code takes ~5 seconds when tested locally using the Docker container image.
The same image when deployed to Cloud Run takes over 1 minute.
Even a simple operation such as "delete all documents in a collection", which takes 2-3 seconds locally, takes over 10 seconds when deployed on Cloud Run.
We are aware of cold starts, so we tested Cloud Run's performance on the third, fourth and fifth subsequent runs, but it was still quite slow.
We also experimented with the number of CPUs, instances, concurrency, and memory, using both the default values and extreme values at both ends, but Cloud Run's performance remained slow.
Is this expected? Are individual instances of Cloud Run really this weak? Can we do something to make it faster?
The problem with this slowness is that if we ran our code for a large number of entries, Cloud Run would eventually time out (not to mention the per-second cost of Cloud Run).
Posting an answer to my own question, as we experimented a lot with this and found issues in our own implementation.
In our case, the reason for the super slow performance was async calls whose Promises or callbacks we never waited on.
What we initially missed was this section: Avoiding background activities
Our code didn't wait for the async operations to finish and responded to the request right away. The async work then became background activity and took forever to finish, since Cloud Run throttles the CPU once the response has been sent.
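The question's service is presumably Node.js (hence the mention of Promises and callbacks), but the fix is language-agnostic: finish the asynchronous work before sending the response. Here is the shape of it sketched in Python with FastAPI and the async Firestore client; the route name and data are illustrative only.

    # Hedged sketch of the fix in Python (FastAPI + google-cloud-firestore >= 2.x):
    # every write is awaited before the response is returned, so nothing is left
    # running as throttled background activity on Cloud Run.
    from fastapi import FastAPI
    from google.cloud import firestore

    app = FastAPI()
    db = firestore.AsyncClient()

    @app.post("/scrape")
    async def scrape():
        entries = [{"title": f"entry-{i}"} for i in range(17)]  # placeholder data

        # Await every write *before* responding; work left running after the
        # response is sent gets its CPU throttled and may take "forever".
        for i, entry in enumerate(entries):
            await db.collection("entries").document(str(i)).set(entry)

        return {"written": len(entries)}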
Responding to comments posted, or similar questions that may arise:
1. We didn't try to reproduce this locally by setting up a VM with the same configuration, because we figured out the cause before that.
2. We are not writing anything to the filesystem (yet), and the operations are simple calls. But this is a good question, and we'll keep it in mind when we do store/write data.
I'm writing an Electron app, and a few builds back testers started noticing that two electron.exe processes were consuming a lot of CPU time all the time: one pegging a CPU core and the other using about 85% of a core.
I'm certain this was not always the case, as builds from several months ago didn't do this. But I'm at a loss as to how to debug which code changes may have introduced this, since the code base has evolved dramatically over that time.
process.getIOCounters() reports that several gigabytes of IO occur every few minutes. The application is not deadlocked and everything still works; it is just chewing through CPU. It happens any time the app is open, even when it is in the background without any user input. I have only deployed this to Windows 10 x64 systems, on Electron 1.7.9 and also 1.7.5.
Based on the behavior I'm certain this IO is interprocess communication between the renderer and main processes, but I'm not manually performing any IPC. I think the problem is caused by some module we've introduced that improperly resides in the renderer process.
My question: how does one debug the Electron renderer/main process IPC pipe? Can it be hooked to find out what the gigabytes of traffic contain?
Based on the past few days of attempting to debug this, I've answered the question for myself:
My question: how does one debug the Electron renderer/main process IPC pipe?
Don't. Electron seemed like a good idea (writing all your client and platform code in the same place), but there are a lot of catches, and out of the blue libraries will have strange bugs that are costly to address because they fall outside the mainstream use case. This certainly has a lot to do with me not being an Electron expert, but in the real world there are deadlines and timelines, and I can't always get up to speed as much as I would like to.
I've updated my architecture to the tried-and-true service/GUI model. I'll be maintaining full browser support for the client code, as well as an Electron mode with hooks for some features when Electron is detected.
This allows me to quickly identify issues that are specific to a browser, version, or platform framework. It also lets me use whichever version of Node.js I like for the service, which has also been an issue in my case.
I still love Electron, though; I'm just going to be more careful as I use it. If I do discover the specifics of why I had this problem, I'll check back and report those details.
Update
So this issue was not directly related to Electron as I had supposed; the IPC was not between the renderer and main processes and was a red herring. It was actually a Chrome keyframe animation issue that was causing a 60 FPS redraw rate. I'm still not sure why this caused gigabytes of IPC, but whatever. See https://github.com/Microsoft/vscode/issues/22900
I was able to discover this by porting the app back to a plain browser front end (with a Node.js service). I then ran it in Chrome, Edge and Firefox; only Chrome behaved this way.
Yesterday I got a trial account on webhosting.net's Jelastic v2.2.2 and configured an environment with a minimum of 0 cloudlets (max 8, i.e., all dynamic, no reserved). Then I deployed a Grails war which was using 3 cloudlets after it started up (around 350 MB). It worked great, and I was very impressed.
However, I did not access my app overnight, and the billing history shows it kept using 3 dynamic cloudlets every hour, even with 0 requests (i.e., 0 MB paid traffic) for 14 hours. Is there some way I can get my Jelastic environment to sleep (i.e., hibernation) after some period with no requests (e.g., after an hour or two)? Then, when it gets a request, I'd like it to automatically wake up (i.e., allocate some cloudlets and restore memory from disk). I see how to stop and restart it manually, but I would like it to work automatically, for any requester.
edit: I found the following documentation, but does it not work for Tomcat/Grails?
Hibernation
Jelastic’s hibernation feature delivers even better utilization of cluster resources. Optimal use of resources is achieved by suspending non-active containers and returning released resources back to the cluster.
Because they are in sleep mode, hibernated containers do not consume resources (only disk space). As a result, you save money while your containers are in hibernation. If the applications are needed again, the platform returns them to a running state in just a few seconds.
It takes a little time to wake your environment from sleep, so it's not suitable to work the way you describe for production use: you would effectively lose visitors, because the delay on that first access would make it seem like your service is offline.
For that reason the 'sleep' function is only active for trial accounts, and the inactivity time before sleep is set by the hosting provider (so you should contact them directly for help on that point).
Of course you should also remember that accesses from search engine spiders etc. may keep your environment awake.