I occasionally (~ 1 out of 30 times) get a net::ERR_CACHE_READ_FAILURE in Chrome dev tools when loading my Electron app. I can't track down a reason for the error and I can't reproduce it consistently. Has anyone run into this problem before?
If you run multiple instances of your app, the first instance might lock the cache, which will prevent another instance from reading the cache.
Take a look at this GitHub issue:
You should not run multiple instances of the same app at the same time; for certain operations, global locks are applied. In your case, the cache database is locked by one instance, and all other instances will fail to read the cache.
You can use the app.requestSingleInstanceLock() API to prevent multiple instances of your application from running if that is appropriate for you.
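For reference, a minimal sketch of that pattern in the main process (the window handling and the index.html path are placeholders to adapt to your app):

```js
// main.js - sketch: quit immediately if another instance already holds the lock
const { app, BrowserWindow } = require('electron')

let mainWindow

const gotTheLock = app.requestSingleInstanceLock()

if (!gotTheLock) {
  // Another running instance owns the lock (and the cache), so exit this one.
  app.quit()
} else {
  // Runs in the first instance whenever a second instance is launched.
  app.on('second-instance', () => {
    if (mainWindow) {
      if (mainWindow.isMinimized()) mainWindow.restore()
      mainWindow.focus()
    }
  })

  app.whenReady().then(() => {
    mainWindow = new BrowserWindow()
    mainWindow.loadFile('index.html') // placeholder entry point
  })
}
```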
Problem
I have had an application running on a Cloud Run instance for 5 months now.
The application has a startup time of about 3 minutes, and once startup is over it does not need much RAM.
Here are two snapshots of docker stats when I run the app locally:
When the app is idle:
When the app is receiving 10 requests per second (which is way over our use case for now):
There aren't any problems when I run the app locally; however, problems arise when I deploy it on Cloud Run. I keep receiving "OpenBLAS WARNING - could not determine the L2 cache size on this system, assuming 256k" messages, followed by a restart of the app. This is a problem because, as I said, the app takes up to 3 minutes to restart, during which requests take a long time to be handled.
I already fixed the cold start issue by using a minimum instance count of 1 AND using a Google Cloud Scheduler job to query the service every minute.
Examples
Here are examples of what I see in the logs.
In the second example, the warnings came once again just after the application restarted, which caused a second restart in a row; this happens quite often.
Also note that these warnings/restarts do not necessarily happen when users are connected to the app; they can also happen when the only activity comes from the Google Cloud Scheduler.
I tried increasing the allocated resources to 4 CPUs and 4 GB of RAM (which is huge overkill), and yet the problem remains.
Update 02/21
As of 01/01/21 we stopped witnessing such behavior from our Cloud Run service (maybe due to an update, I don't know). I did contact GCP support, but they just told me to raise an issue on the OpenBLAS GitHub repo; since I can't reproduce the behavior, I did not do so. I'll leave the question open, as nothing I did really worked.
OpenBLAS performs high-performance compute optimizations and needs to know the CPU's characteristics to tune itself as well as possible.
However, when you run a container on Cloud Run, it runs inside the gVisor sandbox, which increases the security and isolation of all the containers running on the same serverless platform.
This sandbox intercepts low-level kernel calls and discards the abnormal/dangerous ones. I guess that is why OpenBLAS can't determine the L2 cache size. In your local environment you don't have this sandbox, and you can access the CPU info directly.
Why does it restart? It could be a problem with OpenBLAS, or a problem with Cloud Run (a suspicious kernel call causing it to kill the instance and restart it).
I don't have an immediate solution because I don't know OpenBLAS well. I saw similar behavior with TensorFlow Serving; TensorFlow offers a build compiled without any CPU optimizations: less efficient, but more portable and resilient to different environment constraints. If a similar build exists for OpenBLAS, it would be worth testing.
I've got a project with 4 components, and every component is hosted on Google Cloud Run, with separate deployments for testing and for production. I'm also using Google Cloud Build to handle the build & deployment of the components.
Due to the lack of good webhook events from the source system, I'm currently forced to trigger a rebuild of all components in the project every time there is a new change. For this project that means 8 different images to build and deploy, as testing and production use different build-time settings as well.
I've managed to optimize Cloud Build to handle the 8 concurrent builds pretty nicely, but they all finish around the same time, and then all 8 are pushed to Cloud Run. It often seems like Cloud Run does not like this at all and starts throwing some errors to me that I've been unable to resolve.
The first and more serious issue is that often about 4-6 of the 8 deployments go through as expected, and the remaining ones either are significantly delayed or just fail; typically the first few go through fine, then a few follow with significant delays, and the final 1-2 just fail. This seems to be caused by some "reconciliation request quota" being exhausted in the region (in this case europe-north1), as this is the error I can see at the top of the Cloud Run service view:
Additionally and mostly annoyingly, the Cloud Run dashboard itself does not seem to handle having 8 services deployed, as just sitting on the dashboard view listing the services regularly throws me another error related to some read quotas:
I've tried contacting Google via their recommended "Send feedback" button but have received no reply in ~1wk+ (who knows when I sent it, because they don't seem to confirm receipt).
One option to try to improve the situation would be to deploy the "testing" and "production" variants in different regions; however, that would be less than optimal, and this seems like it should just be some simple configuration of limits somewhere. Are there other options for me to consider? Or should I just try to set up some synchronization so that not all deployments are fired at once?
Optimizing the need to build and deploy all components at once is not really an option in this case, since they have some shared code as well, and when that changes it would still be necessary to support this.
This is an issue with Cloud Run. Developers are expected to be able to deploy many services in parallel.
The bug should be fixed within a few days or a couple of weeks.
[update] Bug should now be fixed.
Make sure to use the --async flag if you want to deploy in parallel: gcloud run deploy $SERVICE --image $IMAGE --async
We run a TokuMX replica set (2 instances + an arbiter) with about 120 GB of data (on disk) and lots of indices.
Since the upgrade to TokuMX 2.0 we noticed that restarting the SECONDARY instance always takes a very long time. The database keeps getting stuck in STARTUP2 for 1h+ before switching to normal mode. While the server is in STARTUP2, it runs at a continuous CPU load; we assume it's rebuilding its indices, even though it was shut down properly beforehand.
While this is annoying, with the PRIMARY being available it caused no downtime. But recently, during an extended maintenance window, we needed to restart both instances.
We stopped the SECONDARY first, then the PRIMARY, and started them in reverse order. But this resulted in both taking the full 1h+ startup time, and therefore the replica set was not available during this period.
Not being able to restart a possibly downed replica set without waiting for such a long time is a risk we'd rather not take.
Is there a way to avoid the (possible) full index-rebuild on startup?
@Chris - We are revisiting your ticket now. It may have been inadvertently closed prematurely.
@Benjamin: You may want to post this on https://groups.google.com/forum/#!forum/tokumx-user where many more TokuMX users, and the Tokutek support team, will see it.
This is a bug in TokuMX, which is causing it to load and iterate the entire oplog on startup, even if the oplog has been mostly (or entirely) replicated already. I've located and fixed this issue in my local build of TokuMX. The pull request is here: https://github.com/Tokutek/mongo/pull/1230
This has reduced my node startup times from hours to <5 seconds.
We use Unicorn to run 16 instances of a RoR app. We are implementing automated reporting, with the report results emailed and/or FTP'd. The reports can take up to a few minutes to generate, so we use a thread pool.
Since we have 16 instances, we don't want to have potentially 16 x #_threads connections into our database. Ideally we would have just one of the instances running the scheduled reports.
I can think of a couple of ways to do it:
1) Have one of the 16 instances somehow distinguishable from the others, and make it the only instance that can run the reports. I think this would require some coding against the Unicorn API, or possibly we could use a lockfile or a database column that holds the instance number allowed to run the reports.
The disadvantage of this approach is that the instance will be included in the Unicorn load balancing, so users will be on the instance while the reports are being generated. However, if the thread is working properly it shouldn't be an issue.
2) Have a separate Unicorn deployment with 1 instance that runs the reports and isn't included in the Apache/Unicorn routing. No one will interact with this instance via the UI - it just runs the reports.
The disadvantage of this approach is that I have to remember to update this instance when deploying and it's another instance to monitor for problems.
I'd prefer #1 for support simplicity, but I'm fine with #2 too.
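To make option 1 a bit more concrete, this is the rough shape I have in mind; the lockfile path, polling interval, and ReportScheduler class are placeholders, not code we actually run:

```ruby
# config/unicorn.rb - sketch of option 1: the first worker to grab an exclusive,
# non-blocking flock on a shared lockfile is the only one that runs reports.
after_fork do |server, worker|
  # Keep a global reference so the file handle (and therefore the lock) lives
  # as long as this worker process does.
  $report_lock = File.open("/tmp/report_runner.lock", File::RDWR | File::CREAT)
  if $report_lock.flock(File::LOCK_EX | File::LOCK_NB)
    Thread.new do
      loop do
        ReportScheduler.run_due_reports # placeholder for the actual reporting code
        sleep 60
      end
    end
  end
end
```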
Does anyone have experience in this?
I originally went with the approach of using a dedicated reporting instance that ran the reports in a thread pool of size 1 (and later just a plain single thread). It appeared that it would work, but when I put it under load testing I quickly found situations where activity in the main thread and the reporting thread would block or cause problems on fetches (like returning nil instead of an array).
I did some research, and Rails/ActiveRecord 2 (which we're currently on) isn't thread-safe.
So now I'm going to try using the whenever gem to run the reports in a rake process. I had this working for a while but decided against it because I didn't want to maintain an external cron (even though it's configured in the app, which is nice for git).
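In case it helps anyone, the whenever side of this is tiny; the interval and task name below are placeholders for ours:

```ruby
# config/schedule.rb - read by the whenever gem; `whenever --update-crontab`
# (typically hooked into the deploy) writes the matching cron entry.
every 30.minutes do
  rake "reports:run_scheduled"
end
```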
I have N dynos for a Rails application, and I'd like to run a command on all of them. Is there a way to do it? Would running rails r "SomeRubyCode" be executed on all dynos?
I'm using a plugin which syncs with a 3rd party every M minutes. The problem is, sometimes the 3rd party service times out, and I'd like to run it again without having to wait for another M minutes to pass.
No. One-off commands (like heroku run bash) are run on another, one-off dyno. You would need to set up some kind of pub/sub or message queue that all dynos listen to in order to accomplish this. https://devcenter.heroku.com/articles/one-off-dynos
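A rough sketch of that pub/sub idea using Redis (the channel name and the ThirdPartySync call are made up; each dyno would start the subscriber in a background thread at boot):

```ruby
require "redis"

# Runs on every dyno, e.g. from an initializer: listen for broadcast messages.
Thread.new do
  Redis.new(url: ENV["REDIS_URL"]).subscribe("broadcast") do |on|
    on.message do |_channel, message|
      ThirdPartySync.run if message == "resync" # placeholder for the plugin's sync code
    end
  end
end

# Then, from a one-off dyno (heroku run console), reach every dyno at once:
#   Redis.new(url: ENV["REDIS_URL"]).publish("broadcast", "resync")
```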
(Asked to turn my comment into an answer... will take this opportunity to expound.)
I don't know about the details of what your plugin needs to do to 'sync' to a 3rd-party service, but I'm going to proceed with the assumption that the plugin basically fetches some transient data which your Web application then uses somehow.
Because the syncing or fetching process occasionally fails, and your Web application relies on up-to-date data, you want the option of running the 'sync' process manually. Currently, the only way to do this is from the plugin itself, which means you need to run some code on all dynos, which, as others have pointed out, isn't currently possible.
What I've done in a previous, similar scenario (fetching analytics from an external service) is simple:
Provision and configure your Heroku app with Redis
Write a rake task that simply executes the code (that would otherwise be run by the plugin) to fetch the data, then writes that data into the cache
Where you would normally fetch the data in the app, first try to fetch from the cache (and on a cache miss, just run the same code again; it just means the data expired from the cache before it was refreshed)
I then went further and used the Heroku Scheduler add-on to execute said rake task every n minutes, to attempt to keep the data freshly updated and always in cache (cache expiry was set to a little less than n minutes) and to reduce instances of perceivable lag as the data fetch occurs. I could've set the cache expiry to never, or to greater than n, but this wasn't mission-critical.
This way, if I did want to ensure that the latest analytics were displayed, all I had to do was either a) connect to Redis and remove the item from cache, or (easier), b) just heroku run rake task.
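Concretely, the rake task and the read path looked roughly like this (ExternalAnalytics and the cache key are made-up names; the real task just wraps whatever the plugin does to fetch the data):

```ruby
# lib/tasks/analytics.rake
namespace :analytics do
  desc "Fetch the external data and store it in the Redis-backed cache"
  task refresh: :environment do
    data = ExternalAnalytics.fetch # whatever the plugin would normally do
    Rails.cache.write("analytics:latest", data, expires_in: 9.minutes)
  end
end

# In the app, read from the cache first and fall back to fetching inline on a miss:
def latest_analytics
  Rails.cache.fetch("analytics:latest", expires_in: 9.minutes) do
    ExternalAnalytics.fetch
  end
end
```

With that in place, heroku run rake analytics:refresh is the manual refresh mentioned in (b) above.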
Again—this mainly works if you're just pulling data that needs to be shared among all dynos.
This obviously doesn't work the other way around. For instance, suppose you had a centralized service to which you wanted to periodically send metrics (say, time spent per request) on a per-dyno basis. I can't think of an easy, elegant way to do that on Heroku (other than in real time, with all the overhead that entails).