Docker/Airflow trigger and memory issue


I've just started using docker/airflow, and have run into the following error:
"airflow-triggerer_1 | [2022-08-26 17:25:52,620] {triggerer_job.py:347} ERROR - Triggerer's async thread was blocked for 8.37 seconds, likely by a badly-written trigger. Set PYTHONASYNCIODEBUG=1 to get more information on overrunning coroutines."
According to the question linked below, the cause is that not enough memory has been allocated to Docker: I need to provide 5 GB, and have only provided 4:
Starting airflow with Docker - trigger ERROR
However, when I go into the settings in Docker, I cannot move the memory resource past 4 GB:
(Screenshot: what I see on my Docker resource screen)
I'm wondering why I cannot move the slider past 4 GB, and whether there's any way to work around it.
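To double-check how much memory Docker really exposes, the figure can be read from inside a running container. This is only a minimal sketch, assuming a Linux container with Python available. The function names are made up for illustration, and the 5 GB default is simply the number quoted above. Note that these sysconf values report the memory of the Docker VM or host, not any per-container limit.

```python
import os


def visible_memory_gib():
    """Return the physical memory visible to this process, in GiB (Linux)."""
    pages = os.sysconf("SC_PHYS_PAGES")      # number of physical memory pages
    page_size = os.sysconf("SC_PAGE_SIZE")   # size of one page in bytes
    return pages * page_size / 1024 ** 3


def has_enough_memory(required_gib=5.0):
    """Compare visible memory with a required amount (5 GB is from the question)."""
    return visible_memory_gib() >= required_gib


print(f"Memory visible here: {visible_memory_gib():.2f} GiB")
print("Enough for the stated requirement:", has_enough_memory())
```

Running this in one of the Airflow containers would show whether the 4 GB setting is what the containers actually see.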
Cheers

Related

Catch X11 XIO errors

I have an OpenGL / GLFW (which uses X11) app running inside a Docker container, starting when the PC starts. I have it installed on two different PCs.
On the first one (an Intel NUC Enthusiast with an RTX 2060), everything works fine.
On the second one (a Dell Precision 7920 with a Quadro A6000), it crashes when starting on PC boot, but works fine when restarting it manually a bit later. The error is the following:
XIO: fatal IO error 11 (Resource temporarily unavailable) on X server
":0" after 170 requests (170 known processed) with 0 events remaining.
It is triggered by a call to glfwPollEvents(); in my app's main loop. It seems my app is started a bit too early by Docker, before something it depends on has started.
Then the app crashes with an exit code of 0, which makes it impossible to work around the problem using Docker's "on-failure" restart policy. Using the "always" policy works, but I'd prefer to avoid it.
So I'd like to catch this XIO exception, either to see if ignoring it for a while is enough, or to send a non-zero exit code so that Docker restarts the app until it works.
Is it possible? I have tried glfwSetErrorCallback & XSetErrorHandler, neither is called...

Maybe too late, but still:
If you get an XIO error, maybe setting the appropriate error handler could be the right thing to do.
XSetIOErrorHandler
The XSetIOErrorHandler() sets the fatal I/O error handler. Xlib calls the program's supplied error handler if any sort of system call error occurs (for example, the connection to the server was lost). This is assumed to be a fatal condition, and the called routine should not return. If the I/O error handler does return, the client process exits.
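If changing the app itself is not convenient, a wrapper is another way to get the non-zero exit code the question asks for. To be clear, this is not XSetIOErrorHandler but a separate workaround, sketched in Python under the assumption that the XIO message is printed on stderr as shown in the question. The function names are hypothetical.

```python
import subprocess
import sys


def exit_code_for(returncode, stderr_text):
    """Turn a 'successful' exit that logged an XIO fatal error into a failure."""
    saw_xio = any(
        "XIO:" in line and "fatal IO error" in line
        for line in stderr_text.splitlines()
    )
    if returncode == 0 and saw_xio:
        return 1
    # A negative return code means the child was killed by a signal.
    return returncode if returncode >= 0 else 128 - returncode


def run(cmd):
    """Run cmd, echo its stderr once it exits, and return the adjusted exit code."""
    proc = subprocess.run(cmd, stderr=subprocess.PIPE, text=True)
    sys.stderr.write(proc.stderr)
    return exit_code_for(proc.returncode, proc.stderr)


# Used as a container entry point: python wrapper.py /path/to/app [args...]
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run(sys.argv[1:]))
```

With this as the container's command, Docker's "on-failure" policy sees exit code 1 whenever the XIO message appears. The sketch buffers stderr until the app exits, so it is not suitable if you need live error output.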

MSIX sideloaded app is slow to start after update

I am using MSIX packaging to deploy .NET desktop applications. The app is built by Azure Pipelines and the installation package is deployed to a shared folder on a file server.
When I run the .appinstaller, the dialog opens and applies updates as it should. But then the dialog closes, and nothing happens for over 1 minute. Then the app starts.
TEST 1 - Normal user
Looking in the event log, there is first this warning:
App manifest validation warning: Declared namespace
http://schemas.microsoft.com/developer/appx/2015/build is
inapplicable, it will be ignored during manifest processing.
Then several messages like
error 0x5: Deleting file \\?\C:\Program
Files\WindowsApps\Deleted\8b7d5c25-92aa-4962-9e74-93b9685ce2ca-test_2021.1005.1225.1455_x64__002e9dkagpm7g28acfe13-edc2-4d9d-8a69-d5d9687e0573\MyApp\MyApp.exe
failed.
After 1 minute there is this warning:
Warning: There were 129 additional files that failed to be deleted
under the folder \\?\C:\Program Files\WindowsApps\Deleted.
It seems that the process tries, and retries, to delete the old files for over 1 minute, then gives up.
How can I allow MSIX to delete the files without giving it administrator rights?
TEST 2 - Administrator user
I did a second test, this time on a different machine, and logged in as an administrator.
The update dialog finished the update and closed after 12s.
Then nothing happened for 5 minutes(!)
I believe I clicked the Start button or something, then suddenly the app started.
Examining the log did not show any warnings about failed file deletions.
Only this warning:
App manifest validation warning: Declared namespace
http://schemas.microsoft.com/developer/appx/2015/build is
inapplicable, it will be ignored during manifest processing.
During the 5 minutes there were no log entries at all.
These were the last 2 log entries, made after 5 minutes:
14-10-2021 10:10:12
UpdateUsingAppInstallerOperation operation on a package with main
parameter
8b7d5c25-92aa-4962-9e74-93b9685ce2ca-test_2021.1013.1518.1578_x64__002e9dkagpm7g
and Options 0 and 0. See http://go.microsoft.com/fwlink/?LinkId=235160
for help diagnosing app deployment issues.
14-10-2021 10:10:13
The bundle streaming reader was created successfully for bundle
8b7d5c25-92aa-4962-9e74-93b9685ce2ca-test_2021.1013.1518.1578_neutral_~_002e9dkagpm7g.
Started deployment
The bundle streaming reader was created
Conclusion
Looking at Task Manager and ProcMon, I can see that the app starts right after the update dialog closes. However, the process is a Background Process, invisible to the user.
While googling, I came across these posts describing the same problem:
https://techcommunity.microsoft.com/t5/msix-deployment/app-does-not-launch-immediately-after-installation-but-after-a/m-p/1972161
https://techcommunity.microsoft.com/t5/msix-deployment/winforms-exe-in-msix-package-does-not-startup-after-auto-update/m-p/965978

You can't give it admin rights. MSIX installations always run per user.
This sounds like a machine-related problem. Can you reproduce the same behavior on other machines? (Try virtual machines if you don't have access to a separate physical machine.)

I never found a solution for this. My workaround is to turn off the update dialog by not including ShowPrompt="true".
Then the app seems to launch as it should even if there are updates.
However, there is a new problem - the first time the user starts the app after an update has been released, the auto-update does not happen. It only gets applied the second time the app starts. This is by design apparently...

Neo4j getting killed in Travis CI container

I'm trying to add Neo4j 3.0 to my tests for the neo4j gem and I'm having trouble with the server getting killed in a Travis CI container. Pre-3.0 works just fine, but when I use 3.0 it seems to get killed. There seems to be plenty of memory (when I run Neo4j locally it uses 300-400 MB). I get a warning from Neo4j saying:
WARNING: Max 30000 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
That makes me think that it's getting killed because of too many open files. I'm not sure if there's a way to increase the number of files on a Neo4j container, and I have a number of jobs so I don't want to slow things down by running sudo: true. Did Neo4j 3.0 change to require more open files (the documentation doesn't seem to imply that it did)?
EDIT:
My .travis.yml file:

This is how I do it, and it works fine for me for 2.3 and 3.0 including a push to docker hub.
https://github.com/maxdemarzi/neo_travis
https://travis-ci.org/maxdemarzi/neo_travis

I think our memory allocation is messing things up. One thing that is unusual about your (Travis's) setup is that there is twice as much swap as RAM, and that the amount of memory reported as available is very large.
Try specifying the amount of memory in your config file. See http://neo4j.com/docs/operations-manual/current/#performance-tuning for more details, but essentially add these to your config.
In neo4j.conf:
dbms.memory.pagecache.size=1G
and in neo4j-wrapper.conf:
dbms.memory.heap.max_size=1000
dbms.memory.heap.initial_size=1000
The memory limits are set quite low to guarantee that Travis doesn't kill the process, and I suspect that the tests don't need much in terms of memory.
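In a CI job these settings can be written by a small script before the server starts. The following is only a sketch: set_property and apply_travis_limits are hypothetical helpers, and the location of the conf directory is an assumption that depends on how Neo4j is installed in your build. The keys and values are the ones listed above.

```python
from pathlib import Path


def set_property(conf_path, key, value):
    """Set key=value in a properties-style file, replacing any active entry."""
    path = Path(conf_path)
    lines = path.read_text().splitlines() if path.exists() else []
    new_line = f"{key}={value}"
    out, found = [], False
    for line in lines:
        is_comment = line.lstrip().startswith("#")
        if not is_comment and line.split("=", 1)[0].strip() == key:
            if not found:          # keep one entry, drop later duplicates
                out.append(new_line)
                found = True
        else:
            out.append(line)
    if not found:
        out.append(new_line)
    path.write_text("\n".join(out) + "\n")


def apply_travis_limits(conf_dir):
    """Apply the low memory limits suggested above (conf_dir is an assumption)."""
    conf = Path(conf_dir)
    set_property(conf / "neo4j.conf", "dbms.memory.pagecache.size", "1G")
    set_property(conf / "neo4j-wrapper.conf", "dbms.memory.heap.max_size", "1000")
    set_property(conf / "neo4j-wrapper.conf", "dbms.memory.heap.initial_size", "1000")
```

Calling apply_travis_limits with the path of the unpacked server's conf directory before starting Neo4j is idempotent, so it is safe to run on every build.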

What does "object allocation during garbage collection phase" in my Rails log mean?

I'm trying to run a Rails app on IBM Bluemix and load test it with Blitz.io. When I access the app in my browser, everything is fine. When Blitz tries to access it, however, the app crashes. The log entry looks like this:
2014-12-20T16:26:45.55-0500 [RTR] OUT **[my app name]**.mybluemix.net - [20/12/2014:21:26:43+0000] "GET / HTTP/1.1" 200 12784 "-" "blitz.io; e970e720c4f22c94f7d822731652a745#130.160.6.54" 75.126.70.42:54311 x_forwarded_for:"-" vcap_request_id:ba32f5d0-e157-4229-61f5-13eb7ab3d2d0 response_time:2.182336949 app_id:1e6ad01b-c7b4-4f57-8d9d-8d333807bb15
2014-12-20T16:26:46.60-0500 [App/0] ERR /home/vcap/app/vendor/ruby2.0.0/lib/ruby/2.0.0/webrick/server.rb:284: [BUG] object allocation during garbage collection phase
What does this mean? I'm at a bit of a loss on how to debug this, or even where the problem lies. Is it a problem with my app code? A configuration problem?
I'm not sure whether I've included enough of the error log to be helpful here. The rest is here:
http://pastebin.com/Jv6jUksv

You can specify the Ruby version you want your application to run on in the Gemfile. The Ruby buildpack in Bluemix supports Ruby v2.1.x, v2.2 and more.
But I suspect the cause of the error is that your app is exceeding the memory quota allocated to it. Bluemix uses Cloud Foundry, which kills an app instance if it consumes more memory than allocated. You can increase the memory allocated to your application by passing the "-m" option to "cf push". For example:
cf push -m 1G

You can raise a ticket or ask a question in the Bluemix developer support forum for a speedy resolution of this issue (for example, if Ruby 2.0.0 has the problem and a later version works fine for you):
https://developer.ibm.com/bluemix/support/

Is there a way you can see the memory usage around the time of this error message?
I've gotten the error
[BUG] object allocation during garbage collection phase
using Ruby 1.8.7 in an environment with explicit memory restrictions (probably similar to that of IBM Bluemix) when exceeding those memory restrictions. My memory is limited by a PBS directive.
For me, the error occurs when parsing a large amount of JSON where the json gem requires more memory than the limit for this particular JSON string.
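On the question of seeing memory usage around the time of the error: on Linux, the resident memory of a process can be sampled from /proc. This is a rough sketch in Python rather than Ruby, assuming you can run a script next to the app process and know its PID. Whether that is possible on Bluemix is something I have not verified, and rss_kib is a made-up name.

```python
import os


def rss_kib(pid):
    """Return the resident set size of a process in KiB, or None if not reported.

    Reads the VmRSS field of /proc/<pid>/status, so this is Linux-only.
    """
    with open(f"/proc/{pid}/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])  # the value is given in kB
    return None


# Example: sample this script's own memory use.
print(f"Current RSS: {rss_kib(os.getpid())} KiB")
```

Logging this value periodically for the app's PID would show whether usage climbs toward the limit just before the [BUG] message appears.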

Gerrit - Application Error - Intraline difference not available due to server error

For one of our gerrit projects, while navigating the file differences we get this error:
Application Error
Intraline difference not available due to server error
[Continue]
It doesn't happen for all projects; so far we've detected the error on only one project.
I searched Google and the Gerrit documentation. I found a reference in the source code, but I don't know what causes the error or how it can be resolved.
The web page with the error contains a "Continue" button. Once clicked it will take you to the file you selected, but the error is annoying.
Do you know how to fix this?

This happens while Gerrit computes and caches the intraline difference of a file between two commits. The default timeout is 5 seconds. If the file is huge and the computation takes longer than the timeout, the worker thread is terminated, an error message is shown, and no intraline difference is displayed for the file pair.
To fix this:
1. Add this setting to gerrit.config:
[cache "diff_intraline"]
timeout = 15000 ms # Or any other duration you want.
2. Restart the Gerrit service.
3. Run the SSH command "gerrit flush-caches" as a user with the Flush Caches global capability:
ssh -p port userxxx@host gerrit flush-caches
After that, the intraline diff should be displayed.

Cause of the error:
It is a result of Gerrit taking too long to diff the file, and marking the diff in one of its caches as non-available.
The relevant error log is here:
[2012-06-08 11:14:08,547] WARN com.google.gerrit.server.patch.IntraLineLoader : 5000 ms timeout reached for IntraLineDiff in project xxxxxxx on commit 354dd67ad54578cf801d8cda64a4ae8484ebb0b7 for path xxxxxxx.java comparing bf9fbc21520af7bfd0841c8b9f955ca6e215b059..f6b9c7992c12cfdca253acd033966f98f70f3543. Killing IntraLineDiff-6
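To find out which files hit the timeout (and so how far the limit may need raising), the error log can be scanned for these warnings. This is a small sketch whose pattern is based only on the single log line above; other Gerrit versions may word the message differently, and intraline_timeouts is a hypothetical helper.

```python
import re

# Pattern derived from the one sample warning shown above.
TIMEOUT_RE = re.compile(
    r"(?P<timeout_ms>\d+) ms timeout reached for IntraLineDiff "
    r"in project (?P<project>\S+) on commit (?P<commit>[0-9a-f]+) "
    r"for path (?P<path>.+?) comparing"
)


def intraline_timeouts(log_lines):
    """Return (project, path, timeout_ms) for each intraline timeout warning."""
    hits = []
    for line in log_lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append(
                (match.group("project"), match.group("path"),
                 int(match.group("timeout_ms")))
            )
    return hits


SAMPLE = (
    "[2012-06-08 11:14:08,547] WARN com.google.gerrit.server.patch.IntraLineLoader : "
    "5000 ms timeout reached for IntraLineDiff in project xxxxxxx on commit "
    "354dd67ad54578cf801d8cda64a4ae8484ebb0b7 for path xxxxxxx.java comparing "
    "bf9fbc21520af7bfd0841c8b9f955ca6e215b059..f6b9c7992c12cfdca253acd033966f98f70f3543. "
    "Killing IntraLineDiff-6"
)

print(intraline_timeouts([SAMPLE]))
```

Feeding it the lines of the server's error log gives a list of affected files per project.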
