Java process killed due to out of memory after running a job in a container for an hour - docker

I have set up a job for running automation tests in CircleCI (https://hub.docker.com/r/jiteshsojitra/docker-headless-vnc-container). It works fine, but after running tests for an hour it reaches the memory limit and the running Java/Ant job is suddenly killed. So is there any way to increase the container memory so tests can run for 5-6 hours in the container, or is that a paid feature?
I tried putting JAVA_OPTS: -Xms512m -Xmx1024m in the YAML script, but the overall container memory usage still reaches ~4 GB, as far as I can tell.
References:
https://circleci.com/gh/jiteshsojitra/zm-selenium/231
https://circleci.com/api/v1.1/project/github/jiteshsojitra/zm-selenium/231/output/106/0?file=true
Log trail:
BUILD FAILED
/headless/zm-selenium/build.xml:348: Java returned: 137
Total time: 76 minutes 26 seconds
Exited with code 137
Hint: Exit code 137 typically means the process is killed because it was running out of memory
Hint: Check if you can optimize the memory usage in your app
Hint: Max memory usage of this container is 4286337024
according to /sys/fs/cgroup/memory/memory.max_usage_in_bytes

We had this problem. It is a limit in CircleCI (or the VM, really). The only solution is to make your app use less memory.

fiskeben is right. I think the container you mentioned is a fork of our consol/docker-headless-vnc-container image, so you can add the following lines to the startup script.
# set correct java startup
export _JAVA_OPTIONS="-Duser.home=$HOME -Xmx${JVM_HEAP_XMX}m"
# add Docker JVM flags; these can possibly be removed with JDK 9
export _JAVA_OPTIONS="$_JAVA_OPTIONS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
Now you can set the environment variable JVM_HEAP_XMX to the number of megabytes your JVM should use, e.g.
docker run -e JVM_HEAP_XMX=512 ...
If you want to determine the size dynamically, take a look at the script jvm_options.sh.
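For illustration, here is a minimal sketch of what such dynamic sizing could look like; this is an assumption about the approach, not the actual contents of jvm_options.sh:
#!/bin/bash
# Hypothetical sketch: derive JVM_HEAP_XMX from the cgroup v1 memory limit
# when the caller has not set it explicitly.
limit_bytes=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
limit_mb=$((limit_bytes / 1024 / 1024))
# Give the heap roughly half of the container limit, leaving room for non-heap memory.
export JVM_HEAP_XMX="${JVM_HEAP_XMX:-$((limit_mb / 2))}"
export _JAVA_OPTIONS="-Duser.home=$HOME -Xmx${JVM_HEAP_XMX}m"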

Related

Stop JMeter script after a certain time in Jenkins build

While running a JMeter build in Jenkins, it runs in infinite mode (never stops), even though the duration is configured in the JMX file.
How can I stop the JMeter build after a certain time?
I have tried providing command-line arguments, thinking it wasn't respecting the JMX configuration:
PATH/jmeter -Jjmeter.save.saveservice.output_format=xml -Jduration=60 -n -t Main.jmx -l Mainreport.csv
I expected the JMeter build to stop after 60 seconds, but it never ends; I monitored it for 30 minutes.
You need to fetch the value with ${__P(duration,)} in the JMeter Scheduler's duration field. You can supply a default value as the second argument of __P, for example 30:
${__P(duration,30)}
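With the scheduler's duration field set that way, the command from the question should then terminate; a usage sketch (same file names as above):
# -Jduration=60 feeds the ${__P(duration,30)} expression in the Thread Group scheduler;
# omitting it would make the test fall back to the 30-second default.
PATH/jmeter -Jjmeter.save.saveservice.output_format=xml -Jduration=60 -n -t Main.jmx -l Mainreport.csv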

Decrease bazel memory usage

I'm using Bazel on a computer with 4 GB RAM (to compile the TensorFlow project). Bazel, however, does not take into account the amount of memory I have and spawns too many jobs, causing my machine to swap and leading to a longer build time.
I already tried setting the ram_utilization_factor flag through the following lines in my ~/.bazelrc
build --ram_utilization_factor 30
test --ram_utilization_factor 30
but that did not help. How are these factors to be understood anyway? Should I just randomly try out some others?
Some other flags that might help:
--host_jvm_args can be used to set how much memory the JVM should use by setting -Xms and/or -Xmx, e.g., bazel --host_jvm_args=-Xmx4g --host_jvm_args=-Xms512m build //foo:bar (docs).
--local_resources in conjunction with the --ram_utilization_factor flag (docs).
--jobs=10 (or some other low number, it defaults to 200), e.g. bazel build --jobs=2 //foo:bar (docs).
Note that --host_jvm_args is a startup option so it goes before the command (build) and --jobs is a "normal" build option so it goes after the command.
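Putting these together, a combined invocation might look like this (the target and the sizes are illustrative, not prescriptive):
# Startup option (before 'build'): cap the Bazel server JVM at 2 GB.
# Build options (after 'build'): two parallel jobs; assume 2048 MB RAM,
# 1 core, and full I/O capacity when scheduling local actions.
bazel --host_jvm_args=-Xmx2g \
  build --jobs=2 --local_resources=2048,1.0,1.0 \
  //tensorflow/examples/label_image:label_image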
For me, the --jobs argument from @kristina's answer worked:
bazel build --jobs=1 tensorflow:libtensorflow_all.so
Note: --jobs=1 must follow, not precede build, otherwise bazel will not recognize it. If you were to type bazel --jobs=1 build tensorflow:libtensorflow_all.so, you would get this error message:
Unknown Bazel startup option: '--jobs=1'.
Just wanted to second @sashoalm's comment that the --jobs=1 flag was what made bazel build finally work.
For reference, I'm running bazel on Lubuntu 17.04, running as a VirtualBox guest with about 1.5 GB RAM and two cores of an Intel i3 (I'm running a Thinkpad T460). I was following the O'Reilly tutorial on TensorFlow (https://www.oreilly.com/learning/dive-into-tensorflow-with-linux), and ran into trouble at the following step:
$ bazel build tensorflow/examples/label_image:label_image
Changing this to bazel build --jobs=1 tensorflow/... did the trick.
I ran into quite a bit of instability where bazel build failed in my k8s cluster.
Besides --jobs=1, try this:
https://docs.bazel.build/versions/master/command-line-reference.html#flag--local_resources
E.g. --local_resources=4096,2.0,1.0

Only half of the logical cores are used on Windows Server 2012

I'm using Windows Server 2012 on a machine with 2 processors, 12 cores each, for a total of 24 cores.
When I look in the Task Manager I see 24 cores.
Also, when I run the command
cpu get numberofcores,numberoflogicalprocessors /format:list
on the WMIC i get:
NumberOfCores=12
NumberOfLogicalProcessors=12
NumberOfCores=12
NumberOfLogicalProcessors=12
However, in the environment variables I get:
NUMBER_OF_PROCESSORS = 12
and when I run echo %NUMBER_OF_PROCESSORS% in the CMD, I also get 12.
This means I get to use only half of the available processors.
Any ideas how to solve this?
We had exactly the same problem, but when I changed the BIOS parameter "Node Interleaving" from Disabled to Enabled - WOW! 7z/WinRAR now see and USE all cores (logical processors), and the NUMBER_OF_PROCESSORS environment variable is 24 now!
But that is a workaround.
Another way, if you use the HP DL3*0 G9 platform: change the setting "NUMA Group Size Optimization" from [Clustered] (the default) to [Flat]. I got this solution from another thread here on SO. That helped me too, and I think it is the right solution.
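If you want to confirm that Windows has split the 24 logical processors into two processor groups before touching the BIOS, here is a sketch using Sysinternals Coreinfo (an external download; the flags below are to the best of my knowledge):
:: Dump processor groups and NUMA nodes (Coreinfo assumed on PATH).
coreinfo -g -n
:: NUMBER_OF_PROCESSORS reflects only the current process's processor group,
:: which is why two groups of 12 show up as 12 here.
echo %NUMBER_OF_PROCESSORS%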

/home/travis/build.sh: line 41: $pid Killed (exit code 137)

In the Apache Jackrabbit Oak Travis build we have a unit test that makes the build error out:
Running org.apache.jackrabbit.oak.plugins.segment.HeavyWriteIT
/home/travis/build.sh: line 41: 3342 Killed mvn verify -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}
The command "mvn verify -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}" exited with 137.
https://travis-ci.org/apache/jackrabbit-oak/jobs/44526993
The test code can be seen at
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/segment/HeavyWriteIT.java
What's the actual explanation for the error code? How could we work around or solve the issue?
Error code 137 (128 + 9, i.e. the process was killed with SIGKILL) usually comes up when a script exhausts the available system resources; in this case it's very likely memory. The infrastructure this build is running on has some limitations due to the underlying virtualization that can cause these errors.
I'd recommend trying out our new infrastructure, which has more resources available and should give you more stable builds: http://blog.travis-ci.com/2014-12-17-faster-builds-with-container-based-infrastructure/
Usually a Killed message means that you are out of memory. Check your limits with ulimit -a or available memory with free -m, then try to increase your stack size, e.g. ulimit -s 82768 or even more.
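As a quick sanity check along those lines, a hedged sketch using standard Linux tools (nothing Travis-specific; the heap size is illustrative):
# 137 = 128 + 9: the process received SIGKILL, typically from the kernel
# OOM killer rather than from Maven itself.
dmesg | grep -iE "killed process|out of memory"   # OOM-killer traces, if any
free -m                                           # available memory
ulimit -a                                         # current shell limits
# Capping the Maven JVM's heap may also help:
MAVEN_OPTS="-Xmx512m" mvn verify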

How do I increase timeout for a cronjob/crontab?

I have written a script that gets data from Solr for dates within a specified period, and I run it as a daily cron job.
The problem is that the cron job does not complete the task. If I run the script manually (for the same time period), it works well. If I reduce the specified time period, the script also runs fine from cron. So my guess is that the cron job is timing out while running the script when there is too much data to process.
How do I increase the timeout for a cron job?
PS: The script I am running from cron is a bash script which runs a Python script.
Note that the ulimit -t solution suggested will limit the amount of CPU time used, not the amount of actual time that has passed.
From the bash manpage:
ulimit [-HSTabcdefilmnpqrstuvx [limit]]
...
-t The maximum amount of cpu time in seconds
And more importantly, cron doesn't impose any timeouts in the first place. It simply kicks off whatever process and moves on.
BTW: Sorry for posting this as an answer, but I don't have enough points to add comments yet.
You could try to use ulimit -t [number of seconds] in the cronjob before running the script.
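For completeness, a hypothetical crontab entry illustrating that suggestion (the schedule, limit, and script path are made up); note that, as explained above, this adjusts the CPU-time limit, not a wall-clock timeout:
# Raise the CPU-time limit to 4 hours (14400 s) before launching the script.
# This only matters if something set a lower limit; cron itself imposes none.
0 2 * * * /bin/bash -c 'ulimit -t 14400; /path/to/daily_solr_script.sh'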
