icinga/nagios -- nrpe_check: Connection refused by host

I'm pretty new to Icinga, so maybe this is only a small problem that I don't understand.
I configured an nrpe_check command to monitor a disk on a host. Run by hand, it works fine:
nagios#icinga-server: /usr/lib/nagios/plugins/check_nrpe -H host.mydom.com -c check_smart_attributes
OK (sda) |sda_Media_Wearout_Indicator=097;16;6
As you can see, the NRPE connection works and the script returns the right data.
But the icinga-web frontend always shows:
SMART attributes CRITICAL 2014-01-31 09:25:40 0d 1h 21m 6s 4/4 Connection refused by host
Could anyone help me with this problem?
Thanks, regards
Andreas

Bloody typos...
After a long time and some holidays, I solved the problem. I had mistyped a digit in the host's IP address, so it was 123 instead of 23.
Sorry for this bloody typo...
Regards Andreas

Check that these files are correct:
1. /etc/icinga/host.mydom.com.cfg
2. commands.cfg
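Since the check worked by hand but failed from Icinga, one quick way to catch this kind of typo is to compare the address in each host definition with what the host name actually resolves to, and to test the NRPE port directly. A minimal Python sketch, assuming a Nagios/Icinga 1.x style object file with `host_name` and `address` directives and the NRPE default port 5666; the helper names are hypothetical:

```python
import re
import socket

# Matches "define host { ... }" blocks (but not "define hostgroup").
HOST_BLOCK = re.compile(r"define\s+host\s*\{(.*?)\}", re.DOTALL)


def parse_host_addresses(cfg_text):
    """Map host_name -> address for every host block in the given text."""
    hosts = {}
    for body in HOST_BLOCK.findall(cfg_text):
        fields = {}
        for raw in body.splitlines():
            line = raw.split(";", 1)[0].strip()  # drop inline ';' comments
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)  # directive, then value
            if len(parts) == 2:
                fields[parts[0]] = parts[1].strip()
        if "host_name" in fields and "address" in fields:
            hosts[fields["host_name"]] = fields["address"]
    return hosts


def nrpe_reachable(address, port=5666, timeout=3.0):
    """True if a TCP connection to the NRPE port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


def report(cfg_text):
    """Print configured vs. resolved address and NRPE reachability per host."""
    for name, address in parse_host_addresses(cfg_text).items():
        try:
            resolved = socket.gethostbyname(name)
        except OSError:
            resolved = None
        flag = "  <-- differs from DNS" if resolved and resolved != address else ""
        print(f"{name}: configured={address} dns={resolved} "
              f"nrpe_port_open={nrpe_reachable(address)}{flag}")
```

Calling `report(open(path).read())` on your host definition file prints one line per host, so an address like 123 instead of 23 stands out next to the DNS answer.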


WSL2 + Docker - Keep Alive Bug in TCP stack

I wonder whether others have noticed this issue with the TCP implementation in WSL2 Debian.
I am connecting from a Docker container running on WSL2 Debian v. 20.
The TCP client sends a keep-alive packet every second, which is overkill. Then, after roughly 5 minutes, the client terminates the connection for no apparent reason. Is anybody else seeing this behavior?
You can reproduce this by simply opening a telnet session to another host, but the behavior occurs with other types of sockets too.
And before you ask: the issue is not caused by the server. It does not occur when the same TCP connection is opened from other hosts.
[Screenshot: Wireshark dump of the last few seconds of the idle TCP connection]
I had the same problem with Ubuntu on WSL2. An outbound ssh connection was closed after a period of inactivity, which was particularly annoying when running an application that produced no screen output.
I suspect that the internal router that connects wsl to the local network dropped the idle TCP connection.
The solution was to shorten the TCP keep-alive timers in /proc/sys/net/ipv4. The following worked for me (as root):
echo 300 > /proc/sys/net/ipv4/tcp_keepalive_time
echo 45 > /proc/sys/net/ipv4/tcp_keepalive_intvl
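The same timers can also be applied to a single socket instead of system-wide. A minimal Linux-oriented Python sketch: the 300 s idle time and 45 s interval mirror the values above, the probe count is left at the system default unless passed in, and the function names are hypothetical. Treat it as something to try rather than a known fix for WSL2.

```python
import socket


def read_keepalive_sysctls(base="/proc/sys/net/ipv4"):
    """Return the system-wide keep-alive timers (Linux only; {} if unreadable)."""
    values = {}
    for name in ("tcp_keepalive_time", "tcp_keepalive_intvl", "tcp_keepalive_probes"):
        try:
            with open(f"{base}/{name}") as fh:
                values[name] = int(fh.read().strip())
        except OSError:
            pass  # file missing (non-Linux) or not readable
    return values


def enable_keepalive(sock, idle=300, interval=45, probes=None):
    """Turn on keep-alive for one socket, overriding the system-wide timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are available on Linux; guard for other platforms.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if probes is not None and hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock
```

Call `enable_keepalive` on the socket before connecting; the per-socket values then take precedence over the /proc settings for that connection only.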
So I figured this out. Unfortunately, the WSL2 implementation of Debian seems to have this hardcoded in its stack: I tried changing the parameters of the socket open call, and it made no difference to the behavior.

Sidekiq can't connect to database?

I have the hostname "mariadb" mapped to 127.0.0.1 in my /etc/hosts file, and Sidekiq occasionally throws errors such as:
Mysql2::Error::ConnectionError: Unknown MySQL server host 'mariadb' (16)
The VM is not under significant load or anything like that.
Later edit: it seems other gems have trouble resolving hosts too:
WARN -- : Unable to record event with remote Sentry server (Errno::EBUSY - Failed to open TCP connection to XXXX.ingest.sentry.io:443 (Device or resource busy - getaddrinfo)):
Does anyone have any idea why this might happen?
I figured this out a couple of weeks ago but wanted to be sure before posting an answer.
I still don't understand the exact mechanism, but the issue was caused by fail2ban.
I had it running in a container, polling the httpd logs and blocking the huge number of bots scraping my sites.
I also increased the maximum number of open file handles and inotify watches:
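One way to confirm that the failures are intermittent rather than a static misconfiguration is to resolve the name in a tight loop and tally the outcomes. A minimal Python sketch; `probe_resolution` is a hypothetical helper, and port 3306 (the MySQL/MariaDB default) only serves as the service argument to `getaddrinfo`:

```python
import socket
from collections import Counter


def probe_resolution(host, port=3306, attempts=100):
    """Resolve `host` repeatedly and count each distinct outcome.

    A healthy resolver gives only 'ok'; sporadic errors show up as extra keys.
    """
    outcomes = Counter()
    for _ in range(attempts):
        try:
            socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
            outcomes["ok"] += 1
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            outcomes[f"{type(exc).__name__}: {exc}"] += 1
    return outcomes
```

Running `print(probe_resolution("mariadb"))` on the affected VM while the errors occur would show whether lookups of a name from /etc/hosts fail some fraction of the time.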
fs.file-max = 131070
fs.inotify.max_user_watches = 65536
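To confirm that settings like these have actually taken effect (values in /etc/sysctl.conf are applied at boot or when reloaded, e.g. with `sysctl -p`), the live values can be read back from /proc/sys. A minimal Python sketch; the function names are hypothetical:

```python
from pathlib import Path


def parse_sysctl_conf(text):
    """Parse sysctl.conf-style 'key = value' lines into a dict."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":  # blank line or comment
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def current_value(key, root="/proc/sys"):
    """Read the live value of a key such as 'fs.file-max'; None if absent."""
    path = Path(root, *key.split("."))  # fs.file-max -> /proc/sys/fs/file-max
    try:
        return path.read_text().strip()
    except OSError:
        return None


def print_sysctl_report(conf_text):
    """Show each configured value next to the value the kernel is using."""
    for key, wanted in parse_sysctl_conf(conf_text).items():
        print(f"{key}: configured={wanted} live={current_value(key)}")
```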
As soon as I got rid of fail2ban and increased the inotify limit, the errors disappeared.
Needless to say, fail2ban is now on our "do not touch" list, and we've rolled out a 404/403/500 handler at the application layer that pushes unknown IPs to Cloudflare.
Although this is probably an edge case, I'm leaving it here in the hope that it helps someone at some point.

Windows Docker is routing to an unexpected IP and being lost

Look at this trace result:
>tracert -d 172.18.0.6
Tracing route to 172.18.0.6 over a maximum of 30 hops
1 2 ms 2 ms 2 ms 192.168.2.1
2 3 ms 3 ms 8 ms 10.11.7.113
3 * * * Request timed out.
As you can see, at the second hop the packets go to 10.11.7.113, which cannot reach the running Docker container at 172.18.0.6. I don't know where this route is configured.
[Screenshot: Docker Desktop network configuration]
I have already whitelisted every possible IP in the firewall. I have no problem running the images, and the containers can see each other without any problem, but they can't see the host either.
The Docker Gateway IP is 172.18.0.1 which is whitelisted in the firewall too.
Any help would be appreciated.
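The first hop in the trace, 192.168.2.1, looks like the LAN's default gateway, which suggests Windows had no specific route for the Docker subnet and fell back to the default route. The Python sketch below illustrates that longest-prefix-match decision; the routing table and the `docker-vm` label are hypothetical, and 172.18.0.0/16 is an assumed subnet inferred from the gateway 172.18.0.1:

```python
import ipaddress


def pick_route(destination, routes):
    """Return the gateway of the most specific route containing `destination`.

    `routes` is a list of (network, gateway) pairs; picking the longest
    matching prefix mimics how a host chooses where to send a packet.
    """
    dest = ipaddress.ip_address(destination)
    matches = [
        (ipaddress.ip_network(net), gw)
        for net, gw in routes
        if dest in ipaddress.ip_network(net)
    ]
    if not matches:
        return None
    _, gateway = max(matches, key=lambda m: m[0].prefixlen)
    return gateway


# Hypothetical host table: only the LAN and a default route.
routes = [("0.0.0.0/0", "192.168.2.1"), ("192.168.2.0/24", "on-link")]
print(pick_route("172.18.0.6", routes))  # 192.168.2.1 (default gateway)

# With a route for the assumed Docker subnet, the packet stays local.
routes.append(("172.18.0.0/16", "docker-vm"))
print(pick_route("172.18.0.6", routes))  # docker-vm
```

On Windows, `route print` shows the real table; if no entry covers 172.18.0.0/16, the trace above is the expected result.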
In case someone experiences the same problem: it looks like it was introduced by upgrading to the then-new Docker Desktop 2.2.0.4.
Even uninstalling and reinstalling that version did not help.
So I uninstalled Docker Desktop and reinstalled the older version, 2.1.0.5, and it started working again. There seems to be a networking problem in the newer versions.

Storm: java.lang.RuntimeException: Returned channel was actually not established

I have a Storm cluster with 1 Nimbus node and 3 supervisor nodes, running in Docker containers on AWS EC2 instances. I had a topology running with 3 workers, and it ran perfectly fine. I stopped and removed this container and started a new one. Since then, I get the following error in the supervisor logs:
2016-10-03 21:18:22 b.s.m.n.Client [ERROR] connection attempt 129 to Netty-Client-hostname:6702 failed: java.lang.RuntimeException: Returned channel was actually not established
I have edited "/etc/hosts" to include the hostname as follows:
IP-address hostname
Yet the problem persists, although the same topology runs perfectly fine with the number of workers set to 1. Any pointers on solving this issue are appreciated.
The problem was the hostname. I changed the hostname to match the DNS name by updating "/etc/hostname" as well as "/etc/hosts", and then rebooted the Nimbus instance followed by the supervisor instances. This fixed the problem. Hope this helps anyone who is stuck with the same problem!
Check your supervisor log; sometimes you need to redeploy the app because the supervisor has not started the topology yet.
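The log line shows workers connecting to each other by hostname (`Netty-Client-hostname:6702`), which would explain why one worker runs fine but three do not: with several workers, each node's hostname must resolve to an IP the other nodes can reach. A small Python sketch for checking an /etc/hosts file against the IP a node should be reachable on; `parse_hosts` and `check_hostname` are hypothetical helpers:

```python
import socket


def parse_hosts(text):
    """Map each name in /etc/hosts-style text to the first IP listed for it."""
    mapping = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # strip comments, split on whitespace
        if len(fields) < 2:
            continue
        ip, *names = fields
        for name in names:
            mapping.setdefault(name, ip)  # first match wins, as in the resolver
    return mapping


def check_hostname(hosts_text, expected_ip, hostname=None):
    """Report whether this machine's hostname maps to the expected IP."""
    hostname = hostname or socket.gethostname()
    mapped = parse_hosts(hosts_text).get(hostname)
    return {"hostname": hostname, "mapped_ip": mapped, "ok": mapped == expected_ip}
```

For example, `check_hostname(open("/etc/hosts").read(), "10.0.0.5")` on each supervisor (with that node's own private IP) flags a node whose hostname is missing or points at a stale address.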

tcpdump/wireshark disconnect

When I'm capturing on wlan0 with tcpdump, or even Wireshark, I always get disconnected within 30 seconds to 5 minutes.
Do you have any idea how to fix this?
I'm on 64-bit Debian; I tried both wpa_supplicant and NetworkManager.
Finally, after 5 months, I found how to fix this issue.
I just had to update my network card driver (in my case, iwlwifi).
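To find out which driver your wireless card uses before updating it, Linux exposes the bound driver in sysfs. A minimal Python sketch, assuming the interface is `wlan0` and sysfs is mounted at /sys; `wifi_driver` is a hypothetical helper:

```python
import os


def wifi_driver(interface="wlan0", sysfs="/sys/class/net"):
    """Return the kernel driver bound to `interface` (e.g. 'iwlwifi'), or None.

    /sys/class/net/<iface>/device/driver is a symlink into the driver's
    directory, so the last component of its resolved path is the driver name.
    """
    link = os.path.join(sysfs, interface, "device", "driver")
    if not os.path.islink(link):
        return None  # no such interface, or a virtual one without a device
    return os.path.basename(os.path.realpath(link))
```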
