Nagios set higher interval for notifications - monitoring

I want the interval between notifications sent by my Nagios server to be 6 hours, but with my current setup the interval seems to be 1 hour. Here is the default template for my server monitoring and how I use it.
define host{
name linux-vps
use generic-host
check_period 24x7
check_interval 5
retry_interval 1
max_check_attempts 10
check_command check-host-alive
notification_period 24x7
notification_interval 360
notification_options d,r
contact_groups admins
register 0
}
define host{
use linux-vps
host_name linux-server
alias CentOS 6
address xxx.xxx.xxx.xxx
}
On the Nagios server the config says the host notification interval is set to 6 hours.

Are the alerts coming through as service alerts or host alerts? It could be that your service notification_interval is set to 1 hour.
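If it is the services, the equivalent setting lives on the service definitions or the template they inherit from; a minimal sketch of bumping it there (the template name here is illustrative, not your actual config):
define service{
        name                    generic-service-6h   ; hypothetical template name
        use                     generic-service
        notification_interval   360                  ; minutes, i.e. 6 hours
        register                0
        }
Any service that uses this template would then re-notify every 6 hours instead of hourly.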

Related

ActiveMQ Artemis / Influx Telegraf MQTT Listener - all messages (100K) delivered, but 4K messages remain in queue

I have conducted a test sending 100K persistent MQTT messages (QoS 2) to ActiveMQ Artemis. The topic has two Telegraf listeners, one on VM 85 and the other on VM 86. These listeners write data to the InfluxDB on their respective servers.
The main goal of the test is to ensure all messages delivered to VM 85 are also delivered to VM 86, even if VM 86 is down. Before executing the test, both listeners connect to the broker, each with a unique client ID and with clean-session = false, and subscribe to the topic using QoS 2. This ensures the subscription for each is present when the messages are sent, whether or not the listeners are actually active. Neither listener is connected when the test starts. The order of operations is:
Start listener on VM 85.
Send data.
Ensure messages are delivered to listener on VM 85.
Start listener on VM 86.
Ensure messages are delivered to listener on VM 86.
The good news is that all messages are delivered to the InfluxDB on both VMs. However, the relevant queue for VM 86 still shows about 4.3K messages remaining.
If I then restart the listener on VM 86, it shows it is writing more data.
However, the total number of messages in InfluxDB correctly remains at 100K. If InfluxDB receives a duplicate record, it will overwrite it. However, the client is incrementing by one and setting the date at each increment, so this shouldn't occur, at least from the client.
I'm not clear on why this would be. Why does the listener on VM 86 need to be restarted to completely empty the queue?
There is one parameter I haven't tried in the Telegraf plugin:
## Maximum messages to read from the broker that have not been written by an
## output. For best throughput set based on the number of metrics within
## each message and the size of the output's metric_batch_size.
##
## For example, if each message from the queue contains 10 metrics and the
## output metric_batch_size is 1000, setting this to 100 will ensure that a
## full batch is collected and the write is triggered immediately without
## waiting until the next flush_interval.
# max_undelivered_messages = 1000
The batch size seems to default to 1000, based on the output messages, but the maximum number of messages to read before output seems to be something greater, since about 4.3K are output when the listener is restarted. Except that they had already been output; that's the confusing part.
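For reference, a minimal sketch of how the two settings relate in a Telegraf config (values are illustrative, not a recommendation for this setup):
[agent]
  ## writes are triggered once this many metrics have been buffered
  metric_batch_size = 1000

[[inputs.mqtt_consumer]]
  ## with one metric per MQTT message, keeping this at or above
  ## metric_batch_size lets a full batch accumulate before each write
  max_undelivered_messages = 1000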
Client Code:
package abc;
import java.time.Instant;
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.MqttSecurityException;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;
import com.influxdb.client.domain.WritePrecision;
import com.influxdb.client.write.Point;
public class MqttPublishSample {
public static void main(String[] args) throws MqttSecurityException, MqttException, InterruptedException {
String broker = "tcp://localhost:1883";
String clientId = "JavaSample";
MemoryPersistence persistence = new MemoryPersistence();
int qos = 2;
int start = Integer.parseInt(args[0]);
int end = Integer.parseInt(args[1]);
// default the topic if it is not supplied on the command line
String topic = (args.length > 2) ? args[2] : "testtopic/999";
System.out.println("start: " + start + ", end: " + end + ", topic: " + topic + " qos: " + qos);
MqttClient sampleClient = new MqttClient(broker, clientId, persistence);
MqttConnectOptions connOpts = new MqttConnectOptions();
connOpts.setCleanSession(false);
connOpts.setUserName("admin");
connOpts.setPassword("xxxxxxx".toCharArray());
System.out.println("Connecting to broker: " + broker);
sampleClient.connect(connOpts);
System.out.println("Connected");
for (int i = start; i <= end; i++) {
// print progress every 100 messages
if (i%100 == 0) {
System.out.println("i: " + i);
}
try {
Point point = Point.measurement("temperature").addTag("machine", "unit43").addField("external", i)
.time(Instant.now(), WritePrecision.NS);
String content = point.toLineProtocol();
MqttMessage message = new MqttMessage(content.getBytes());
message.setQos(qos);
sampleClient.publish(topic, message);
Thread.sleep(10);
} catch (MqttException me) {
System.out.println("reason " + me.getReasonCode());
System.out.println("msg " + me.getMessage());
System.out.println("loc " + me.getLocalizedMessage());
System.out.println("cause " + me.getCause());
System.out.println("excep " + me);
me.printStackTrace();
}
}
sampleClient.disconnect();
System.out.println("Disconnected");
}
}
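For reference, the 100K-message run described above would be kicked off with something like (classpath details omitted; the arguments are start index, end index and topic): java abc.MqttPublishSample 1 100000 testtopic/999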
Telegraf plugin config on 85:
###############################################################################
# INPUT PLUGINS #
###############################################################################
[[inputs.mqtt_consumer]]
servers = ["tcp://127.0.0.1:1883"]
## Topics that will be subscribed to.
topics = [
"testtopic/#",
]
## The message topic will be stored in a tag specified by this value. If set
## to the empty string no topic tag will be created.
# topic_tag = "topic"
## When using a QoS of 1 or 2, you should enable persistent_session to allow
## resuming unacknowledged messages.
qos = 2
persistent_session = true
## If unset, a random client ID will be generated.
client_id = "InfluxData_on_86_listen_local"
## Username and password to connect MQTT server.
username = "admin"
password = "xxxxxx"
data_format = "influx"
[[inputs.mqtt_consumer]]
servers = ["tcp://10.102.11.86:1883"]
## Topics that will be subscribed to.
topics = [
"testtopic/#",
]
## The message topic will be stored in a tag specified by this value. If set
## to the empty string no topic tag will be created.
# topic_tag = "topic"
## When using a QoS of 1 or 2, you should enable persistent_session to allow
## resuming unacknowledged messages.
qos = 2
persistent_session = true
## If unset, a random client ID will be generated.
client_id = "InfluxData_on_86_listen_85"
## Username and password to connect MQTT server.
username = "admin"
password = "xxxx"
data_format = "influx"
###############################################################################
# OUTPUT PLUGINS #
###############################################################################
[[outputs.influxdb_v2]]
## The URLs of the InfluxDB cluster nodes.
##
## Multiple URLs can be specified for a single cluster, only ONE of the
## urls will be written to each interval.
urls = ["http://127.0.0.1:8086"]
## Token for authentication.
token = "xxxx"
## Organization is the name of the organization you wish to write to.
organization = "xxxx"
# ## Destination bucket to write into.
bucket = "events"
I wasn't able to replicate this issue at lower volumes, although I hit it twice at 100K messages.
When I added the following parameter to the Telegraf listener:
max_undelivered_messages = 100
It seemed to slow things down, as batches were limited to 100 according to the Telegraf output.
However, when I removed it, batches still seemed to be limited to 100.
Finally, I changed the same parameter to 1000:
max_undelivered_messages = 1000
After this, message batch sizes improved to well beyond 100, as they were initially.
Furthermore, at least on this third try of 100K messages, there are no longer any messages remaining in the queue after the sequence described in the question is completed.
I'm not really sure whether this change actually did anything, but in any case the correct number of messages was always being received.
So, I'm marking this as answered.

Connectivity problems migrating from 1.5.3 to 2.0

I am in the process of updating an Orleans 1.5.3 application to 2.0, but the updated version fails to start properly, with all silos in the cluster reporting that they cannot ping the other live nodes. I am able to deploy the 1.5.3 version to the same cloud service and virtual network with no problems, so I do not believe this is a network issue.
07-02-2018 22:09:51 EventId="101021" Message="Exception getting a sending socket to endpoint S172.0.0.4:11111:268290539" Exception="System.TimeoutException: Connection to 172.0.0.4:11111 could not be established in 00:00:05
at Orleans.Runtime.SocketManager.Connect(Socket s, IPEndPoint endPoint, TimeSpan connectionTimeout) in D:\build\agent\_work\18\s\src\Orleans.Core\Messaging\SocketManager.cs:line 206
at Orleans.Runtime.SocketManager.SendingSocketCreator(IPEndPoint target) in D:\build\agent\_work\18\s\src\Orleans.Core\Messaging\SocketManager.cs:line 108
at Orleans.Runtime.LRU`2.Get(TKey key) in D:\build\agent\_work\18\s\src\Orleans.Core\Utils\LRU.cs:line 160
at Orleans.Runtime.Messaging.SiloMessageSender.GetSendingSocket(Message msg, Socket& socket, SiloAddress& targetSilo, String& error) in D:\build\agent\_work\18\s\src\Orleans.Runtime\Messaging\SiloMessageSender.cs:line 85"
07-02-2018 22:09:33 EventId="100661" Message="-Failed to get ping responses from all 2 silos that are currently listed as Active in the Membership table. Newly joining silos validate connectivity with all pre-existing silos that are listed as Active in the table and have written I Am Alive in the table in the last 00:10:00 period, before they are allowed to join the cluster. Active silos are: [[SiloAddress=S172.0.0.4:11111:268290539 SiloName=ModelEngine.Silo.GraphBuilder_IN_0 Status=Active HostName=RD0003FFA5B97C ProxyPort=30000 RoleName=Orleans.Runtime UpdateZone=0 FaultZone=0 StartTime = 2018-07-03 05:08:59.428 GMT IAmAliveTime = 2018-07-03 05:09:09.537 GMT ], [SiloAddress=S172.0.0.5:11111:268290516 SiloName=ModelEngine.Silo.GraphBuilder_IN_1 Status=Active HostName=RD0003FFA5ADDE ProxyPort=30000 RoleName=Orleans.Runtime UpdateZone=0 FaultZone=0 StartTime = 2018-07-03 05:08:36.901 GMT IAmAliveTime = 2018-07-03 05:08:47.168 GMT ]]" Exception=""
Looking at the membership table, everything looks consistent between the 1.5.3 and the 2.0 nodes with the exception of the RoleName column. For the 2.0 nodes, this column always contains "Orleans.Runtime" rather than the role name. I'm not sure if this is related, but it is the only difference I can find.

How can I write in InfluxDB from Gatling?

My question has already been asked, but I didn't manage to solve my issue:
I can't get Gatling to send my data to InfluxDB in real time.
I'm on Windows 10.
Gatling version: 2.3.0 (the latest).
InfluxDB version: 1.3.5 (the latest is 1.3.6).
My gatling.conf:
data {
writers = [console, file, graphite] # The list of DataWriters to which Gatling write simulation data (currently supported : console, file, graphite, jdbc)
console {
#light = false # When set to true, displays a light version without detailed request stats
}
file {
#bufferSize = 8192 # FileDataWriter's internal data buffer size, in bytes
}
leak {
#noActivityTimeout = 30 # Period, in seconds, for which Gatling may have no activity before considering a leak may be happening
}
graphite {
#light = false # only send the all* stats
host = "127.0.0.1" # The host where the Carbon server is located
port = "2003" # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
#bufferSize = 8192 # GraphiteDataWriter's internal data buffer size, in bytes
#writeInterval = 1 # GraphiteDataWriter's write interval, in seconds
}
}
My influxdb.conf:
[http]
# Determines whether HTTP endpoint is enabled.
enabled = true
# The bind address used by the HTTP service.
bind-address = "127.0.0.1:8086"
###
### [[graphite]]
###
### Controls one or many listeners for Graphite data.
###
[[graphite]]
# Determines whether the graphite endpoint is enabled.
enabled = true
database = "gatlingdb"
# retention-policy = ""
bind-address = ":2003"
protocol = "tcp"
# consistency-level = "one"
templates = [
"gatling.*.*.*.*.measurement.simulation.request.status.field"
]
My gatlingdb database is created in InfluxDB, but it stays empty.
When I try:
C:\InfluxDB-1.3.5-1>influx -host 127.0.0.1
I'm connected to InfluxDB
>USE gatlingdb
I'm connected to my database. Then:
>SHOW SERIES
and
>SELECT * FROM gatling
Neither returns anything; the database is empty.
Note: I query FROM gatling because I set rootPathPrefix = "gatling" in my gatling.conf.
I didn't download Graphite, but I saw that InfluxDB accepts the Graphite protocol, so I assume I can send data from Gatling to InfluxDB. I have certainly missed something.
I succeeded in connecting InfluxDB to Grafana and can display data from other databases; I'm just missing the connection between Gatling and InfluxDB.
Thanks in advance for your help, I definitely need it!
Anthony
I've almost finished an article which shows all the steps required to create the whole monitoring infrastructure using Gatling, Grafana and InfluxDB (without Graphite installed separately), which worked very well for me.
I'll publish it on the blazemeter.com blog in just a few days, so stay tuned there!
http://blazemeter.com/blog
There you will even find a ready solution to spin everything up inside Docker.
Until then (if it is urgent for you), I can share my InfluxDB config section:
[[graphite]]
enabled = true
bind-address = ":2003"
database = "graphite"
retention-policy = ""
protocol = "tcp"
batch-size = 5000
batch-pending = 10
batch-timeout = "1s"
consistency-level = "one"
separator = "."
udp-read-buffer = 0
gatling.conf:
graphite {
light = false # only send the all* stats
host = "localhost" # The host where the Carbon server is located
port = 2003 # The port to which the Carbon server listens to (2003 is default for plaintext, 2004 is default for pickle)
protocol = "tcp" # The protocol used to send data to Carbon (currently supported : "tcp", "udp")
rootPathPrefix = "gatling" # The common prefix of all metrics sent to Graphite
bufferSize = 8192 # GraphiteDataWriter's internal data buffer size, in bytes
writeInterval = 1 # GraphiteDataWriter's write interval, in seconds
}
The first thing you need to check is that InfluxDB actually accepts incoming metrics via the Graphite protocol. For example, in the InfluxDB startup logs you should find this line:
influxdb_1 | [I] 2018-01-26T13:40:37Z Listening on TCP: [::]:2003 service=graphite addr=:2003
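As a further sanity check (a sketch, assuming a Unix-like shell is available; the dotted metric path below is made up and may be mapped differently by your template), you can push one point by hand over the Graphite plaintext protocol and see whether anything lands in the database:
echo "gatling.users.test.request.ok.count 1 $(date +%s)" | nc 127.0.0.1 2003
Then, back in the influx CLI, USE gatlingdb followed by SHOW MEASUREMENTS should no longer come back empty if the listener is working.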

ESP8266 UPnP Port Forwarding - IoT [closed]

Closed. This question needs to be more focused. It is not currently accepting answers. Closed 5 years ago.
Is it possible to use the UPNP protocol for automatic port forwarding on the router using ESP8266?
I need to be able to access my ESP8266 module even when I am away from home.
Currently I have configured port forwarding manually in my router settings.
But for my project to become a commercial product in the future, it needs to be able to do the port forwarding automatically, since manual configuration would be a barrier for the average user.
On the internet I found something talking about UPNP on ESP8266, but it was not about port forwarding.
Thank you very much in advance!
You can have a look at my library that I made just for that:
https://github.com/ofekp/TinyUPnP
I have an example for an IoT device (LED lights) within the package; I cannot attach the link due to low reputation.
You can have a look at the example code. It is all made for the ESP8266.
It is very simple to use: just call addPortMapping with the port you want to open, as shown in the example.
You have to do this every 36000 (LEASE_DURATION) seconds, since UPnP is a lease-based protocol.
Declare:
unsigned long lastUpdateTime = 0;
TinyUPnP *tinyUPnP = new TinyUPnP(-1); // -1 means blocking, preferably, use a timeout value (ms)
Setup:
if (tinyUPnP->addPortMapping(WiFi.localIP(), LISTEN_PORT, RULE_PROTOCOL_TCP, LEASE_DURATION, FRIENDLY_NAME)) {
lastUpdateTime = millis();
}
Loop:
// update UPnP port mapping rule if needed
if ((millis() - lastUpdateTime) > (long) (0.8D * (double) (LEASE_DURATION * 1000.0))) {
Serial.print("UPnP rule is about to be revoked, renewing lease");
if (tinyUPnP->addPortMapping(WiFi.localIP(), LISTEN_PORT, RULE_PROTOCOL_TCP, LEASE_DURATION, FRIENDLY_NAME)) {
lastUpdateTime = millis();
}
}
I only checked it with my D-Link router.
To anyone interested in how the library works:
It sends an M_SEARCH message to UPnP UDP multicast address.
The gateway router will respond with a message including an HTTP header called Location.
Location is a link to an XML file containing the IGD (Internet Gateway Device) API in order to create the needed calls which will add the new port mapping to your gateway router.
One of the services that is depicted in the XML is <serviceType>urn:schemas-upnp-org:service:WANPPPConnection:1</serviceType> which is what the library is looking for.
That service will include an eventSubURL tag, which is a link to your router's IGD API. (The base URL is also depicted in the same file under the tag URLBase.)
Using the base URL and the WANPPPConnection link you can issue an HTTP query to the router that will add the UPnP rule.
As a side note, the service depicted in the XML also includes a SCPDURL tag, which is a link to another XML that depicts the commands available for the service and their parameters. The package skips this stage, as I assumed the query would be similar for many routers; this may very well not be the case, though, so it is up to you to check.
From this stage the package will issue the service command using an HTTP query to the router. The actual query can be seen in the code quite clearly but for anyone interested:
Headers:
"POST " + <link to service command from XML> + " HTTP/1.1"
"Content-Type: text/xml; charset=\"utf-8\""
"SOAPAction: \"urn:schemas-upnp-org:service:WANPPPConnection:1#AddPortMapping\""
"Content-Length: " + body.length()
Body:
"<?xml version=\"1.0\"?>\r\n"
"<s:Envelope xmlns:s=\"http://schemas.xmlsoap.org/soap/envelope/\" s:encodingStyle=\"http://schemas.xmlsoap.org/soap/encoding/\">\r\n"
"<s:Body>\r\n"
"<u:AddPortMapping xmlns:u=\"urn:schemas-upnp-org:service:WANPPPConnection:1\">\r\n"
" <NewRemoteHost></NewRemoteHost>\r\n"
" <NewExternalPort>" + String(rulePort) + "</NewExternalPort>\r\n"
" <NewProtocol>" + ruleProtocol + "</NewProtocol>\r\n"
" <NewInternalPort>" + String(rulePort) + "</NewInternalPort>\r\n"
" <NewInternalClient>" + ipAddressToString(ruleIP) + "</NewInternalClient>\r\n"
" <NewEnabled>1</NewEnabled>\r\n"
" <NewPortMappingDescription>" + ruleFriendlyName + "</NewPortMappingDescription>\r\n"
" <NewLeaseDuration>" + String(ruleLeaseDuration) + "</NewLeaseDuration>\r\n"
"</u:AddPortMapping>\r\n"
"</s:Body>\r\n"
"</s:Envelope>\r\n";
I hope this helps.
I don't see why not. UPnP implements multiple profiles; the one you are interested in is named IGD (Internet Gateway Device), which most home routers implement to allow client applications on the local network (e.g. Skype, uTorrent, etc.) to map ports on the router's NAT.
UPnP works over IP multicast to discover and announce devices implementing UPnP services over the address 239.255.255.250. Devices interested in such announcements subscribe to this multicast group and listen on port 1900. In fact, UPnP does not itself provide a discovery mechanism, but relies on a protocol called SSDP (Simple Service Discovery Protocol) to discover hosts on the local network.
All that's needed is a UDP socket bound to the aforementioned address and port to subscribe and publish messages on your home multicast group. You'd need to use an implementation of SSDP to discover your router; once you have discovered it, you can send commands using UPnP wrapped in SOAP envelopes.
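For reference, the discovery step is just a small plain-text datagram; a sketch of a standard SSDP M-SEARCH request, sent over UDP to 239.255.255.250:1900:
M-SEARCH * HTTP/1.1
HOST: 239.255.255.250:1900
MAN: "ssdp:discover"
MX: 2
ST: urn:schemas-upnp-org:device:InternetGatewayDevice:1
Any IGD-capable router on the LAN replies with an HTTP response whose LOCATION header points at its device description XML, the same Location mechanism described in the answer above.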
There are many implementations of the UPnP IGD profile in POSIX C which you may reuse and port to the ESP8266 (e.g. MiniUPnP, gupnp-igd).

F5 LTM iRule - can a pool name be generated in an iRule?

I need to setup a configuration for many similar environments. Each will have a different hostname that follows a pattern, e.g. env1, env2, etc.
I can use a pool per environment and a single virtual server with an irule that selects a pool based on hostname.
What I'd prefer to do is dynamically generate and select the pool name based on the requested hostname rather than listing out every pool in the switch statement. It's easier to maintain and automatically handles new environments.
The code might look like:
when HTTP_REQUEST {
pool [string tolower [HTTP::host]]
}
and each pool name matches the hostname.
Is this possible? Or is there a better method?
EDIT
I've expanded my hostname pool selection. I'm now trying to include the port number. The new rule looks like:
when HTTP_REQUEST {
set lb_port "[LB::server port]"
set hostname "[string tolower [getfield [HTTP::host] : 1]]"
log local0.info "Pool name $hostname-$lb_port-pool"
pool "$hostname-$lb_port-pool"
}
This is working, but I'm seeing no-such-pool errors in the logs because somehow a port 0 request is coming into the rule. It seems to be the first request, followed by the request with the legitimate port.
Wed Feb 17 20:39:14 EST 2016 info tmm tmm[6519] Rule /Common/one-auto-pool-select-by-hostname-port <HTTP_REQUEST>: Pool name my.example.com-80-pool
Wed Feb 17 20:39:14 EST 2016 err tmm1 tmm[6519] 01220001 TCL error: /Common/one-auto-pool-select-by-hostname-port <HTTP_REQUEST> - no such pool: my.example.com-0-pool (line 1) invoked from within "pool "$hostname-$lb_port-pool""
Wed Feb 17 20:39:14 EST 2016 info tmm1 tmm[6519] Rule /Common/one-auto-pool-select-by-hostname-port <HTTP_REQUEST>: Pool name my.example.com-0-pool
What is causing the port 0 request? And is there any workaround? e.g. could I test for port 0 and select a default port or ignore it?
ONE MORE EDIT
I rebuilt the virtual server, and the error has now gone. The rebuild of the VS was only to rename it, though; I'm fairly sure I recreated the settings exactly the same.
Yes, you can specify the pool name in a string. What you have there would work as long as you have a pool with that same name. Though it doesn't show an example of doing it this way, you can also check out the pool wiki page on DevCentral for more information.
As an aside, in my environment I generally create pools with the suffix _pool to distinguish them from other objects when looking at config files. So in my iRules, I would do something like this (essentially the same thing):
when HTTP_REQUEST {
pool "[string tolower [HTTP::host]]_pool"
}
The simple case mentioned by Michael works. I'd recommend removing the port value if present:
when HTTP_REQUEST {
pool "pool_[string tolower [getfield [HTTP::host] : 1]]_[LB::server port]"
}
Keep in mind that clients might send a partial hostname. If the DNS search path is set to example.org then the client might hit shared/ which maps to shared.example.org, but the HTTP::host header will just have shared. Some API libraries may append the port number even if it's on the default port. Simple code might not send a Host header. Malicious code might send completely bogus Host headers. You could trap these cases with catch.
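For instance, a rough sketch of that catch approach (the fallback pool name default_pool is made up; adjust it to your own naming convention):
when HTTP_REQUEST {
    set host [string tolower [getfield [HTTP::host] ":" 1]]
    # try the dynamically generated pool; fall back if it does not exist
    if { [catch { pool "${host}_pool" }] } {
        pool default_pool
    }
}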
You can also use a datagroup to map hostnames to pools. This allows multiple hosts to use the same pool. Sample code:
when HTTP_REQUEST {
set host [string tolower [getfield [HTTP::host] ":" 1]]
if { $host == "" } {
# if there's no Host header, pull from virtual server name
# we use: pool_<virtualserver>_PROTOCOL
set host [getfield [virtual name] _ 2]
} elseif { not ($host contains ".") } {
# if Host header does not contain a dot, assume example.org
set host $host.example.org
}
set pool [class match -value $host[HTTP::uri] starts_with dg_shared.example.org]
if { $pool ne ""} {
set matched [class match -name $host[HTTP::uri] starts_with dg_shared.example.org]
set log(matched) $matched
set log(pool) $pool
if { [catch { pool $pool } ] } {
set log(reason) "Failed to Connect to Pool"
call hsllog log
call errorpage 404 $log(reason) "https://[HTTP::host][HTTP::uri]" log
}
} else {
call errorpage 404 "No Pool Found" "https://[HTTP::host][HTTP::uri]" log
}
}
when SERVER_CONNECTED {
if {!($pool ends_with "_HTTPS") } {
SSL::disable serverside
}
}
This allows host.example.org/path1 to be on a different pool than host.example.org or host.example.org/path2 by including separate entries in the datagroup. I didn't include the hsllog and errorpage procs here. They dump the log array as well as the other passed parameters.
We then disable serverside ssl for pools that don't end in _HTTPS.
Note: As with dynamically generated pool names, the BIG-IP UI does not look inside datagroups for pool references, so the interface will allow you to delete one of these pools thinking it's not in use.
We use BigIPReport to identify orphan pools:
https://devcentral.f5.com/s/articles/bigip-report
