Telegraf 1.16 + inputs.modbus plugin: timeout problem
I am reading several Janitza devices with Telegraf 1.16 through the inputs.modbus plugin.
Telegraf is started manually, not as a service, to ease testing and debugging.
This is the setup:
Unit 1 is a UMG604 that acts as a gateway: it receives Modbus/TCP messages and, if they don't match its own Modbus address, relays them to the downstream units, which are linked through an RS485 line. That means the communication is half-duplex and the line is quite busy, because we are trying to read 350+ registers on every tick (25 registers per device).
These units are read without any problem by two loggers I wrote, one in C and one in Python/pymodbus, so I can exclude any hardware issue.
The settings are straightforward; here is a skeleton of the Telegraf configuration file:
[agent]
  interval = "5s"            # sample time
  round_interval = true      # sample at rounded intervals :00, :05, :10, etc.
  metric_batch_size = 1000
  metric_buffer_limit = 10000

[[inputs.modbus]]
  name = "UMG604_Gateway_unit1"
  slave_id = 1
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit1", name="Strom-1", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit2"
  slave_id = 2
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit2", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit3"
  slave_id = 3
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit3", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit4"
  slave_id = 4
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit4", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit5"
  slave_id = 5
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit5", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit6"
  slave_id = 6
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit6", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit7"
  slave_id = 7
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit7", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit8"
  slave_id = 8
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit8", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  token = "XXXXXXX"
  organization = "demo_org"
  bucket = "demo_bucket"
The problem
The first units in the config file are read quite regularly, but units 5..8 almost always hit a timeout:
dial tcp 192.168.2.100:502: i/o timeout
read tcp 192.168.2.XX:XXXX->192.168.2.100:502: i/o timeout
There are not that many parameters to tweak (timeout, busy_retries and busy_retries_wait have all been increased), so I don't know whether what I am seeing is a wrong setting or a problem in the modbus plugin.
Question 1. How does Telegraf read the devices? Are the requests issued in parallel, so that ideally all units are read at the same time?
If this is the case, the culprit could be the gateway, the UMG604 of unit 1: it only accepts 4 connections.
Question 2: Is there a way to delay the reading of some of units 2..8? If so, I could read the first ones, then a second block after a delay, and then a third. The lack of simultaneity is not an issue in my system; a sketch of what I am after follows.
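As far as I can tell, Telegraf 1.16 has no per-plugin start offset, so the closest standard knob seems to be the agent-level collection_jitter: each input sleeps for a random time within the jitter window before collecting, which should at least de-synchronize the eight requests. A minimal sketch, assuming the jitter behaves as documented (the 2s value is a guess, not a verified fix):

[agent]
  interval = "5s"
  round_interval = true
  # each input sleeps a random time within this window before gathering,
  # so the eight Modbus requests no longer fire at exactly the same instant
  collection_jitter = "2s"
  metric_batch_size = 1000
  metric_buffer_limit = 10000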
As a workaround I wrote a minimal inputs.exec script that reads the registers and prints out JSON, which is then fed to Telegraf, but if possible I would like a standard solution based only on the stock modbus plugin. The workaround looks roughly like the sketch below.
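Roughly like this; the command path is just a placeholder for my own logger (which polls the units sequentially over the gateway), while inputs.exec and the json data format are standard:

[[inputs.exec]]
  # placeholder path: my logger polls units 2..8 one after the other
  # and prints a single JSON object with all the readings
  commands = ["/usr/local/bin/read_umg"]
  timeout = "30s"
  name_override = "umg_readings"
  data_format = "json"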
EDIT 1 ############################################
Alas, the devices are not accessible for experimentation: I can only ssh into a remote server where Telegraf is installed.
I know from experience that there are no particular issues when reading them at different times, so after some comments I assume the problem is due to the UMG604 accepting only 4 incoming connections at the same time.
Reading up to 3 devices (in any combination) never generates timeouts.
Any idea how to activate a delayed reading of some of them? It would be a test that proves the point; the only stock option I have found so far is sketched below.
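The one candidate I see is the standard per-plugin interval override: giving units 5..8 a period that is not a multiple of 5s should shift their ticks away from the first block most of the time. A sketch, assuming the override behaves as documented; with round_interval=true, a 7s plugin should coincide with the 5s plugins only every 35s:

[[inputs.modbus]]
  name = "UMG103_unit5"
  # per-plugin override: with round_interval=true this unit is polled on
  # multiples of 7s, so it collides with the 5s units only every 35 seconds
  interval = "7s"
  slave_id = 5
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  # holding_registers as above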
EDIT 2 ############################################
A quick recap: the error is NOT generated by the UMG604's limited number of connections.
All 8 connections open correctly: 4 right away, 4 a little later.
The problem is the Modbus/RTU reading of the UMG103 units: it looks like, after the first successful readings, they just stop returning data. As I said, tests cannot be done on site.
getFields returns an error, but it is not a *mb.ModbusError (the type assertion yields ok=false), which means the Gather function in the plugin's modbus.go exits without even retrying:
if err != nil {
    mberr, ok := err.(*mb.ModbusError)
    if ok && mberr.ExceptionCode == mb.ExceptionCodeServerDeviceBusy && retry < m.Retries {
        ...
        time.Sleep(m.RetriesWaitTime.Duration)
        continue // try again, we are inside a for loop
    }
    // ok is false, so we jump here!
    disconnect(m)
    m.isConnected = false
    return err
}
Alas, forcing a retry on ANY error doesn't work.
After an error I also tried closing and reopening the connection before retrying, but that doesn't work either.