Telegraf 1.16 + inputs.modbus plugin: timeout problem
I am reading several Janitza devices with Telegraf 1.16 through the inputs.modbus plugin.
Telegraf is started manually, not as a service, to ease testing and debugging.
This is the setup:
Unit 1 is a UMG604 that acts as a gateway: it receives Modbus/TCP messages and, if they don't match its own Modbus address, relays them to the downstream units, which are linked through an RS485 line. That means the communication is half-duplex and the line is quite busy, because we are trying to read 350+ registers on every tick (25 registers per device).
These units are read without any problem by two loggers I wrote, one in C and one in Python/pymodbus, so I can exclude any hardware issue.
The settings are straightforward; here is a skeleton of the Telegraf configuration file:
[agent]
  interval = "5s"            # sample time
  round_interval = true      # sample at rounded intervals :00, :05, :10, etc.
  metric_batch_size = 1000
  metric_buffer_limit = 10000

[[inputs.modbus]]
  name = "UMG604_Gateway_unit1"
  slave_id = 1
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit1", name="Strom-1", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit2"
  slave_id = 2
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit2", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit3"
  slave_id = 3
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit3", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit4"
  slave_id = 4
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit4", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit5"
  slave_id = 5
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit5", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit6"
  slave_id = 6
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit6", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit7"
  slave_id = 7
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit7", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[inputs.modbus]]
  name = "UMG103_unit8"
  slave_id = 8
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  holding_registers = [
    { measurement="Unit8", name="", byte_order="ABCD", data_type="FLOAT32-IEEE", scale=1.0, address=[1325,1326]},
    # <another 24 non-consecutive registers here>
  ]

[[outputs.influxdb_v2]]
  urls = ["http://localhost:8086"]
  token = "XXXXXXX"
  organization = "demo_org"
  bucket = "demo_bucket"
The problem
The first units in the config file are read quite regularly, but units 5..8 almost always hit a timeout:
dial tcp 192.168.2.100:502: i/o timeout
read tcp 192.168.2.XX:XXXX->192.168.2.100:502: i/o timeout
There are not that many parameters to tweak (timeout, busy_retries and busy_retries_wait have all been increased), so I don't know whether what I am seeing is a wrong setting or a problem in the modbus plugin.
Question 1. How does Telegraf read the devices? Are the requests issued in parallel, so that ideally all units are read at the same time?
If this is the case, the culprit could be the gateway, the UMG604 of unit 1: it only accepts 4 connections.
Question 2: Is there a way to delay the reading of some of units 2..8? If so, I could read the first ones, then a second block after a delay, and then a third. The lack of simultaneity is not an issue in my system; a sketch of what I am after follows.
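As far as I can tell, Telegraf 1.16 has no per-plugin start offset, so the closest standard knob seems to be the agent-level collection_jitter: each input sleeps for a random time within the jitter window before collecting, which should at least de-synchronize the eight requests. A minimal sketch, assuming the jitter behaves as documented (the 2s value is a guess, not a verified fix):

[agent]
  interval = "5s"
  round_interval = true
  # each input sleeps a random time within this window before gathering,
  # so the eight Modbus requests no longer fire at exactly the same instant
  collection_jitter = "2s"
  metric_batch_size = 1000
  metric_buffer_limit = 10000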
As a workaround I wrote a minimal inputs.exec script that reads the registers and prints out JSON, which is then fed to Telegraf, but if possible I would like a standard solution based only on the stock modbus plugin. The workaround looks roughly like the sketch below.
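Roughly like this; the command path is just a placeholder for my own logger (which polls the units sequentially over the gateway), while inputs.exec and the json data format are standard:

[[inputs.exec]]
  # placeholder path: my logger polls units 2..8 one after the other
  # and prints a single JSON object with all the readings
  commands = ["/usr/local/bin/read_umg"]
  timeout = "30s"
  name_override = "umg_readings"
  data_format = "json"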
EDIT 1 ############################################
Alas, the devices are not accessible for experimentation: I can only ssh into a remote server where Telegraf is installed.
I know from experience that there are no particular issues when reading them at different times, so after some comments I assume the problem is due to the UMG604 accepting only 4 incoming connections at the same time.
Reading up to 3 devices (in any combination) never generates timeouts.
Any idea how to activate a delayed reading of some of them? It would be a test that proves the point; the only stock option I have found so far is sketched below.
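The one candidate I see is the standard per-plugin interval override: giving units 5..8 a period that is not a multiple of 5s should shift their ticks away from the first block most of the time. A sketch, assuming the override behaves as documented; with round_interval=true, a 7s plugin should coincide with the 5s plugins only every 35s:

[[inputs.modbus]]
  name = "UMG103_unit5"
  # per-plugin override: with round_interval=true this unit is polled on
  # multiples of 7s, so it collides with the 5s units only every 35 seconds
  interval = "7s"
  slave_id = 5
  timeout = "2s"
  busy_retries = 10
  busy_retries_wait = "200ms"
  controller = "tcp://192.168.2.100:502"
  # holding_registers as above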
EDIT 2 ############################################
A quick recap: the error is NOT generated by the UMG604's limited number of connections.
All 8 connections open correctly: 4 right away, 4 a little later.
The problem is the Modbus/RTU reading of the UMG103 units: it looks like, after the first successful readings, they just stop returning data. As I said, tests cannot be done on site.
getFields returns an error, but it is not a *mb.ModbusError (the type assertion yields ok=false), which means the Gather function in the plugin's modbus.go exits without even retrying:
if err != nil {
    mberr, ok := err.(*mb.ModbusError)
    if ok && mberr.ExceptionCode == mb.ExceptionCodeServerDeviceBusy && retry < m.Retries {
        ...
        time.Sleep(m.RetriesWaitTime.Duration)
        continue // try again, we are inside a for loop
    }
    // ok is false, so we jump here!
    disconnect(m)
    m.isConnected = false
    return err
}
Alas, forcing a retry on ANY error doesn't work.
After an error I also tried closing and reopening the connection before retrying, but that doesn't work either.