"statsd" prefix on data when using Telegraf with StatsD - influxdb

I am using Telegraf as a server to collect StatsD data from Python and send it to InfluxDB. However, the measurement names I see in InfluxDB have the prefix "statsd_". How can I remove it?
In Python I do:
ctr_name = 'foo'
client = statsd.StatsClient('MY_DOMAIN.com', 8125)
client.incr(ctr_name, 1)
And then in InfluxDB I see:
> show measurements
name: measurements
------------------
name
statsd_foo

By default, the statsd python package does not put a prefix in front of your measurements.
Below is the default:
client = statsd.StatsClient('MY_DOMAIN.com', 8125, prefix=None)
I am pretty sure you set prefix="statsd" in your actual code.
The problem does not seem to come from the statsd plugin configuration in Telegraf, since it doesn't offer an option to prefix all measurements with a string constant. See here
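To make the difference concrete, here is a minimal sketch of both variants. The measurement name mangling described in the comments assumes Telegraf's statsd input with its metric_separator setting at the default "_", which joins the dotted bucket name into a single measurement name:
import statsd

# With a prefix, the bucket is sent as "statsd.foo"; Telegraf's statsd input
# then joins it into the measurement "statsd_foo" (assuming the default
# metric_separator = "_").
client = statsd.StatsClient('MY_DOMAIN.com', 8125, prefix='statsd')
client.incr('foo', 1)

# Without a prefix (the default), the bucket is just "foo" and the
# measurement in InfluxDB is simply "foo".
client = statsd.StatsClient('MY_DOMAIN.com', 8125, prefix=None)
client.incr('foo', 1)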


Is there any way to start go_binary before java_test?

Our project has a few gRPC servers defined as go_binary targets. We develop client SDKs for Java and Python applications, and we would like to use java_test and py_test. Is there any way to start a specific go_binary target before java_test or py_test?
You can create a test harness that starts the gRPC server before running the tests. For example, you could add the binary to the data attribute of the test, and then start it beforehand:
go_binary(
    name = "my_grpc_server",
    [...]
)

py_test(
    name = "my_test",
    [...]
    data = [":my_grpc_server"],
)
and then inside the test file:
import subprocess
import unittest

# Provided by the @bazel_tools//tools/python/runfiles library.
from bazel_tools.tools.python.runfiles import runfiles

class ClientTestCase(unittest.TestCase):
    def setUp(self):
        r = runfiles.Create()
        self.server = subprocess.Popen([r.Rlocation("path/to/my_grpc_server")])

    def tearDown(self):
        self.server.terminate()
        self.server.wait()
This example is very simple; you'll probably run into issues regarding the availability of the port the server listens on, or waiting for the server to start up. You could add flags to your gRPC server to allow communication over a domain socket, or make it listen on an unused port and have the test parse the port number from the server's log output.
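If you go the fixed-port route, a small helper along these lines (a hypothetical sketch, not part of the snippet above) can make setUp() block until the server is actually accepting connections:
import socket
import time

def wait_for_port(port, host="localhost", timeout=30.0):
    # Poll until a TCP connection to host:port succeeds or the timeout expires.
    deadline = time.time() + timeout
    while time.time() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1):
                return
        except OSError:
            time.sleep(0.1)
    raise RuntimeError("server on port %d did not start in time" % port)

Calling wait_for_port() with whatever port your server uses at the end of setUp() avoids flaky failures from connecting before the server is ready.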
For details on finding the server with runfiles: https://github.com/bazelbuild/bazel/blob/a7a0d48fbeb059ee60e77580e5d05baeefdd5699/tools/python/runfiles/runfiles.py#L16-L58
If you find yourself copy-pasting this pattern a lot, or having to implement it in multiple languages, you could try using an sh_test() rule to wrap the underlying py_test or java_test: start the server, then start the test with an environment variable telling it how to reach the server (e.g. MY_GRPC_SERVER_ADDRESS=localhost:${test_port}).
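Inside the wrapped test, picking up that address is then just an environment lookup, along these lines (sketch only; the variable name is the one from the example above):
import os
import unittest

class ClientTestCase(unittest.TestCase):
    def setUp(self):
        # Exported by the sh_test wrapper described above.
        self.server_address = os.environ["MY_GRPC_SERVER_ADDRESS"]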

TICK stack - adding multiple influxdb sources to chronograf / kapacitor

This is my first time posting to stackoverflow, so I apologize in advance if I am not following certain protocols. I will fix and / or expand my question as needed.
I am trying to add 2 different InfluxDB sources that are hosted on 2 different servers to Chronograf / Kapacitor, but I cannot get it working.
Can you connect to 2 different InfluxDB instances through the UI?
How do you configure kapacitor.conf to read from 2 different InfluxDB instances?
Through the Chronograf UI I can get either source working correctly, but not both at the same time. This seems to be expected behaviour in the UI, so I must be missing something.
If I set the sources in kapacitor.conf, Chronograf does not read from them. There are also no errors in the Kapacitor logs.
These are my kapacitor.conf InfluxDB settings, which do not work:
[[influxdb]]
  enabled = true
  default = true
  name = "localcluster"
  urls = ["http://localhost:8086"]
  username = ""
  password = ""
  timeout = 0

[[influxdb]]
  enabled = true
  default = false
  name = "remoteCluster"
  urls = ["http://remotehost:8086"]
  username = ""
  password = ""
  timeout = 0
I have read the documentation and also have the latest TICK stack packages.
I have also searched online and found some references that look like my configuration and are said to work, but they do not seem to work for me.
TICK stack host information:
CentOS Linux release 7.6.1810 (Core)
telegraf-1.9.1-1.x86_64
influxdb-1.7.2-1.x86_64
chronograf-1.7.4-1.x86_64
kapacitor-1.5.1-1.x86_64
Any help would be greatly appreciated.
I got it working but I am not sure if the configuration is recommended:
Add a new InfluxDB connection through the Chronograf web UI.
Do not create another Kapacitor connection, as only one can be active at a time.
In the graph Queries tab, select the new InfluxDB connection from the drop-down list.
Metrics from the alternate InfluxDB instance will appear and can be queried.

OpenTSDB Plotting internal stats

I'm fairly new to OpenTSDB but I managed to set it up inside a docker container and connect Grafana to it.
Now I'm looking for a way to keep track of its health. In particular, I would like to plot some of the metrics that come from the internal stats (e.g. tsd.rpc.received).
When I try to use them as a regular metric in the Graph panel of OpenTSDB I get a "java.lang.RuntimeException: Unexpected exception".
I know I could connect the HTTP API (/api/stats) to another tool to then send the metrics to CloudWatch or a similar app, but I was hoping for something that didn't involve adding more pieces to the solution.
In the documentation I found: "The Telnet style API also supports the "stats" command for fetching over CLI. These can easily be published right back into OpenTSDB at any interval you like."
Is this the recommended way to keep track of those internal metrics? Read from the stats api and then feed them back to OpenTSDB?
After looking at different alternatives I found that the best way to get the internal stats is either using the tcollector utility or injecting the output of the stats command back into OpenTSDB and using Grafana to visualize the data.
Since in my particular case I don't want to install another component like tcollector, I'll feed the stats back to OpenTSDB as metrics on every node.
This is a small script I wrote to feed the stats back via the telnet API.
#!/bin/bash
# Every 5 seconds, ask the local OpenTSDB instance (telnet interface on
# port 4242; "0" is shorthand for localhost) for its internal stats and
# write each stats line back as a data point with the "put" command.
while true; do
    sleep 5
    STATSINPUT=$(echo "stats" | nc 0 4242 -w1)
    while IFS= read -r line
    do
        echo "Feed: $line"
        INPUT="put $line"
        echo "$INPUT" | nc 0 4242 -w0
    done < <(printf '%s\n' "$STATSINPUT")
done
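If you would rather go through the HTTP API mentioned in the question instead of the telnet interface, a rough equivalent could look like the sketch below. It assumes the standard /api/stats and /api/put endpoints on port 4242, and that the JSON datapoints returned by /api/stats are accepted verbatim by /api/put, which you may want to verify against your OpenTSDB version:
#!/usr/bin/env python3
# Sketch: read OpenTSDB's internal stats over HTTP and re-post them as
# datapoints. Assumes /api/stats output is accepted verbatim by /api/put.
import json
import time
import urllib.request

OPENTSDB = "http://localhost:4242"

while True:
    with urllib.request.urlopen(OPENTSDB + "/api/stats") as resp:
        stats = json.load(resp)
    req = urllib.request.Request(
        OPENTSDB + "/api/put",
        data=json.dumps(stats).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req).close()
    time.sleep(5)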

MySQL connection pool in python?

I'm trying to process a large amount of data using Python and maintaining processing status in MySQL. However, I'm surprised there is no standard connection pool for python-mysql (like HikariCP in Java).
I initially started with PyMySQL, and things were great until the program had run for the first few hours. After a few hours, things started to fail. I was getting a lot of errors like:
pymysql.err.OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' ([Errno 99] Cannot assign requested address)")
Moreover, a lot of ports were stuck in the TIME_WAIT state because I was opening and closing connections too frequently due to the lack of connection pooling:
/d/p/950 ❯❯❯ netstat -nt | wc -l
84752
Per this and this, I tried to set tcp_fin_timeout and ip_local_port_range, but hardly anything improved.
echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout
echo 15000 65000 > /proc/sys/net/ipv4/ip_local_port_range
Then I found out that MySQL provides mysql.connector, which comes with pooling functionality. After doing all that, performance actually deteriorated and more processes started to fail. I'm using Python's multiprocessing module to simultaneously run 29 processes (multiprocessing.Pool picked this number by default) on a 24-core machine. Following was the code; of course I was using .my.cnf to pass all the credentials to avoid committing them to git:
import mysql.connector
from mysql.connector import pooling
conn_pool = pooling.MySQLConnectionPool(pool_name="mypool1",
                                        pool_size=pooling.CNX_POOL_MAXSIZE,
                                        option_files=MYSQL_CONFIG,
                                        option_groups=MYSQL_GROUP_NODE1,
                                        allow_local_infile=True)
conn = conn_pool.get_connection()
Finally, I reverted back to the old code. I'm still using PyMySQL, and though the errors are less frequent, they are still causing a major problem. I looked at SQLAlchemy and couldn't really find much documentation around pooling.
I'm wondering how everyone else is dealing with the mysql-python connection pooling issue? I really believe there should be something out there so that I don't have to reinvent the wheel.
Any pointers are much appreciated.
DBUtils implements, for MySQL (and it generally claims to support arbitrary DB-API 2 compliant database interfaces), the user-sized connection pool PooledDB, the thread-mapped pool PersistentDB, and SteadyDB (see the functionality section). The latter should fit your case, where multiprocessing.Pool creates worker processes, each with its own managed persistent database connection. It is described as:
DBUtils.SteadyDB is a module implementing "hardened" connections to a database, based on ordinary connections made by any DB-API 2 database module. A "hardened" connection will transparently reopen upon access when it has been closed or the database connection has been lost or when it is used more often than an optional usage limit.
You can use it with PyMySQL like this:
import pymysql
from DBUtils.SteadyDB import connect

db = connect(
    creator=pymysql,  # the rest of the keyword arguments belong to pymysql
    user='guest', password='', database='name',
    autocommit=True, charset='utf8mb4',
    cursorclass=pymysql.cursors.DictCursor)
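A query through the hardened connection then looks like ordinary DB-API usage (a minimal sketch; the table and column names here are made up for illustration):
cur = db.cursor()
cur.execute("SELECT id, status FROM jobs WHERE status = %s", ("pending",))
rows = cur.fetchall()
cur.close()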
Also see this related question for more examples.

How to monitor elasticsearch using nagios

I would like to monitor elasticsearch using nagios.
Basically, I want to know if elasticsearch is up.
I think I can use the Elasticsearch Cluster Health API (see here) and use the 'status' that I get back (green, yellow or red), but I still don't know how to do that with Nagios (Nagios is on one server and Elasticsearch is on another server).
Is there another way to do that?
EDIT:
I just found check_http_json. I think I'll try it.
After a while, I've managed to monitor Elasticsearch using NRPE.
I wanted to use the Elasticsearch Cluster Health API, but I couldn't use it from another machine due to security issues...
So, on the monitoring server I created a new service whose check_command is check_nrpe!check_elastic. And on the remote server, where Elasticsearch is, I've edited the nrpe.cfg file with the following:
command[check_elastic]=/usr/local/nagios/libexec/check_http -H localhost -u /_cluster/health -p 9200 -w 2 -c 3 -s green
This is allowed, since the command is run from the remote server itself, so there are no security issues here...
It works!!!
I'll still try the check_http_json command that I posted in my question, but for now, my solution is good enough.
After playing around with the suggestions in this post, I wrote a simple check_elasticsearch script. It returns the status as OK, WARNING, and CRITICAL corresponding to the "status" parameter in the cluster health response ("green", "yellow", and "red" respectively).
It also grabs all the other parameters from the health page and dumps them out in the standard Nagios format.
Enjoy!
Shameless plug: https://github.com/jersten/check-es
You can use it with ZenOSS/Nagios to monitor cluster health, data indices, and individual node heap usage.
You can use this cool Python script for monitoring your Elasticsearch cluster. This script checks your IP:port for Elasticsearch status. This one and more Python scripts for monitoring Elasticsearch can be found here.
#!/usr/bin/python
from nagioscheck import NagiosCheck, UsageError
from nagioscheck import PerformanceMetric, Status
import urllib2
import optparse

try:
    import json
except ImportError:
    import simplejson as json


class ESClusterHealthCheck(NagiosCheck):

    def __init__(self):
        NagiosCheck.__init__(self)

        self.add_option('H', 'host', 'host', 'The cluster to check')
        self.add_option('P', 'port', 'port', 'The ES port - defaults to 9200')

    def check(self, opts, args):
        host = opts.host
        port = int(opts.port or '9200')

        try:
            response = urllib2.urlopen(r'http://%s:%d/_cluster/health'
                                       % (host, port))
        except urllib2.HTTPError, e:
            raise Status('unknown', ("API failure", None,
                                     "API failure:\n\n%s" % str(e)))
        except urllib2.URLError, e:
            raise Status('critical', (e.reason))

        response_body = response.read()

        try:
            es_cluster_health = json.loads(response_body)
        except ValueError:
            raise Status('unknown', ("API returned nonsense",))

        cluster_status = es_cluster_health['status'].lower()

        if cluster_status == 'red':
            raise Status("CRITICAL", "Cluster status is currently reporting as "
                                     "Red")
        elif cluster_status == 'yellow':
            raise Status("WARNING", "Cluster status is currently reporting as "
                                     "Yellow")
        else:
            raise Status("OK",
                         "Cluster status is currently reporting as Green")


if __name__ == "__main__":
    ESClusterHealthCheck().run()
I wrote this a million years ago, and it might still be useful: https://github.com/radu-gheorghe/check-es
But it really depends on what you want to monitor. The above measures:
if Elasticsearch responds to HTTP
if the ingestion rate drops below the defined levels
if the total number of documents drops below the defined levels
But of course there's much more that might be interesting. From query time to JVM heap usage. We wrote a blog post about the most important ones here: https://sematext.com/blog/top-10-elasticsearch-metrics-to-watch/
Elasticsearch has APIs for all these, so you may be able to use a generic check_http_json to get the needed metrics. Alternatively, you may want to use something like Sematext Monitoring for Elasticsearch, which gets these metrics out of the box, then forward threshold/anomaly alerts to Nagios. (disclosure: I work for Sematext)
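For example, a rough sketch of pulling per-node JVM heap usage from the nodes stats API (a hypothetical illustration, not part of any of the tools above; host, port, and output format are placeholders):
#!/usr/bin/python
import json
import urllib2

# Fetch per-node JVM stats and print heap usage as a percentage per node.
stats = json.load(urllib2.urlopen('http://localhost:9200/_nodes/stats/jvm'))
for node_id, node in stats['nodes'].items():
    heap_pct = node['jvm']['mem']['heap_used_percent']
    print '%s heap_used_percent=%d' % (node.get('name', node_id), heap_pct)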
