TDengine monitoring functions - monitoring

Does TDengine provide functions for monitoring its own system? For example, is there a way for users to check cluster health status, file system usage, I/O usage, etc.?

You can find this information in TDengine's 'log' database, or use the Grafana plugin to monitor it.

You may want to use Grafana with TDinsight to monitor the TDengine database.

Related

What is the recommended solution for monitoring heterogeneous infrastructure?

I am looking for a monitoring tool for the following use cases:
Collect basic metrics about virtual machines (CPU usage, memory usage, I/O, available space)
Extract metrics from SQL Server (probably by running some queries)
Extract information from an external service about processing, i.e. how many processes are currently running and for how long. I am thinking about writing Python scripts, but I don't know how to combine them with a monitoring tool
Plot charts and manage alerts; it would be nice to be able to send not only emails but also messages to Slack/MS Teams.
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, an SQL exporter, and Alertmanager, which can send notifications to multiple destinations, but I don't know what to do about the external service and the Python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has e.g. information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there exists a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short-lived batch jobs.
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.
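To make the "simple web server that exposes metrics on /metrics" idea concrete, here is a minimal sketch that uses only the Python standard library rather than the Prometheus Python client mentioned above. The function get_processing_stats, the metric prefix external_service_ and port 9200 are all hypothetical placeholders; in a real setup the stats would come from your external service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def get_processing_stats():
    """Hypothetical stand-in for a call to the external service."""
    return {"running_jobs": 3, "oldest_job_age_seconds": 125.0}


def render_metrics(stats):
    """Render a dict of numbers in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(stats.items()):
        metric = f"external_service_{name}"  # prefix is an assumption
        lines.append(f"# TYPE {metric} gauge")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Prometheus scrapes /metrics by default; reject everything else.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(get_processing_stats()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=9200):
    """Block forever, serving metrics on the given (assumed) port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() starts the exporter; Prometheus would then need a scrape job pointing at that host and port. The official client library handles metric types, labels and escaping for you, so this is only meant to show how little an exporter needs to do.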

Monitoring agent

I need to monitor some custom services on AWS nodes and collect metrics as time series. There are two specific use cases: monitoring hardware resources such as CPU, memory, and disk utilization, and monitoring service-specific metrics.
While reading up, I came across collectd as one of the open source options. However, I wanted to know how I can use collectd to monitor service-specific metrics. Does collectd expose APIs that the service can use to log metrics and, if so, how performant are they?
I am new to collectd and would like to know whether there are any other open source options as well.
The collectd agent can monitor custom metrics using read plugins:
Exec plugin for custom metrics fetched with bash scripts
cURL-JSON for metrics published in JSON format via HTTP
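As a rough illustration of the Exec plugin route: the plugin runs an executable and reads PUTVAL lines from its standard output. The script does not have to be bash; below is a sketch in Python. The plugin instance name myservice, the metric name queue_depth and the function read_queue_depth are hypothetical; COLLECTD_HOSTNAME and COLLECTD_INTERVAL are the environment variables the Exec plugin sets for the script.

```python
import os
import time


def format_putval(host, metric, value, interval):
    """Build one line of collectd's plain-text protocol for a gauge value."""
    # Identifier layout: host/plugin-plugin_instance/type-type_instance
    identifier = f"{host}/exec-myservice/gauge-{metric}"
    # "N" tells collectd to use the current time as the timestamp.
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_queue_depth():
    """Hypothetical stand-in for asking your service for a metric."""
    return 7


def main():
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = os.environ.get("COLLECTD_INTERVAL", "10")
    while True:
        line = format_putval(host, "queue_depth", read_queue_depth(), interval)
        # Flush so collectd sees the value immediately, not when the buffer fills.
        print(line, flush=True)
        time.sleep(float(interval))
```

The script would call main() at the bottom and be registered in the Exec plugin's configuration; collectd keeps it running and picks up each line as it is printed.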

Zabbix integration with Prometheus

We are currently monitoring our network devices with Zabbix, but now we want to use Zabbix along with Prometheus to take advantage of Prometheus's real-time monitoring and powerful alerting.
How can I integrate my existing Zabbix solution with Prometheus?
There seems to be a Zabbix to Prometheus exporter that may achieve what you want, but please note that I wouldn't recommend that.
Apart from some temporary migration scenarios, I see little use in polling one monitoring system from the other. You're probably better off deploying the appropriate Prometheus exporters (e.g. SNMP, if you're talking about network devices) and using Prometheus for the whole monitoring setup.
Of course you can still keep your Zabbix setup running side by side, if you need to.

Zabbix & external monitoring systems

I need to make Zabbix work together with another monitoring system.
My company uses Zabbix for monitoring. Our partner plans to use a different system.
We need to exchange monitoring data.
I'm interested in cooperation with the following systems: BMC Patrol, MS SCOM, Netcool, Portal.
What is the best way to integrate them?
Maybe via SNMP?
Replicate the hosts and metrics in your Zabbix (use the Zabbix trapper item type and also set the Allowed hosts value), then use a suitable zabbix-sender implementation to push data into Zabbix.
IMO it's a terrible idea because of latency, syncing, ... Do you really need the data (item values), or do you only need to visualize data from different data sources in one graph?
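If you do go the trapper route, the push side is small. The sketch below builds a packet in the Zabbix sender protocol (a "ZBXD" header, a little-endian length and a JSON body) and sends it to the trapper port. The host name, item key and server address are hypothetical, and the host and trapper item must already exist in Zabbix; treat this as an illustration, not a replacement for a maintained zabbix-sender implementation.

```python
import json
import socket
import struct


def build_sender_packet(host, key, value):
    """Build one 'sender data' request for a single trapper item."""
    payload = json.dumps({
        "request": "sender data",
        "data": [{"host": host, "key": key, "value": str(value)}],
    }).encode("utf-8")
    # Header: protocol marker, flags byte 0x01, then the payload length
    # as an 8-byte little-endian integer.
    return b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload


def send_to_zabbix(server, packet, port=10051):
    """Send the packet to the Zabbix trapper port and return the raw reply."""
    with socket.create_connection((server, port), timeout=5) as sock:
        sock.sendall(packet)
        return sock.recv(4096)
```

A call such as send_to_zabbix("zabbix.example.com", build_sender_packet("partner-host", "partner.cpu.load", 0.42)) would push one value; the reply is a JSON document with the same header that reports how many values were processed.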
Regarding BMC Patrol you can use History Loader/Propagator KM to export the monitoring data:
https://docs.bmc.com/docs/display/public/unixlinux912/PATROL+KM+for+History+Loader
or you can use the 'dump_hist' command to dump the history data from the agents:
https://docs.bmc.com/docs/display/pia9600/dump_hist+uility
Regarding Netcool events, you could get the information using different approaches, for example, depending on the version, you could get the events from the HTTP interface, as described below:
https://www.ibm.com/support/knowledgecenter/en/SSNFET_9.2.0/com.ibm.netcool_OMNIbus.doc_7.4.0/omnibus/wip/api/reference/omn_api_http_httpinterface.html
Or perhaps you could create a flat file gateway to read the events and write them to a file:
https://www.ibm.com/support/knowledgecenter/en/SSSHTQ/omnibus/gateways/flatfilegw/wip/concept/flatfilegw_intro.html

How can we collect performance metrics from the cAdvisor Docker container?

Sorry, I have just started to learn Docker, so my question may seem stupid to some of you.
I would like to know if there is a way to collect performance metrics from the cAdvisor container (not from cgroups) at runtime. I mean extracting performance values, such as memory usage or network traffic, from the curves that cAdvisor draws.
I need to record these values and save them in a database so that I can perform statistical analysis on them (like comparing memory consumption for two Docker containers at t=50s).
Thanks in advance.
As other answers mention, cAdvisor doesn't provide its own performance data API, instead it exposes metrics which are typically handled in a separate database if one wants to derive performance data beyond "real time". For example, cAdvisor exports Prometheus metrics natively:
http://prometheus.io/docs/instrumenting/exporters/
The Prometheus metric types:
http://prometheus.io/docs/concepts/metric_types/
Prometheus supports a fairly rich functional expression language that can be used for querying and visualization:
http://prometheus.io/docs/querying/basics/
cAdvisor does provide a REST endpoint to get any stats in real time. By default, it keeps the latest two minutes of data. You can configure it to keep more or less. It also supports a storage backend that keeps dumping stats to an InfluxDB database.
REST API:
e.g. /api/v1.3/containers
doc: https://github.com/google/cadvisor/blob/master/docs/api.md
Doc on setting up InfluxDB:
https://github.com/google/cadvisor/blob/master/docs/influxdb.md
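As a sketch of how the REST endpoint could be polled and the values kept for later analysis, the following reads container info from /api/v1.3/containers and stores memory usage samples. SQLite is used here only as a convenient stand-in for "a database" (it is not the InfluxDB backend mentioned above), the host and port are assumptions, and the field names stats, timestamp and memory.usage follow the structure described in the cAdvisor API doc; check them against your cAdvisor version.

```python
import json
import sqlite3
import urllib.request

# Assumed address of a locally running cAdvisor instance.
CADVISOR_URL = "http://localhost:8080/api/v1.3/containers/"


def fetch_container_info(url=CADVISOR_URL):
    """Fetch the container info document from cAdvisor's REST API."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def extract_memory_samples(info):
    """Return (timestamp, memory usage in bytes) for each stats entry."""
    return [(s["timestamp"], s["memory"]["usage"]) for s in info.get("stats", [])]


def store_samples(conn, container, samples):
    """Append samples to a SQLite table, creating it on first use."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory_usage "
        "(container TEXT, ts TEXT, usage_bytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO memory_usage VALUES (?, ?, ?)",
        [(container, ts, usage) for ts, usage in samples],
    )
    conn.commit()
```

A typical run would be info = fetch_container_info(), then store_samples(sqlite3.connect("stats.db"), info["name"], extract_memory_samples(info)). Because cAdvisor only keeps a short window by default, the script would have to run on a schedule shorter than that window, and it does not deduplicate samples that appear in two consecutive polls.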
I think you could use https://github.com/tutumcloud/container-metrics for this. Basically what that would be doing is using influxdb http://influxdb.com/ as a time series data store.
There is some more information available here: http://blog.tutum.co/2014/08/25/panamax-docker-application-template-with-cadvisor-elasticsearch-grafana-and-influxdb/
A couple of people seemed to be looking into the ELK stack (Elasticsearch, Logstash, Kibana) for visualising some of this data here: https://github.com/google/cadvisor/issues/634
