Monitoring agent - monitoring

I have a requirement where in I need to monitor some custom services on aws nodes and collect metrics in timeseries. There are specifically two use cases. One being the monitoring of hardware resources like cpu, mem, disk util etc and the other being monitoring service specific metrics.
While reading up I came across collectd as one of the open source option. However I wanted to know how I can use collectd to monitor service specific metrics. Does collectd expose APIs which the service can use to log the metrics and if yes how performant is it.
I am new to collectd & would like to know if there are any other open source options as well.

The collectd agent can monitor custom metrics using read plugins:
Exec plugin for custom metrics fetched with bash scripts
cURL-JSON for metrics published in JSON format via HTTP

Related

How to collect messages (total number and size) between microservices?

I have a microservices based software architecture.
There is a php application which orchestrates the communication among microservices and the application's whole logic.
I need to simulate the communication between microservices as a graph.
There will be edges with weights , which will represent the affinities between microservices.
I am searching for a tool in order to collect all messages and their size.
I have read that there are distibuted tracing systems like Zipkin which i have already deployed, and could accomplish this task.
But, i cannot find how to collect the messages i want.
This is the php library i used for the instrumentation of my app
[https://github.com/openzipkin/zipkin-php]
Any ideas about other tools or how to use Zipkin differently to achieve my goal?
Let me add to this thread my three bits. Speaking of Envoy, yes, when attached to your application it adds a lot of useful features from observability bucket, e.g. network level statistics and tracing.
Here is the question, have you considered running your legacy apps inside service mesh, like Istio ?.
Istio simplifies deployment and configuration of Envoy for you. It injects sidecar container (istio-proxy, in fact Envoy instance) to your Pod application, and gives you these extra features like a set of service metrics out of the box*.
Example: Stats produced by Envoy in Prometheus format, like istio_request_bytes are visualized in Kiali Metrics dashboard for inbound traffic as request_size (check screenshot)
*as mentioned by #David Kruk, you still needs to have Prometheus server deployed in your cluster to be able to pull these metrics to Kiali dashboards.
You can learn more about Istio here. There is also a dedicated section on how to visualize metrics collected by Istio (e.g. request size).

What is recommended solution for monitoring heterogeneous infrastructure?

I am looking for monitoring tool for the following use cases:
Collect basic metrics about virtual machine (cpu usage, memory usage, i/o, available space)
Extract metrics from SQL Server (probably running some queries)
Extract information from external service about processing i.e how many processing are currently running and for how long. I am thinking about writing python scripts, but don't know how to combine with monitoring tool
Have the ability to plot charts and manage alerts and it will nice to have ability to send not only mails, but send message to slack/ms teams.
I was thing about Prometheus, because it has wmi_exporter, node_exporter, sql exporter, alert manager with possibility to send notifications to multiple destinations, but I don't know what to do with this external service and python scripts.
Any suggestions?
Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
E.g. you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has e.g. information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there exists a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short lived batch jobs.
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.

How to send server metrics data to statsd?

Our monitoring stack is Grafana + InluxDB + statsD.
We use it for application monitoring.
We need to add server metrics (CPU, memory, network connections, etc...) to Grafana, so I'm guessing we'll need some agent to collect server metrics and pass to statsD.
Do you know of any agent that can do that? or any other way to implement this?
Probably the simplest option for you would be to switch over to using collectd https://collectd.org/, and replace statsd with the statsd plugin for collectd https://collectd.org/wiki/index.php/Plugin:StatsD
Check https://my-netdata.io.
It can monitor a ton of things, it is a statsd server by itself, it can visualize all metrics by itself and can push all metrics to graphite, opentsdb, prometheus, influxdb, etc.
Free and open source: GPL v3+.
EDIT: it also allows you to send statsd metrics from shell scripts: https://github.com/firehol/netdata/wiki/statsd#sending-statsd-metrics-from-shell-scripts

Zabbix & external monitoring systems

I need to make freinds zabbix & other monitoring system.
My company uses Zabbix for monitoring. Our partner plans to use other system.
We need to exchange monitoring datas.
I'm interested in coopereation with the next systems: BMC Patrol, MS SCOM, NetCool, Portal.
What is the best way to integrate it?
Maybe via SNMP?
Replicate hosts and metrics into your Zabbix (use Zabbix trapper item type and setup also Allowed hosts value) and then just use some suitable zabbix-sender implementation and push data into Zabbix.
IMO it's terrible idea, because latency, syncing, ... Do you really need data (item values) or do you need only visualize data from different datasources in one graph?
Regarding BMC Patrol you can use History Loader/Propagator KM to export the monitoring data:
https://docs.bmc.com/docs/display/public/unixlinux912/PATROL+KM+for+History+Loader
or you can use the 'dump_hist' command to dump the history data from the agents:
https://docs.bmc.com/docs/display/pia9600/dump_hist+uility
Regarding Netcool events, you could get the information using different approaches, for example, depending on the version, you could get the events from the HTTP interface, as described below:
https://www.ibm.com/support/knowledgecenter/en/SSNFET_9.2.0/com.ibm.netcool_OMNIbus.doc_7.4.0/omnibus/wip/api/reference/omn_api_http_httpinterface.html
Or perhaps you could create a flat file gateway to read the events and write them on a file:
https://www.ibm.com/support/knowledgecenter/en/SSSHTQ/omnibus/gateways/flatfilegw/wip/concept/flatfilegw_intro.html

How to use ganglia ui with flume?

I am interested in monitoring my multi-agent apache flume setup. I have enabled the inbuilt ganglia server which provides me the flume metrics through JSON data. Now I am interested in viewing these info in graphs/charts. TO achieve this I am using ganglia web ui, I have these questions - Do I have to install gmond and gmetad to achieve it, if not then how I will use the existing ganglia info with the ganglia web ui ?
Thanks in advance.
You'll need both, IMHO. Moreover, I think Flume can communicate directly to a gmond by appending some stuff in JAVA_OPTS, see hortonworks docs.
You'll need gmetad because it stores your data in RRD files, and the web UI query on it to display graphs.
Graphite can do the job too.

Resources