OpenTelemetry JVM and System metrics

I have used Micrometer (micrometer.io) for most of my career to collect metrics. One of its coolest features is the set of binders that collect information about the host system and the JVM: https://micrometer.io/docs/ref/jvm. With those in place, it was possible to run this Grafana dashboard without much effort: https://grafana.com/grafana/dashboards/4701
I am now starting to learn OpenTelemetry, but I cannot find a description of equivalent functionality. I do not want to use instrumentation; I want to rely on a manual definition of what is to be measured. Can you show me a way to do this? How can I easily provide system/JVM metrics manually?

I don't think such a component exists in OTel, see:
Metrics API spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/api.md
Metrics SDK spec: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/metrics/sdk.md

See the question "Is there an equivalent of Prometheus simpleclient_hotspot with OpenTelemetry?".
You can use opentelemetry-java-instrumentation and manually register JVM metrics observers.

Related

How to collect messages (total number and size) between microservices?

I have a microservices-based software architecture.
There is a PHP application which orchestrates the communication among the microservices and holds the application's whole logic.
I need to model the communication between microservices as a graph.
The edges will carry weights that represent the affinities between microservices.
I am searching for a tool to collect all messages and their sizes.
I have read that distributed tracing systems like Zipkin, which I have already deployed, could accomplish this task.
But I cannot find out how to collect the messages I want.
This is the PHP library I used to instrument my app:
https://github.com/openzipkin/zipkin-php
Any ideas about other tools, or about how to use Zipkin differently to achieve my goal?

Let me add my two cents to this thread. Speaking of Envoy: yes, when attached to your application it adds a lot of useful observability features, e.g. network-level statistics and tracing.
Here is the question: have you considered running your legacy apps inside a service mesh, like Istio?
Istio simplifies the deployment and configuration of Envoy for you. It injects a sidecar container (istio-proxy, in fact an Envoy instance) into your application Pod and gives you extra features such as a set of service metrics out of the box*.
Example: stats produced by Envoy in Prometheus format, like istio_request_bytes, are visualized in the Kiali Metrics dashboard for inbound traffic as request_size.
*As mentioned by @David Kruk, you still need to have a Prometheus server deployed in your cluster to be able to pull these metrics into the Kiali dashboards.
The Istio documentation covers this in more detail, including a dedicated section on how to visualize the metrics collected by Istio (e.g. request size).
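Whichever tool supplies the numbers, building the weighted graph from the question is a small aggregation step. Here is a minimal sketch in Python, assuming each observed call has already been exported as a (source, destination, size in bytes) record. The record shape, the function name and the service names are hypothetical; this is not the output format of Zipkin or Istio.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record shape: (source service, destination service, payload size in bytes).
Call = Tuple[str, str, int]
Edge = Tuple[str, str]


def build_weighted_edges(calls: Iterable[Call]) -> Dict[Edge, Dict[str, int]]:
    """Aggregate individual calls into directed edges.

    Each edge carries two candidate weights: the number of messages
    and the total number of bytes sent from source to destination.
    """
    edges: Dict[Edge, Dict[str, int]] = defaultdict(lambda: {"messages": 0, "bytes": 0})
    for source, destination, size in calls:
        edge = edges[(source, destination)]
        edge["messages"] += 1
        edge["bytes"] += size
    return dict(edges)


# Made-up sample data, for illustration only.
sample_calls = [
    ("orchestrator", "billing", 512),
    ("orchestrator", "billing", 256),
    ("orchestrator", "inventory", 1024),
]
edges = build_weighted_edges(sample_calls)
```

Either weight (message count or total bytes) can then serve as the affinity between two services, depending on what the simulation needs.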

What is recommended solution for monitoring heterogeneous infrastructure?

I am looking for a monitoring tool for the following use cases:
Collect basic metrics about a virtual machine (CPU usage, memory usage, I/O, available space)
Extract metrics from SQL Server (probably by running some queries)
Extract information about processing from an external service, i.e. how many jobs are currently running and for how long. I am thinking about writing Python scripts, but I don't know how to combine them with a monitoring tool
Plot charts and manage alerts; it would be nice to be able to send not only mails but also messages to Slack/MS Teams.
I was thinking about Prometheus, because it has wmi_exporter, node_exporter, a SQL exporter, and Alertmanager with the possibility to send notifications to multiple destinations, but I don't know what to do about the external service and the Python scripts.
Any suggestions?

Prometheus can definitely do what you say you need done. Some of it may not be trivial, but you can definitely fill in the blanks yourself.
For example, you can get machine metrics basically out of the box by firing up a node_exporter and having it scraped by Prometheus, but I don't think it has information on all running processes. The latter might require you to write an agent/exporter: a simple web server that exposes metrics on /metrics; there is a Python client library to help with that. Or have said processes (assuming they're your code) push metrics to a Pushgateway instead, if they're short-lived batch jobs.
Oh, and for charts/dashboards you probably want Grafana, as Prometheus' abilities in that area are rather limited and Grafana integrates rather well with Prometheus.
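To give an idea of what such an agent/exporter looks like, here is a minimal sketch that uses only the Python standard library rather than the client library mentioned above. The metric names, the placeholder `collect` function and port 8000 are assumptions made for illustration; in practice `collect` would query the external service.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Callable, Dict


def render_metrics(gauges: Dict[str, float]) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in sorted(gauges.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def collect() -> Dict[str, float]:
    """Placeholder collector: replace with a real call to the external service."""
    return {"external_jobs_running": 3, "external_oldest_job_age_seconds": 42.5}


def make_handler(collector: Callable[[], Dict[str, float]]):
    """Build a request handler that serves the collector's output on /metrics."""

    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render_metrics(collector()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; version=0.0.4")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler


def serve(port: int = 8000) -> None:
    """Block forever, answering scrapes on the given port (assumed: 8000)."""
    HTTPServer(("", port), make_handler(collect)).serve_forever()
```

Calling `serve()` starts the exporter; Prometheus then needs a scrape target pointing at that host and port. For anything beyond a handful of gauges, the official Python client library is probably the better choice, since it handles labels, counters and histograms.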

Monitoring agent

I have a requirement to monitor some custom services on AWS nodes and collect their metrics as time series. There are two specific use cases: monitoring hardware resources like CPU, memory and disk utilization, and monitoring service-specific metrics.
While reading up I came across collectd as one of the open source options. However, I wanted to know how I can use collectd to monitor service-specific metrics. Does collectd expose APIs which the service can use to log its metrics, and if so, how performant are they?
I am new to collectd and would like to know if there are any other open source options as well.

The collectd agent can monitor custom metrics using read plugins:
Exec plugin for custom metrics fetched with bash scripts
cURL-JSON for metrics published in JSON format via HTTP
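The Exec plugin is not limited to bash: it runs any executable and reads PUTVAL lines from its standard output. Below is a minimal sketch in Python, assuming a gauge-type metric. The `myservice` and `queue_depth` names and the `read_queue_depth` function are hypothetical placeholders for your own service.

```python
import os
import time


def format_putval(host, plugin_instance, type_instance, interval, value):
    """Build one PUTVAL line for a gauge value.

    The leading "N" in the value list tells collectd to use the current time.
    """
    identifier = f"{host}/exec-{plugin_instance}/gauge-{type_instance}"
    return f'PUTVAL "{identifier}" interval={interval} N:{value}'


def read_queue_depth():
    """Hypothetical placeholder: replace with a real query against your service."""
    return 42


def main():
    # collectd sets these environment variables for scripts it starts via Exec.
    host = os.environ.get("COLLECTD_HOSTNAME", "localhost")
    interval = int(float(os.environ.get("COLLECTD_INTERVAL", "10")))
    while True:
        line = format_putval(host, "myservice", "queue_depth", interval, read_queue_depth())
        print(line, flush=True)  # flush so collectd sees the value immediately
        time.sleep(interval)
```

The script would call `main()` at the bottom and be registered with an `Exec` line in the plugin's configuration block.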

bosun and telegraf metrics meta information

Hello, I really want to use bosun/tsdbrelay/opentsdb with the telegraf collector, as it gets all the metrics we want to monitor out of the box.
I already have a small setup that pushes metrics from 5 servers to bosun for indexing and to opentsdb for storage.
I used the haproxy configs from Kyle Brandt's bosun infrastructure blog to make the tsdbs HA-ready.
But bosun reports that it cannot use the auto type for metrics, and the primary stats view does not show any graphs for CPU, memory, etc.
What do I need to provide so that the graphs show up?
Kind regards.

Both of these features are mostly scollector-specific. The "host" view (I've considered ripping that out; it was done in the early days, and it is better to use something like Grafana) depends on scollector-specific metrics such as os.cpu.
As for "Auto" for rate vs gauge, that is also metadata that scollector generates and sends to bosun. If you want to try to mimic the behavior, see https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L30 and https://github.com/bosun-monitor/bosun/blob/master/metadata/metadata.go#L195 - you would need to create at least the "rate" key for each metric you are getting from telegraf.

How can we collect performance metrics from CAdvisor docker container?

Sorry, I have just started to learn Docker, so my question may seem stupid to some of you.
I would like to know if there is a way to collect performance metrics from the cAdvisor container (not from cgroups) at runtime. I mean extracting the performance values behind the curves drawn by cAdvisor, like memory usage or network traffic.
I need to record these values and save them in a database so that I can perform a statistical analysis on them (like comparing the memory consumption of two Docker containers at t=50s).
Thanks in advance.

As other answers mention, cAdvisor doesn't provide its own performance data API, instead it exposes metrics which are typically handled in a separate database if one wants to derive performance data beyond "real time". For example, cAdvisor exports Prometheus metrics natively:
http://prometheus.io/docs/instrumenting/exporters/
The Prometheus metric types:
http://prometheus.io/docs/concepts/metric_types/
Prometheus supports a fairly rich functional expression language that can be used for querying and visualization:
http://prometheus.io/docs/querying/basics/

cAdvisor does provide a REST endpoint to get stats in real time. By default, it keeps the latest two minutes of data; you can configure it to keep more or less. It also supports a storage backend that continuously dumps stats to an InfluxDB database.
REST API:
e.g. /api/v1.3/containers
doc: https://github.com/google/cadvisor/blob/master/docs/api.md
Doc on setting up InfluxDB:
https://github.com/google/cadvisor/blob/master/docs/influxdb.md
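To record those values yourself, you can poll the REST endpoint and write the samples to a database. Here is a minimal sketch in Python using only the standard library. It assumes the response is a JSON object with a `stats` list whose entries carry `timestamp` and `memory.usage` fields, so check that against the API doc above for your cAdvisor version. The table layout and function names are made up for illustration.

```python
import json
import sqlite3
import urllib.request


def fetch_container_info(base_url, container="/"):
    """GET <base_url>/api/v1.3/containers<container> and decode the JSON body."""
    url = f"{base_url}/api/v1.3/containers{container}"
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def extract_memory_samples(container_info):
    """Return (timestamp, memory usage in bytes) pairs from the assumed response shape."""
    return [
        (sample["timestamp"], sample["memory"]["usage"])
        for sample in container_info.get("stats", [])
    ]


def save_samples(conn, container_name, samples):
    """Append samples to a SQLite table (hypothetical layout) for later analysis."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS memory_usage (container TEXT, ts TEXT, bytes INTEGER)"
    )
    conn.executemany(
        "INSERT INTO memory_usage VALUES (?, ?, ?)",
        [(container_name, ts, usage) for ts, usage in samples],
    )
    conn.commit()
```

A polling loop would call `fetch_container_info("http://localhost:8080")` (host and port depend on how cAdvisor was started), pass the result through `extract_memory_samples`, and store it with `save_samples`. Comparing two containers at a given time is then a plain SQL query.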

I think you could use https://github.com/tutumcloud/container-metrics for this. Basically what that would be doing is using influxdb http://influxdb.com/ as a time series data store.
There is some more information available here: http://blog.tutum.co/2014/08/25/panamax-docker-application-template-with-cadvisor-elasticsearch-grafana-and-influxdb/
A couple of people seem to have been looking into the ELK stack (Elasticsearch, Logstash, Kibana) for visualising some of this data here: https://github.com/google/cadvisor/issues/634
