Stacking/Overlapping metrics by name - jmx

I have an application shipping metrics to Prometheus over micrometer-jmx, and I cannot change the application to use micrometer-prometheus instead. All parameterized metrics are therefore not Prometheus labels but are instead encoded directly into the name of the metric.
I.e., instead of requests_Count{processor="BILLING_PROCESSOR", type="SCRIPT"}, the metrics are of the form requests_PRC_BILLING_PROCESSOR_TYP_SCRIPT_Count.
Now let's say I want a graph in Grafana of request counts grouped (stacked/overlapped) by type. Is there any way I can accomplish that without labels and with metrics in that format? I've managed to construct Grafana variables which extract the processor and type values from the metric name, but I can't seem to do much with those values.

You could configure Prometheus to convert the metric names. This is part of the relabeling support in Prometheus, described in the Prometheus configuration documentation and in a blog post by one of the core contributors.
As shown in that blog post, a metric can be converted from
memory_pools_PS_Eden_Space_committed
to
memory_pools_committed_bytes{pool="PS_Eden_Space"}
by applying a configuration as follows:
scrape_configs:
  - job_name: my_job
    # Usual fields go here to specify targets.
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(memory_pools)_(.*)_(\w+)'
        replacement: '${2}'
        target_label: pool
      - source_labels: [__name__]
        regex: '(memory_pools)_(.*)_(\w+)'
        replacement: '${1}_${3}_bytes'
        target_label: __name__
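Applied to the metric names in the question, a rough sketch (assuming every name follows the pattern requests_PRC_<processor>_TYP_<type>_Count; add it to the same scrape config) could look like this:

metric_relabel_configs:
  # assumption: every metric of interest matches requests_PRC_<processor>_TYP_<type>_Count
  - source_labels: [__name__]
    regex: 'requests_PRC_(.*)_TYP_(.*)_Count'
    replacement: '${1}'
    target_label: processor
  - source_labels: [__name__]
    regex: 'requests_PRC_(.*)_TYP_(.*)_Count'
    replacement: '${2}'
    target_label: type
  - source_labels: [__name__]
    regex: 'requests_PRC_(.*)_TYP_(.*)_Count'
    replacement: 'requests_Count'
    target_label: __name__

With processor and type turned into labels this way, a Grafana panel can simply query something like sum by (type) (requests_Count) to get the stacked or overlapped series.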

Related

How can I get envoyproxy/ratelimit statistics for descriptors without value?

I am using envoyproxy/ratelimit (along with Istio) to set up global rate limiting in my k8s cluster for a given service. The rate limit is based on a header (in my case the username), so that each username is limited by the number of RPS. The following configuration was used to achieve this:
domain: ratelimit
descriptors:
  - key: USERNAME
    rate_limit:
      unit: second
      requests_per_unit: 100
    shadow_mode: true
Also, I used an EnvoyFilter (an Istio CRD) to define which header will be used.
The resulting metric does not show a label for a specific user, just for the entire descriptor:
ratelimit_service_rate_limit_within_limit{app="ratelimit",domain="ratelimit",instance="xxx",job="kubernetes-pods",key1="USERNAME",kubernetes_namespace="xxx",kubernetes_pod_name="ratelimit-xxx",pod_template_hash="xxx",security_istio_io_tlsMode="istio",service_istio_io_canonical_name="ratelimit",service_istio_io_canonical_revision="latest"}
So my question is: how can I get the metrics for a specific username, considering my configuration applies to all of them rather than to a specific value?
Thanks to this PR you can now add a detailed_metric parameter to enable this behavior, as shown in this example.
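As a hedged sketch of what that might look like on the configuration from the question (the placement of detailed_metric is an assumption based on the linked example; verify against your ratelimit version):

domain: ratelimit
descriptors:
  - key: USERNAME
    rate_limit:
      unit: second
      requests_per_unit: 100
    shadow_mode: true
    # assumption: emits per-value metrics, i.e. one series per actual username
    detailed_metric: true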

How to use FluentD as a buffer between Telegraf and InfluxDB

Is there any way to ship metrics gathered from Telegraf to FluentD, and then into InfluxDB?
I know it's possible to write data from FluentD into InfluxDB, but how does one ship data from Telegraf into FluentD, basically using FluentD as a buffer (as opposed to using Kafka or Redis)?
While it might be possible to do with FluentD using some of the available (although outdated) output plugins, such as InfluxDB-Metrics, I couldn't get that plugin to work properly, and it hasn't been updated in over six years, so it will probably not work with newer releases of FluentD.
Fluent Bit, however, has an InfluxDB output built right into it, so I was able to get it to work with that. The caveat is that it has no Telegraf plugin. The solution I found was to set up a tcp input plugin in Fluent Bit and configure Telegraf to write JSON-formatted data to it in its output section.
The catch is that the JSON data is nested and not formatted properly for InfluxDB. The workaround is to use nest filters in Fluent Bit to 'lift' the nested data and re-format it properly for InfluxDB.
Below is an example for disk space, which is not a metric natively supported by Fluent Bit metrics but is natively supported by Telegraf:
@SET me=${HOST_HOSTNAME}

[INPUT]   ## tcp recipe ## Collect data from telegraf
    Name          tcp
    Listen        0.0.0.0
    Port          5170
    Tag           telegraf.${me}
    Chunk_Size    32
    Buffer_Size   64
    Format        json

[FILTER]  ## rename the three keys sent from Telegraf to prevent duplicates
    Name          modify
    Match         telegraf.*
    Condition     Key_Value_Equals name disk
    Rename        fields fieldsDisk
    Rename        name nameDisk
    Rename        tags tagsDisk

[FILTER]  ## un-nest the JSON info under the renamed 'fieldsDisk' key
    Name          nest
    Match         telegraf.*
    Operation     lift
    Nested_under  fieldsDisk
    Add_prefix    disk.

[FILTER]  ## un-nest the JSON info under the renamed 'tagsDisk' key
    Name          nest
    Match         telegraf.*
    Operation     lift
    Nested_under  tagsDisk
    Add_prefix    disk.

[OUTPUT]  ## output properly formatted data to InfluxDB
    Name          influxdb
    Match         telegraf.*
    Host          influxdb.server.com
    Port          8086
    #HTTP_User    whatever
    #HTTP_Passwd  whatever
    Database      telegraf.${me}
    Sequence_Tag  point_in_time
    Auto_Tags     On
NOTE: This is just a simple awkward config for my own proof of concept
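For completeness, the Telegraf side only needs a generic TCP/JSON output pointed at that input. A minimal sketch (the hostname is a placeholder; the port must match the [INPUT] block above):

[[outputs.socket_writer]]
  ## assumption: Fluent Bit is reachable at this address; port matches the tcp INPUT above
  address = "tcp://fluentbit.server.com:5170"
  data_format = "json"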

Can Telegraf combine/add value of metrics that are per-node, say for a cluster?

Let's say I have some software running on a VM that is emitting two metrics that are fed through Telegraf to be written into InfluxDB. Let's say the metrics are the number of successfully handled HTTP requests (S) and the number of failed HTTP requests (F) on that VM. However, I might configure three such VMs, each emitting those two metrics.
Now, I would like a computed metric which is the sum of S from each VM, and the sum of F from each VM, stored as new metrics at various instants of time. Is this something that can be achieved using Telegraf? Or is there a better, more efficient, more elegant way?
Kindly note that my knowledge of Telegraf and InfluxDB is theoretical, as I've only recently started reading up about them, so I have not actually tried any of the above yet.
This isn't something Telegraf would be responsible for.
With InfluxDB 1.x, you'd use a TICKscript or Continuous Queries to calculate the sum and inject the new sampled value.
Roughly, this would look like:
CREATE CONTINUOUS QUERY "sum_sample_daily" ON "database"
BEGIN
  SELECT sum(*) INTO "daily_measurement" FROM "measurement" GROUP BY time(1d)
END
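Tailored to the question, a rough sketch (the measurement and field names "http_requests", "S" and "F" are assumptions, not something Telegraf creates for you) could be:

CREATE CONTINUOUS QUERY "sum_http_requests" ON "database"
BEGIN
  SELECT sum("S") AS "S_total", sum("F") AS "F_total"
  INTO "http_requests_total"
  FROM "http_requests"
  GROUP BY time(1m)
END

Because no host tag appears in the GROUP BY, the sums are taken across all VMs reporting into the measurement.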
CQ docs

Google Dataflow custom metrics not showing on Stackdriver

I'm trying to get a deeper view of my Dataflow jobs by measuring parts of them using Metrics.counter & Metrics.gauge, but I cannot find those metrics on Stackdriver.
I have a premium Stackdriver account and I can see those counters under the Custom Counters section on the Dataflow UI.
I can see the droppedDueToLateness 'custom' counter on Stackdriver, though, and it seems to be created via Metrics.counter as well...
Aside from that, something that might be helpful: when I navigate to https://app.google.stackdriver.com/services/dataflow, the message I get is:
"You do not have any resources of this type being monitored by Stackdriver." That's weird as well, as if our Cloud Dataflow weren't properly connected to Stackdriver. But, on the other hand, some metrics are displayed and can be monitored, such as System Lag, Watermark age, Elapsed time, Element count, etc...
What am I missing?
Regards
Custom metric naming conventions
When defining custom metrics in Dataflow, you have to adhere to the custom metric naming conventions, or they won't show up in Stackdriver.
Relevant snippet:
You must adhere to the following spelling rules for metric label names:
- You can use upper and lower-case letters, digits, and underscores (_) in the names.
- You can start names with a letter or digit.
- The maximum length of a metric label name is 100 characters.
If you create a metric with
Metrics.counter('namespace', 'name')
the metric shows up in Stackdriver as custom.googleapis.com/dataflow/name, so 'name' should adhere to the rules mentioned above. The namespace does not seem to be used by Stackdriver.
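A minimal sketch, assuming the Beam Java SDK (the namespace, metric name, and DoFn here are illustrative), of a counter whose name follows those rules:

import org.apache.beam.sdk.metrics.Counter;
import org.apache.beam.sdk.metrics.Metrics;
import org.apache.beam.sdk.transforms.DoFn;

public class CountingFn extends DoFn<String, String> {
  // Only letters, digits and underscores, starting with a letter,
  // so it should surface as custom.googleapis.com/dataflow/valid_metric_name.
  private final Counter counter = Metrics.counter("my_namespace", "valid_metric_name");

  @ProcessElement
  public void processElement(ProcessContext c) {
    counter.inc();          // increments the custom counter
    c.output(c.element());
  }
}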
Additional: labels
It doesn't seem possible to add labels to the metrics when defined this way. However, the full description of each time series of a metric is a string with the format
'name' job_name job_id transform
So you can aggregate by these 4 properties (+ region and project).

How to transform "Tags Values" in Telegraf

How can I transform the Tag Values in Telegraf?
I am trying to import Web access logs into InfluxDB with Telegraf. However, some of the URL PATHs include identifiers (session IDs, product IDs, etc).
I need to search and aggregate per path type (IDs excluded), so I can't(?) have them vary like that.
In the "logparser" input plugin I can use a grok extraction pattern, but I can't do transformations of the extracted values, as far as I know.
And the only processor plugin (in between Input and Output) is merely a "printer".
I can't find any clean way of doing this with Telegraf. Maybe I could do some gymnastics with Telegraf (multiple grok parsers + ex/inclusions?), but after some quite extensive attempts I didn't manage to make anything work; it appeared quite fiddly.
This is only half an answer, but:
I managed to achieve what I was trying to do with Logstash instead, outputting to InfluxDB (Logstash has its own InfluxDB output plugin). It's not as desirable, since I'm now having to run both Telegraf and Logstash, but it's working.
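For reference, a rough sketch of the kind of Logstash filter involved (the field names and gsub pattern are illustrative, not my exact config); the transformed event is then handed to Logstash's InfluxDB output plugin:

filter {
  grok {
    # parse a standard combined access log line
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  mutate {
    # collapse ID path segments such as /product/12345 into /product/:id
    gsub => [ "request", "/\d+", "/:id" ]
  }
}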
I've created a feature request on Telegraf's GitHub:
https://github.com/influxdata/telegraf/issues/2667
