Merging multiple metrics for Dataflow alert configuration - google-cloud-dataflow

I have an alert in GCP which checks that total_streaming_data_processed produces values for all active dataflow jobs within some period. The query for the alert is defined as:
fetch dataflow_job
| metric 'dataflow.googleapis.com/job/total_streaming_data_processed'
| filter (resource.job_name =~ '.*dataflow.*')
| group_by 30m,
    [value_total_streaming_data_processed_mean:
       mean(value.total_streaming_data_processed)]
| every 30m
| absent_for 1800s
This alert seems to fire even for Dataflow jobs that have been recently drained. I suppose the alert is working as intended, but we would like to tune it to fire only for jobs in a running state. I believe the metric to use here is dataflow.googleapis.com/job/status, but I'm having trouble merging these two metrics in the same alert. What's the best way to have an alert check two different metrics and only fire when both conditions are met?
I tried to add the second metric, dataflow.googleapis.com/job/status, but the MQL editor returns "Line 5: Table operation 'metric' expects 'Resource' input, but input is 'Table'." when I try to pipe in a second metric.
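In MQL, two metrics can be combined by fetching each into its own table and joining them with `{ ... ; ... } | join`, rather than piping a second `metric` operation onto an existing table (which produces the error above). A rough, unverified sketch of that shape, with a hypothetical filter on the status metric (the actual field and value that identify a running job depend on the metric's schema):

```
{
  fetch dataflow_job
  | metric 'dataflow.googleapis.com/job/total_streaming_data_processed'
  | filter (resource.job_name =~ '.*dataflow.*')
  | group_by 30m,
      [value_total_streaming_data_processed_mean:
         mean(value.total_streaming_data_processed)]
  ;
  fetch dataflow_job
  | metric 'dataflow.googleapis.com/job/status'
  | filter (resource.job_name =~ '.*dataflow.*')
  # hypothetical: the field/value that mark a running job are not verified here
  | filter (value.status == 'Running')
  | group_by 30m, [row_count: row_count()]
}
| join
| every 30m
| absent_for 1800s
```

The `join` keeps only time series present in both tables on their common labels, so the absence check should only apply where a status point also exists; verify the exact field names against the metric descriptor before relying on this.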

Related

How to send non-aggregated metrics to Influx from a Spring Boot application?

I have a Spring Boot application that is under moderate load. I want to collect metric data for a few of the operations of my app. I am mainly interested in counters and timers.
I want to count the number of times a method was invoked (number of invocations over a window, for example over the last 1 day, 1 week, or 1 month)
If the method produces any unexpected result, increase a failure count and publish a few tags with that metric
I want to time a couple of expensive methods, i.e. I want to see how much time a method took, and I also want to publish a few tags with the metric to get more context
I have tried StatsD-SignalFx and Micrometer-InfluxDB, but both these solutions have some issues I could not solve
StatsD aggregates the data over the flush window, and due to that aggregation the metric tags get messed up. For example, if I send 10 events in a flush window with different tag values, and the StatsD agent aggregates those events and publishes only one event with counter = 10, then I am not sure what tag values it sends with the aggregated data.
The Micrometer-InfluxDB setup has its own problems, one of them being that Micrometer sends 0 values for counters if no new metric is produced, and for that fake (0-value) counter it uses the same tag values as the last valid (non-zero) counter.
I am not sure how, but Micrometer also does some sort of aggregation on the client side, in MeterRegistry I believe, because I was getting a few counters with a value of 0.5 in InfluxDB.
Next, I am planning to explore Micrometer/StatsD + Telegraf + Influx + Grafana to see if it suits my use case.
Questions:
How can I avoid metric aggregation until the data reaches the data store (InfluxDB)? I can do the required aggregation in Grafana.
Is there any standard solution to the problem that I am trying to solve?
Any other suggestion or direction for my use case?
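One way to avoid any client-side aggregation is to write each event as its own point in InfluxDB line protocol, tagging it at emit time and leaving all counting and averaging to InfluxDB or Grafana. A minimal Java sketch (the measurement and tag names "method_invocations", "method", and "outcome" are invented for illustration):

```java
// Builds one InfluxDB line-protocol point per event, so nothing is
// aggregated before it reaches the data store.
public class LineProtocol {

    static String point(String measurement, String method, String outcome,
                        long durationMs, long timestampNs) {
        // line protocol: measurement,tag=...,tag=... field=value timestamp
        return measurement
                + ",method=" + method
                + ",outcome=" + outcome
                + " duration_ms=" + durationMs + "i"
                + " " + timestampNs;
    }

    public static void main(String[] args) {
        // each invocation becomes its own point; counting and averaging
        // can then be done with InfluxDB queries or in Grafana
        String p = point("method_invocations", "fetchOrders", "success",
                         120, 1700000000000000000L);
        System.out.println(p);
    }
}
```

Since every point keeps its own tag values, the "which tags did the aggregate get?" problem disappears; the trade-off is write volume, which is usually fine at moderate load.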

Is there a way to generate an error event if a signal has no events in, for example, the last 24 hours?

I know there are health checks for specific URLs, but I was wondering if there is a simpler way to set up a signal and have Seq generate an error if that signal has no entries in the last 24 hours. That way Seq could not only notify us of errors via the Digest Email app, for instance, but also notify us when something like a job failed to run altogether, which would obviously generate no error.
Seq's dashboard alerts can do this. They're based on charts which are configured with a few different parameters.
On the chart's Signal tab, choose the signal.
On the chart's Query tab:
select count(*) as count
from stream
And from the Alerts tab, add an alert with the condition:
count = 0
over the time range you want to check.

AWS EventBridge scheduled events with custom details?

I'm trying to build an architecture where a single Lambda is triggered on a schedule with multiple parameter sets.
So for example if I have three sets of parameters and set schedule to ten minutes I expect to get three executions every ten minutes.
Is there a way to trigger an EventBridge scheduled event with custom properties so I can pass parameters to the Lambda? I've noticed the details property in the event schema but couldn't find any reference to its usage with scheduled events.
To trigger a single Lambda function for multiple parameter sets, you can create a separate scheduled rule for each parameter set.
To provide input to your triggered Lambda function, set "Configure input" when you select your Lambda function as a target; for example, you can provide your input in JSON format.
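The same setup can be sketched with the AWS CLI as a config fragment (rule names, region, and the function ARN below are placeholders): one rule per parameter set, all on the same schedule, each with a different constant JSON `Input` on the target.

```
# one rule per parameter set, same schedule, different constant input
aws events put-rule --name my-job-params-a \
  --schedule-expression 'rate(10 minutes)'

aws events put-targets --rule my-job-params-a --targets \
  '[{"Id":"1","Arn":"arn:aws:lambda:us-east-1:123456789012:function:my-fn","Input":"{\"paramSet\":\"A\"}"}]'
```

With a constant `Input`, the Lambda receives `{"paramSet": "A"}` as its event payload instead of the default scheduled-event envelope. (The function also needs a resource policy allowing events.amazonaws.com to invoke it, which the console normally adds for you.)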

InfluxDB: query to calculate average of StatsD "executionTime" values

I'm sending metrics in StatsD format to Telegraf, which forwards them to InfluxDB 0.9.
I'm measuring execution times (of some event) from multiple hosts. The measurement is called "execTime", and the tag is "host". Once Telegraf gets these numbers, it calculates mean/upper/lower/count, and stores them in separate measurements.
Sample data looks like this in influxdb:
TIME  FIELD           HOST  VALUE
t1    execTime.count  VM1   3
t1    execTime.mean   VM1   15
t1    execTime.count  VM2   6
t1    execTime.mean   VM2   22
(So at time t1, there were 3 events on VM1, with mean execution time 15ms, and on VM2 there were 6 events, and the mean execution time was 22ms)
Now I want to calculate the mean of the operation execution time across both hosts at time t1. Which is (3*15 + 6*22)/(3+6) ms.
But since the count and mean values are in two different series, I can't simply use "select mean(value) from execTime.mean"
Do I need to change my schema, or can I do this with the current setup?
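As a sanity check on that arithmetic, the count-weighted mean is sum(count_i * mean_i) / sum(count_i), which a small Java sketch can verify:

```java
// Count-weighted mean across hosts: sum(count_i * mean_i) / sum(count_i).
public class WeightedMean {

    static double weightedMean(long[] counts, double[] means) {
        double num = 0, den = 0;
        for (int i = 0; i < counts.length; i++) {
            num += counts[i] * means[i];
            den += counts[i];
        }
        return num / den;
    }

    public static void main(String[] args) {
        // VM1: 3 events, mean 15 ms; VM2: 6 events, mean 22 ms
        double m = weightedMean(new long[]{3, 6}, new double[]{15, 22});
        System.out.println(m); // (3*15 + 6*22) / 9 = 19.666...
    }
}
```

This is the computation that needs both the count and the mean series side by side, which is why the two-series schema makes it awkward to express in a single InfluxQL query.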
What I need is essentially a new series, which is a combination of the execTime.count and execTime.mean across all hosts. Instead of calculating this on-the-fly, the best approach seems to be to actually create the series along with the others.
So now I have two timer stats being generated on each host for each event:
1. one event with actual hostname for the 'host' tag
2. second event with one tag "host=all"
I can use the first set of series to check mean execution times per host. And the second series gives me the mean time for all hosts combined.
It is possible to do mathematical operations on fields from two different series, provided both series are members of the same measurement. I suspect your schema is non-optimized for your use case.

How to automatically measure/monitor the average of sums of consecutive samplers in JMeter?

I have the following JMeter test plan.
+Test Plan
  +Login Thread Group
    HttpRequest1
    HttpRequest2
    HttpRequest3
Is there a way to automatically view/monitor the average of sums of HttpRequest1, 2, and 3?
I couldn't find a way to do it in the "Summary Report" or "Aggregate Report".
Is it possible, or do I have to do it manually?
Do you explicitly mean 'the average of sums', as in the average of the total sum for each request over the duration of the test run? If so, then I'm not aware of any JMeter listener that will show you the sum of elapsed time for a sampler; it's not something typically required. Instead, you could probably get what you need fairly easily by reading the .jtl file at the command line.
But perhaps you meant something else; you might find that using a Transaction Controller serves your requirements. This will record and show the total elapsed time for multiple requests. So in your example, you might wrap HttpRequest1, 2 & 3 in a Transaction Controller, and this would give you the sum of all three requests. Then the Aggregate and Summary listeners would show you the average for this transaction as a separate line.
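For the read-the-jtl-file route, summing elapsed time per sampler from a CSV-format .jtl can be sketched as below. This assumes the default CSV column order (`timeStamp,elapsed,label,...`); check your `jmeter.save.saveservice` settings if your columns differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sums elapsed time per sampler label from CSV-format .jtl lines.
// Assumes the default column order: timeStamp,elapsed,label,...
public class JtlSums {

    static Map<String, Long> sumElapsedByLabel(String[] lines) {
        Map<String, Long> sums = new LinkedHashMap<>();
        for (String line : lines) {
            String[] cols = line.split(",");
            if (cols.length < 3 || cols[1].equals("elapsed")) continue; // skip header
            sums.merge(cols[2], Long.parseLong(cols[1]), Long::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        // tiny in-memory stand-in for a real .jtl file
        String[] sample = {
            "timeStamp,elapsed,label,responseCode",
            "1000,120,HttpRequest1,200",
            "1001,80,HttpRequest2,200",
            "1002,150,HttpRequest1,200",
        };
        System.out.println(sumElapsedByLabel(sample)); // {HttpRequest1=270, HttpRequest2=80}
    }
}
```

Dividing each sum by its sample count then gives the per-sampler average, which is the same figure the Aggregate Report shows per label.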
