Plot event values in Graphite - monitoring

We would like to use Graphite to plot values related to events such as "a packet of N messages has been published". When no packet is published, no code is run at all and so we cannot send zero to Graphite.
Essentially, we would like to compute some kind of publication rate per second.
Here are some sample data that we send to Graphite (with added timestamps):
2016-11-28 14:46:33.6338Z api.message.publication.count:100
2016-11-28 15:01:36.0780Z api.message.publication.count:12
2016-11-28 15:01:36.9911Z api.message.publication.count:1
2016-11-28 15:01:37.0679Z api.message.publication.count:100
Between 14:46:33 and 15:01:36, no messages were sent. However, between 15:01:36 and 15:01:37, 13 messages were sent (reported as two values, 12 and 1).
I've tried the summarize() function but it does not give results that make sense to me, i.e. I cannot correlate what I'm sending to Graphite and what is displayed by Graphite. Moreover, it seems that summarize() does not support 1-second intervals (I've tried "1second" and "1s" for the interval parameter).
The perSecond() function computes a rate of change (i.e. a derivative) but what we're sending is already a kind of derivative (maybe it's closer to a Dirac delta?) so it doesn't make sense in our context.
Are we completely off, or is there a way to make this work with Graphite?
Edit: I guess we need to add an aggregation stage to our data. Would Carbon aggregation fit the bill here?

It turns out that we were already sending our metrics to statsd, which supports aggregation via the c (counter) metric type, and a few other nifty things: https://github.com/etsy/statsd/blob/master/docs/metric_types.md
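For reference, reporting such a counter from Python might look like this. This is a minimal sketch assuming the third-party statsd Python client and a statsd daemon on localhost:8125; the host, port, and function name are illustrative:

import statsd

# Sketch: report "N messages published" as a statsd counter ("c" type).
# Assumes the `statsd` Python package and a statsd daemon on localhost:8125.
client = statsd.StatsClient("localhost", 8125)

def on_packet_published(message_count):
    # Counters are aggregated per flush interval by statsd, so intervals
    # with no published packets simply show up as zero in Graphite.
    client.incr("api.message.publication.count", message_count)

Because statsd flushes an aggregated value every interval (10 seconds by default), the "no code runs, so no zero is sent" problem goes away without any extra work on the application side.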

Related

How long do Prometheus time series last without an update?

If I send a gauge to Prometheus then the payload has a timestamp and a value like:
metric_name {label="value"} 2.0 16239938546837
If I query it in Prometheus I can see a continuous line. Without sending a payload for the same metric, the line stops. Sending the same metric again after some minutes I get another continuous line, but it is not connected with the old line.
Is it fixed in Prometheus how long a time series lasts without getting an update?
I think the first answer by Marc is in a different context.
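Purely as an illustration of where such a gauge could come from (the question does not say how the metric is actually delivered), exposing it with the official prometheus_client library might look like this; the metric name, label, and port are taken from or modelled on the example above:

import time
from prometheus_client import Gauge, start_http_server

# Sketch: expose a labelled gauge on /metrics for Prometheus to scrape.
# Assumes the official `prometheus_client` package; port is illustrative.
gauge = Gauge("metric_name", "Example gauge", ["label"])
start_http_server(8000)

gauge.labels(label="value").set(2.0)

# While this process runs, every scrape yields a fresh sample and the line
# on the graph is continuous; once it stops, no new samples arrive.
time.sleep(600)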
Any time series in Prometheus goes stale after 5 minutes by default if collection stops - https://www.robustperception.io/staleness-and-promql. In other words, the line stops on the graph (or in Grafana).
So if you resume the metric collection within 5 minutes, it will connect the line by default. But if there is no collection for more than 5 minutes, it will show a disconnect on the graph. You can tweak that in Grafana to ignore drops, but that's not ideal in some cases, as you do want to see when collection stopped instead of getting the false impression that it was continuous. Alternatively, you can avoid the disconnect using functions like avg_over_time(metric_name[10m]) as needed.
There are two questions here:
1. How long does Prometheus keep the data?
This depends on the configuration you have for your storage. By default, with local storage, Prometheus has a retention of 15 days. You can find out more in the documentation. You can also change this value with the option --storage.tsdb.retention.time
2. When will I have a "hole" in my graph?
The line you see on a graph is made by joining each point from each scrape. Those scrapes are done regularly, based on the scrape_interval value you have in your scrape_config. So basically, if you have no data during one scrape, you'll have a hole.
So there is no definitive answer; it depends essentially on your scrape_interval.
Note that if you're using a function that evaluates metrics over a certain amount of time, missing one scrape will not alter your graph. For example, using rate[5m] will not alter your graph if you scrape every 1m (as you'll have 4 other samples to compute the rate).
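To see the effect of such a windowed function concretely, you can compare a raw instant query with a windowed one through Prometheus' HTTP API. This is a sketch assuming a server at localhost:9090 and the requests package; the metric name is illustrative:

import requests

# Sketch: compare an instant vector with a windowed function that tolerates
# a missed scrape. Assumes Prometheus at localhost:9090; names illustrative.
PROM = "http://localhost:9090/api/v1/query"

raw = requests.get(PROM, params={"query": "metric_name"}).json()
smoothed = requests.get(
    PROM, params={"query": "avg_over_time(metric_name[10m])"}
).json()

# The windowed query keeps returning a value as long as at least one sample
# fell inside the 10m window, which is what bridges a single missed scrape.
print(raw["data"]["result"])
print(smoothed["data"]["result"])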

Graphite has null values between data points

I have an API that fetches data packets from different servers. It formats this data into small JSON units. I wrote an algorithm that sends them to Graphite with the json2graphite command.
The sending works very well, the incoming data doesn't look bad either.
Now the problem:
The data displayed in Graphite shows that each entry is followed by a null, so the data points that should be connected are not.
I am aware that the points can be connected using a function provided by the Graphite interface, but this doesn't help because the Grafana boards always jump back and forth between a value and null.
Is there a way to tell Grafana to only go to null if there has been no data for more than 1 minute or so?
I already tried to fix the problem via "storage-schemas.conf" and "storage-aggregation.conf", unfortunately without success.
storage-schemas.conf:
[default_1min_for_1day]
pattern = .*
retentions = 10s:6h,30s:8d,1m:31d,10m:1y,1h:5y
storage-aggregation.conf:
[default_average]
pattern = .*
xFilesFactor = 0
aggregationMethod = average
If you want to know any more, ask me. : )
Grafana has an option to connect data points that are separated by nulls. You can see how to enable this under the Display Styles settings in Grafana's documentation.
In the Graphite composer you can also do it by selecting the connected line mode under Graph options.
Additionally, you could use Graphite's keepLastValue function to carry the last received value forward over gaps where there are nulls.
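As an example, requesting a series through the render API with keepLastValue applied might look like the following sketch. The Graphite host and the target name are illustrative; keepLastValue's second argument limits how many consecutive nulls are bridged:

import requests

# Sketch: fetch a series with keepLastValue() applied so short runs of nulls
# are bridged by the last received value. Host and target are illustrative.
resp = requests.get(
    "http://graphite.example.com/render",
    params={
        # keepLastValue(series, limit): carry the last value over at most
        # `limit` consecutive null points (here, gaps of up to 6 points).
        "target": "keepLastValue(my.metric, 6)",
        "from": "-1h",
        "format": "json",
    },
)
for series in resp.json():
    print(series["target"], series["datapoints"][:5])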
I haven't found a direct solution, but I will now try to minimize the interval between the entries. I noticed that the requests take much too long: 2-5 minutes.
There are probably too many servers, so the requests block the port for too long.
The problem is not solved yet, but I think I will mark it as solved if nobody comes up with anything better within 5 days.

Control rate of individual topic consumption in Kafka Streams 0.9.1.0-cp1?

I am trying to backprocess data in Kafka topics using a Kafka Streams application that involves a join. One of the streams to be joined has a much larger volume of data per unit of time in the corresponding topic. I would like to control the consumption from the individual topics so that I get roughly the same event timestamps from each topic in a single consumer.poll(). However, there doesn't appear to be any way to control the behavior of the KafkaConsumer backing the source stream. Is there any way around this? Any insight would be appreciated.
Currently Kafka does not provide rate limiting for either producers or consumers.
Refer:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-13+-+Quotas
But if you are using Apache Spark as the stream processing platform, you can limit the input rate for the Kafka receivers.
On the consumer side, you can use the consume([num_messages=1][, timeout=-1]) function instead of poll.
consume([num_messages=1][, timeout=-1]):
Consumes a list of messages (possibly empty on timeout). Callbacks may be executed as a side effect of calling this method.
The application must check the returned Message object’s Message.error() method to distinguish between proper messages (error() returns None) and errors for each Message in the list (see error().code() for specifics). If the enable.partition.eof configuration property is set to True, partition EOF events will also be exposed as Messages with error().code() set to _PARTITION_EOF.
num_messages (int) – The maximum number of messages to return (default: 1).
timeout (float) – The maximum time to block waiting for message, event or callback (default: infinite (-1)). (Seconds)
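That consume() signature appears to come from the confluent-kafka Python client. Used directly, it looks roughly like the sketch below; the broker address, group id, and topic name are illustrative:

from confluent_kafka import Consumer, KafkaError

# Sketch: batch consumption with confluent-kafka's consume() instead of
# poll(). Broker address, group id, and topic name are illustrative.
consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "backfill-joiner",
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["low-volume-topic"])

try:
    while True:
        # Pull at most 50 messages, waiting up to 1 second.
        messages = consumer.consume(num_messages=50, timeout=1.0)
        for msg in messages:
            if msg.error():
                if msg.error().code() == KafkaError._PARTITION_EOF:
                    continue  # reached the end of a partition
                raise Exception(msg.error())
            print(msg.topic(), msg.partition(), msg.offset())
finally:
    consumer.close()

Note that this only bounds how many messages you pull per call from one consumer; it does not by itself align event timestamps across topics, which is what the question is ultimately after.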

How to cluster percentile of events by time delta?

After a mailing at t0, I will have several "delivered" (and open and click) events (schema and example below):
mailing_name, timestamp, email_id, event_type
niceattack, 2016-07-14 12:11:00, 42, open
niceattack, 2016-07-14 12:11:08, 842, open
niceattack, 2016-07-14 12:11:34, 847, open
I would like to see, for a mailing, how long it takes to be delivered to half of the recipients. Say I'm sending an email to 1000 addresses now: the first open event comes in 2 minutes, the last one will come in a week (and the min/max of first/last seem easy to find), but what I'd like to see is that half of the recipients opened it within the first 2 hours after it was sent.
The goal is to be able to compare whether sending now vs. on Saturday morning makes a difference in how fast it's opened on average, or whether one specific mailing gets quicker exposure, and to correlate that with other events (how many click on a link, take a specific action on our site, ...).
I tried to use a cumulative function (how many open events for the mailing at each point), but it seems that it isn't yet implemented: https://github.com/influxdata/influxdb/issues/813
How do you solve this problem with InfluxDB?
Solving this problem with InfluxDB alone is not currently possible; however, if you're willing to add Kapacitor into the mix, it should be possible. In particular, you'll need to write a User Defined Function (UDF) in Kapacitor for that cumulative function.
The general process will look like the following:
Install and Configure Kapacitor
Create a UDF for the cumulative function you're looking for
Enable that UDF inside of Kapacitor
Write a TICKscript that uses the UDF and writes the results back to InfluxDB
Enable a task defined by the TICKscript you've written
Query the InfluxDB instance to get the results of the cumulative function.
My apologies for being so high-level on this. It is a fairly involved process, but it should give you the result you're looking for.
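If the full Kapacitor route is too heavy for a first pass, the cumulative logic the UDF would implement can also be sketched client-side. This is not the Kapacitor approach described above, just an illustration of the computation; it assumes the influxdb Python client, a measurement named "events" with the schema from the question, and illustrative connection details:

from influxdb import InfluxDBClient
from dateutil import parser as dateparser

# Sketch (not the Kapacitor UDF): how long until half of the observed open
# events have happened. Measurement name, database, and host are assumed.
client = InfluxDBClient("localhost", 8086, database="mailings")

query = (
    "SELECT email_id FROM events "
    "WHERE mailing_name = 'niceattack' AND event_type = 'open'"
)
points = list(client.query(query).get_points())

# Points come back in ascending time order; the median open time is the
# timestamp of the middle event. This is measured relative to the first
# open event, since the send time t0 is not part of the schema, and it
# covers half of the *observed opens*, not half of all recipients.
times = [dateparser.parse(p["time"]) for p in points]
if times:
    half_index = len(times) // 2
    print("time until 50% of observed opens:", times[half_index] - times[0])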

Questions about the nextTuple method of a Storm spout (stream processing)

I am developing some data analysis algorithms on top of Storm and have some questions about the internal design of Storm. I want to simulate sensor data generation and processing in Storm, so I use a Spout to push sensor data into the downstream bolts at a constant time interval by sleeping in the Spout's nextTuple method. But from the experiment results, it appeared that the spout didn't push data at the specified rate. In the experiment, there was no bottleneck bolt in the system.
Then I read some material about Storm's ack and nextTuple methods. My question now is: is the nextTuple method called only when the previous tuples are fully processed and acked via the ack method?
If this is true, does it mean that I cannot set a fixed time interval to emit data?
Thanks a lot!
My experience has been that you should not expect Storm to make any real-time guarantees, including in your case the rate of tuple processing. You can certainly write a spout that only emits tuples on some time schedule, but Storm can't really guarantee that it will always call on the spout as often as you would like.
Note that nextTuple should be called whenever there is room available for more pending tuples in the topology. If the topology has free capacity, I would expect Storm to try to fill it up if it can with whatever it can get.
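For what it's worth, a spout that tries to emit on a schedule can still be written; it just won't carry a hard real-time guarantee. Here is a sketch assuming the third-party streamparse library for Python multilang spouts; the class name, field names, and tuple contents are illustrative:

import random
import time

from streamparse import Spout


class TimedSensorSpout(Spout):
    # Sketch: emit one simulated sensor reading roughly every second.
    # Storm decides when to call next_tuple, so the interval is
    # best-effort, not guaranteed.
    outputs = ["sensor_id", "reading"]

    def next_tuple(self):
        # Sleeping here throttles emission, but Storm may still call
        # next_tuple less often than this if the topology has no free
        # capacity (e.g. max spout pending has been reached).
        time.sleep(1.0)
        self.emit(["sensor-1", random.random()])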
I had a similar use case, and the way I accomplished it is by using tick tuples (TICK_TUPLE):
Config tickConfig = new Config();
tickConfig.put(Config.TOPOLOGY_TICK_TUPLE_FREQ_SECS, 15);
...
...
builder.setBolt("storage_bolt", new S3Bolt(), 4).fieldsGrouping("shuffle_bolt", new Fields("hash")).addConfigurations(tickConfig);
Then in my storage_bolt (note it's written in Python, but you will get the idea) I check whether the message is a tick tuple; if it is, I execute my code:
def process(self, tup):
    # Tick tuples arrive on the special '__tick' stream.
    if tup.stream == '__tick':
        # Your logic that needs to be executed every 15 seconds,
        # or whatever you specified in tickConfig.
        # NOTE: the maximum time is 600 s.
        storm.ack(tup)
        return
