count peak values in a data window, kapacitor

count peak values in a data window, kapacitor - monitoring

I want to count the peak disc usage in a window of 5 mins.
I am new to tick script and kapacitor. this is the sample code. The thing is I only want to count in the active window (not the emitted 2 min window, even if it had some data points).
var curr = stream
|from()
.measurement('disk_usage_root_used_percentage')
|window()
.period(5m)
.every(2m)
.align()
// here i want the count to happen
|alert()
.crit(lambda: "count" >5 )
.log('/tmp/alerts.log')

Q:
How can I count the peak disc usage in a window of 5 minutes?
A:
What is going to happen when you specify period=5m and every=2m is that, Kapacitor will buffer up 5 minutes worth of point data and try to write it to its pipeline every 2 minutes.
So if the stream task were to go on for 10m, you'll find that your TICK script will be executed 5 times in total.
For each execution window, the dataset will consists of 3m of older data and 2m of the newer ones. Essentially they are overlapped, this is bad because your use case here is to only analyse the latest 5m point data and raise alarms if required, not looking back old data. In other words, you don't want to be spammed by false alarms.
To correct it, you will need to specify .period=5m and .every=5m for the window node. Doing so you'll find that the TICK gets ran twice upon 10 minutes up time, with each run consisting of the latest 5 minutes worth of data.
Let me know if this helps.

Related

How can I calculate the appropriate amount of channel capacity?

I am looking for a solution because the sth-channel is full.
I am troubled with calculating the appropriate capacity of channel capacity.
This document has the following description.
In order to calculate the appropriate capacity, just have in consideration the following parameters:
・The amount of events to be put into the channel by the sources per unit time (let's say 1 minute).
・The amount of events to be gotten from the channel by the sinks per unit time.
・An estimation of the amount of events that could not be processed per unit time, and thus to be reinjected into the channel (see next section).
How can I check the values of these parameters?

How can I check the values of these parameters?
You can't just check these parameters. They depend on your application.
What they are saying is that you should have a size which is large enough so the generator doesn't get stuck. This may not be possible in your application.
Say your generator receives one event per second and it takes 2 seconds for a receiver to manage that event. Now lets assume you have 3 receivers. In 1 second, you can manage to process 0.5 events per receiver. You have 3 receivers, so your receivers, together, are capable of processing 0.5 × 3 = 1.5 events, which is more than what you get as input. Your capacity can be 1 or 2, using 2 will greatly increase your chances that you do not get blocked.
Let's review another example:
Your generator wants to pushes 1,000 events per second
Your receivers take 3 seconds to process one event
You would need 1,000 x 3 = 3,000 receivers (3,000 goroutines that can run at full speed in parallel...)
In this example, the total number of receivers is so large that you have to either break up your code to work on multiple computers or optimize your receiver code so it can process the data in an amount of time that makes sense. Say you have 50 processors, your receivers will get 1,000 events per second, all 50 can run at full speed, you need one receiver to do its work in:
50 / 1000 = 0.05 seconds
Now let's assume that in most cases your goroutines take 0.02 but once in a while one will take 1 second. That means your goroutines can get a little behind. In that case your capacity (so the generator doesn't get blocked) should be a little over 1,000. Again, it will depend on how many of the routines get slowed down, etc. In this last example, a run is 0.02 seconds so to process 1,000 events it usually takes 0.02 seconds. If you can send those 1,000 event over the 1 second period, you may not even need the 50 goroutines and could have a smaller capacity. On the other hand, if you have big bursts where you may end up sending many (say 500) events all at ones, then more goroutines and a larger capacity is important to not get blocked.

What is the meaning of OneMinuteRate in JMX?

I am trying to calculate the Read/second and Write/Second in my Cassandra 2.1 cluster. After searching and reading, I came to know about JMX bean
org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency
Here I can see oneMinuteRate. I have started a brand new cluster and started collected these metrics from 0.
When I started my first record, I can see
Count = 1
OneMinuteRate = 0.01599111...
Does it mean that my write/s is 0.0159911?
Or does it mean that based on 1 minute data, my write latency is 0.01599 where Write Latency refers to the response time for writing a record?
Please help me understand the value.
Thanks.

It means that in the last minute, your writes per second were occuring at a rate of .01599 writes per second. Think about it this way: the rate of writes in the last 60 seconds would be
WritesInLastMinute ÷ 60
So in your case
1 ÷ 60 = 0.0166
Or more precisely, .01599.
If you observed no further writes after that, the value would descend down to zero over the next minute.

OneMinuteRate, FiveMinuteRate, and FifteenMinuteRate are exponential moving averages, meaning they are not simply dividing readings against time, instead, as the name implies they take an exponential series of averages as below:
result(t) = (1 - w) * result(t - 1) + (w) * event_this_period
where w is the weighting factor, t is the ticking time, in other words, simply they take 20% or the new reading and 80% of old readings, it's the same way UNIX systems measure CPU loads.
however, if this applies to requests that the server receives, below is a chart from one request to a server, measures taken by dropwizard.
as you can see, from one request a curve is drawn by time, it's really useful to determine trends, but not sure if they are great to monitor live traffic and especially critical one.

Over 8 minutes to read a 1 GB side input view (List)

We have a 1 GB List that was created using View.asList() method on beam sdk 2.0. We are trying to iterate through every member of the list and do, for now, nothing significant with it (we just sum up a value). Just reading this 1 GB list is taking about 8 minutes to do so (and that was after we set the workerCacheMb=8000, which we think means that the worker cache is 8 GB). (If we don't set the workerCacheMB to 8000, it takes over 50 minutes before we just kill the job.). We're using a n1-standard-32 instance, which should have more than enough RAM. There is ONLY a single thread reading this 8GB list. We know this because we create a dummy PCollection of one integer and we use it to then read this 8GB ViewList side-input.
It should not take 6 minutes to read in a 1 GB list, especially if there's enough RAM. EVEN if the list were materialized to disk (which it shouldn't be), a normal single NON-ssd disk can read data at 100 MB/s, so it should take ~10 seconds to read in this absolute worst case scenario....
What are we doing wrong? Did we discover a dataflow bug? Or maybe the workerCachMB is really in KB instead of MB? We're tearing our hair out here....

Try to use setWorkervacheMb(1000). 1000 MB = Around 1GB. It will pick the side input from cache of each worker node and that will be fast.
DataflowWorkerHarnessOptions options = PipelineOptionsFactory.create().cloneAs(DataflowWorkerHarnessOptions.class);
options.setWorkerCacheMb(1000);
Is it really required to iterate 1 GB of side input data every time or you are need some specific data to get during iteration?
In case you need specific data then you should get it by passing specific index in the list. Getting data specific to index is much faster operation then iterating whole 1GB data.

After checking with the Dataflow team, the rate of 1GB in 8 minutes sounds about right.
Side inputs in Dataflow are always serialized. This is because to have a side input, a view of a PCollection must be generated. Dataflow does this by serializing it to a special indexed file.
If you give more information about your use case, we can help you think of ways of doing it in a faster manner.

NEsper memory usage of "output" keyword

I have many EPL statements that output a period of time (1~24 hours), and following is my statement
"SELECT MessageID, VName, count(VName) as count FROM DDIEvent(MajorType=4).std:groupwin(VName).win:time(3 hour).win:length(10) group by VName having count(VName) >= 10 output last every 3 hour"
If there is no limit of the length window, my case will retain around 700K records in 3 hours.
And in above, my test case only have 100 different VName. For my understanding, there will have maximum 1000 records keep in memory at the same time, (100 * 10[length]) am i right?
But the memory usage of my application will keep growing until output to listener.
The memory usage almost the same as the sample without length window.
And after output to listener the memory significantly fall down.
Then, another cycle begin, memory grow slowly until 3 hour later.
I already check the document, do not find the memory related topic of the "output" keyword.
Does anyone knows what is the really root cause? And how to avoid? Or just my EPL's problem?
Thank you~

If your query removes the "MessageId" from the select clause, it becomes a regular aggregation query. You could do a "last(MessageId)" instead. Because "MessageId" is in there the rows that the engine delivers are a row for each event rather then a row for each aggregation group.

Esper (C.E.P.) query to calculate candlesticks every full minute

I am using Complex Event Processing (Esper) technology to provide a real-time candlestick calculations in my system. I am doing fine with calculating values, however I find it difficult to ensure that candle window starts at full minutes (for one minute candle) and ends before the next minute starts (i.e. candle 1[06:00.000 - 06:00.999], candle 2[06:01.000 - 06:01.999], etc... ).
Is there a pattern or command in Esper's query language that is able to provide such functionality?
I'd appreciate constructive comments and directions.

In Esper you can use a pattern to fire every minute at the zero second, i.e.
insert into TriggerEvent select * from pattern[pattern[every timer:interval(1 min).]
// named window to hold candle data, compute next candle
on TriggerEvent select * from NamedWindowCandle ....
// delete old data
on TriggerEvent delete from NamedWindowCandle
-rg

Local time is often different from exchange time, also there is latency in delivering tick data. Minute bars are often computed using exchange timestamp. The exchange timestamp value must be extracted from tick events. New minute bar event is sent when the tick timestamps enter new minute.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart