Count how many metrics match a condition in Graphite - monitoring

I have a list of classes that extract info from the web. Every time one of them saves something, it sends a different counter to Graphite, so each of them is a different metric.
How do I know how many of them satisfy a certain condition?
For example, let:
movingAverage(summarize(groupByNode(counters.crawlers.*.saved, 2, "sumSeries"), "1hour"), 24)
be the average content downloaded over the past 24 hours. How can I know, at a moment "t", how many of my metrics have this value above 0?

In the render endpoint, add format=json. This returns the data points with their corresponding epoch timestamps in JSON, which is a breeze to parse. Timestamps at which your script sent nothing will be null.
[{
"target": "carbon.agents.ip-10-0-0-228-a.metricsReceived",
"datapoints":
[
[912, 1383888170],
[789, 1383888180],
[800, 1383888190],
[null, 1383888200],
[503, 1383888210],
[899, 1383888220]
]
}]
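For instance, here is a minimal Python sketch (the host and target are hypothetical, not from the question) that pulls such a JSON response and counts how many series are above 0 at a given moment t:

import json
from urllib.request import urlopen

# Hypothetical Graphite host and target; adjust to your own setup.
url = ("http://graphite.example.com/render"
       "?target=counters.crawlers.*.saved&from=-24h&format=json")

series_list = json.load(urlopen(url))

t = 1383888190  # the epoch timestamp you care about
above_zero = 0
for series in series_list:
    for value, epoch in series["datapoints"]:
        # Skip the null datapoints for intervals where nothing was sent.
        if epoch == t and value is not None and value > 0:
            above_zero += 1

print(f"{above_zero} metrics above 0 at t={t}")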

You can use the currentAbove function. Check it out. For example:
currentAbove(stats.route.*.servertime.*, 5)
The above example keeps all of the metrics (in the series) whose current value is above 5.
You could then count the number of targets returned; while Graphite doesn't provide a way to count the "buckets", you should be able to capture it fairly easily.
For example, here is a quick count using curl piped to grep, counting occurrences of the word "target":
curl --silent 'http://test.net/render?target=currentAbove(stats.cronica.dragnet.messages.*,5)&format=json' \
  | grep -o '"target"' | wc -l
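If you prefer parsing the JSON to grepping it, a rough equivalent in Python (same hypothetical URL as the curl example):

import json
from urllib.request import urlopen

# Mirror of the curl example above; host and metric pattern are placeholders.
url = ("http://test.net/render"
       "?target=currentAbove(stats.cronica.dragnet.messages.*,5)&format=json")

targets = json.load(urlopen(url))
print(len(targets))  # one entry per series that currentAbove() let through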


How to return all values with matched key from Redis store

I am working on learning some Redis and I am running into an issue that I can't figure out.
It is my understanding, from some posts and documentation I read, that I can store data with similar keys like this:
redis.set("fruit:1", "apple")
# OK
redis.set("fruit:2", "banana")
# OK
and it is my understanding that I should be able to get all the fruits like this:
redis.get("fruit:*")
But I am missing something, because this keeps returning nil and I cannot figure out what I need to do to return something like
=> apple, banana
Using scan (also suggested in the answer below), I was able to figure out how to return all matching fruit keys, but what I need is to search for all matching fruit keys and then return their values.
@redis.scan_each(match: "fruit:*") do |fruit|
  Rails.logger.debug('even run?')
  fruits << fruit
end
=> fruit:1, fruit:2 # What I need though is apple, banana
What solved it (at least so far) is using mget:
@redis.scan_each(match: "fruit:*") do |fruit|
  Rails.logger.debug('even run?')
  fruits << @redis.mget(fruit)
end
As far as I know, you cannot do that (it would be nice if it were possible).
Instead you have to run a scan, SCAN 0 MATCH "fruit:*" COUNT 500, where 500 is a hint for how many keys to examine in each iteration, and keep calling SCAN with the returned cursor until it comes back as 0.
If you want to get all keys in a single iteration, you first need to know how many keys there are in your DB. You can find the total number of keys with dbsize and use that instead of the 500 in my example:
redis-cli dbsize
or
redis-cli INFO Keyspace | grep ^db0 | cut -f 1 -d "," | cut -f 2 -d"="
To return the values for all of those keys you can use MGET:
https://redis.io/commands/mget
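As a concrete illustration, here is a rough redis-py sketch of that cursor loop, followed by a single MGET for the collected keys (the key pattern comes from the question):

import redis

r = redis.Redis()

# Iterate the cursor until it comes back as 0, collecting matching keys.
keys = []
cursor = 0
while True:
    cursor, batch = r.scan(cursor, match="fruit:*", count=500)
    keys.extend(batch)
    if cursor == 0:
        break

# One MGET returns the values for all collected keys at once.
values = r.mget(keys) if keys else []
print(values)  # e.g. [b'apple', b'banana']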
You may also want to take a deeper look at the available data structures. If the approach above is not the right one for your case, you could instead use a set and simply add your fruits to it:
https://redis.io/commands/sadd
SCAN can be a very expensive operation, as it must traverse all the keys even when you use a pattern, and since Redis is single-threaded, other requests have to wait while each SCAN call runs. If you only have a few keys this is fine, but if you have a lot of keys it can cause serious performance issues.
What you need is an index that contains the keys of all of your fruits. You can use a Set for this: whenever you add a fruit, you also add its key to the Set:
> SET fruit:1 apple
> SADD fruit:index 1
Then, when you need all the fruits, you just get the members of the Set:
> SMEMBERS fruit:index
1) "1"
2) "2"
Then get the values of the fruits:
> GET fruit:1
> GET fruit:2
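For illustration, here is a minimal redis-py sketch of this index pattern (the helper names are mine, not from the answer):

import redis

r = redis.Redis(decode_responses=True)

def add_fruit(fruit_id, name):
    # Store the value and record its id in the index set.
    r.set(f"fruit:{fruit_id}", name)
    r.sadd("fruit:index", fruit_id)

def all_fruits():
    # SMEMBERS gives the ids; one MGET fetches all the values.
    ids = r.smembers("fruit:index")
    return r.mget([f"fruit:{i}" for i in ids]) if ids else []

add_fruit(1, "apple")
add_fruit(2, "banana")
print(all_fruits())  # ['apple', 'banana'] (in no particular order)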

Filtering out based on count using Apache Beam

I am using Dataflow and Apache Beam to process a dataset and store the result in a headerless CSV file with two columns, something like this:
A1,a
A2,a
A3,b
A4,a
A5,c
...
I want to filter out certain entries based on the following two conditions:
1- In the second column, if the number of occurrences of a certain value is less than N, remove all such rows. For instance, if N=10 and c appears only 7 times, I want all of those rows filtered out.
2- In the second column, if the number of occurrences of a certain value is more than M, keep only M of those rows and filter out the rest. For instance, if M=1000 and a appears 1200 times, I want 200 of those entries filtered out and the other 1000 stored in the CSV file.
In other words, I want to make sure every value in the second column appears at least N and at most M times.
My question is whether this is possible using some filter in Beam, or whether it should be done as a post-processing step once the CSV file is created and saved.
You can use beam.Filter to build a PCollection of the second-column values that satisfy your lower-bound condition.
Then correlate that PCollection (as a side input) with your original PCollection to filter out all the lines that need to be excluded.
As for the upper bound, since you want to keep up to that many elements rather than exclude them completely, you will need some post-processing or a combine transform; one possible sketch is shown after the word-count example below.
An example with Python SDK using word count.
import re
import random

import apache_beam as beam

class ReadWordsFromText(beam.PTransform):
    def __init__(self, file_pattern):
        self._file_pattern = file_pattern

    def expand(self, pcoll):
        return (pcoll.pipeline
                | beam.io.ReadFromText(self._file_pattern)
                | beam.FlatMap(lambda line: re.findall(r"[\w\']+", line.strip(), re.UNICODE)))

p = beam.Pipeline()

words = (p
         | 'read' >> ReadWordsFromText('gs://apache-beam-samples/shakespeare/kinglear.txt')
         | 'lower' >> beam.Map(lambda word: word.lower()))

# Assume this is the data PCollection you want to filter on.
data = words | beam.Map(lambda word: (word, random.randint(1, 101)))

counts = (words
          | 'count' >> beam.combiners.Count.PerElement())

words_with_counts_bigger_than_100 = (counts
    | beam.Filter(lambda count: count[1] > 100)
    | beam.Map(lambda count: count[0]))
Now you have a PCollection containing the words whose counts are bigger than 100. To filter the data against it, correlate the two with a side input:
def cross_join(left, rights):
    for x in rights:
        if left[0] == x:
            yield (left, x)

data_with_word_counts_bigger_than_100 = data | beam.FlatMap(
    cross_join, rights=beam.pvalue.AsIter(words_with_counts_bigger_than_100))
Now you have filtered the elements below the lower bound out of the data set.
Note that the 66 in ('king', 66) is the fake random data I put in.
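As for the upper bound promised above, here is a hedged sketch (not the original answer's code) that enforces both bounds by grouping on the second column. It assumes each key's rows fit in one worker's memory; the names csv_rows, N, and M, and the tiny in-memory data, are illustrative stand-ins:

import apache_beam as beam

N, M = 2, 2  # hypothetical lower/upper bounds (the question uses N=10, M=1000)

def enforce_bounds(element, lower, upper):
    key, grouped_rows = element
    grouped_rows = list(grouped_rows)  # materializes one key's rows in memory
    if len(grouped_rows) < lower:
        return []                      # fewer than N occurrences: drop them all
    return grouped_rows[:upper]        # more than M occurrences: keep only M

p = beam.Pipeline()

# Hypothetical in-memory stand-in for the parsed CSV rows.
csv_rows = p | beam.Create(
    [('A1', 'a'), ('A2', 'a'), ('A3', 'b'), ('A4', 'a'), ('A5', 'c')])

bounded = (csv_rows
    | 'key by second column' >> beam.Map(lambda row: (row[1], row))
    | 'group' >> beam.GroupByKey()
    | 'enforce bounds' >> beam.FlatMap(enforce_bounds, lower=N, upper=M)
    | beam.Map(print))  # prints two of the 'a' rows; 'b' and 'c' are dropped

p.run()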
To debug with such visualizations, you can use Interactive Beam. You can set up your own notebook runtime by following the instructions, or you can use the hosted Dataflow Notebooks provided by Google.

Cumulate word count by Flink timeWindow using Scala

I am using the Twitter streaming API and retrieving tweets from it.
I also have a list of desired words that I want to take into account.
What I want to do is always store in my Cassandra database the most up-to-date count of how many times each word was used during the day.
I was thinking of using window functions to consolidate the results every 5 seconds and then writing the consolidated value to the database.
I don't know if this is the best approach.
If it is, I tried to write a simple example following the documentation, but it doesn't group the words every 5 seconds.
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object TweetWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val counts = env
      .fromElements("foo bar test test baz foo", "yes no no yes", "hi hello hi hello")
      .flatMap { _.toLowerCase.split("\\W+").filter { _.nonEmpty } }
      .filter(word => Words.listOfWords.contains(word) || Words.listOfWords2.contains(word))
      .map { (_, 1) }
      .keyBy(0)
      .timeWindow(Time.seconds(5))
      .sum(1)
    counts.print()
    env.execute("test-code")
  }
}
Well, currently it will not work, because you are creating the DataStream from a few fixed elements, which is not a good fit for windowing: the job won't run for anywhere near 5 seconds, so there is no time to create more than one window and all of the messages go into the same one. But if you ran this against the actual Twitter API, it should generally group the items into windows properly.

How to limit Jenkins API response to last n build IDs

http://xxx/api/xml?&tree=builds[number,description,result,id,actions[parameters[name,value]]]
The above API returns all the build IDs. Is there a way to limit the results to only the last 5 build IDs?
The tree query parameter allows you to explicitly specify and retrieve only the information you are looking for, by using an XPath-ish path expression. The value should be a list of property names to include, with sub-properties inside square braces. Try tree=jobs[name],views[name,jobs[name]] to see just a list of jobs (giving only the name) and views (giving the name and the jobs they contain). Note: for array-type properties (such as jobs in this example), the name must be given in the original plural, not in the singular as the element would appear in XML. This will be more natural for e.g. json?tree=jobs[name] anyway: the JSON writer does not do plural-to-singular mangling because arrays are represented explicitly.
For array-type properties, a range specifier is supported. For example, tree=jobs[name]{0,10} would retrieve the name of the first 10 jobs. The range specifier has the following variants:
{M,N}: From the M-th element (inclusive) to the N-th element (exclusive).
{M,}: From the M-th element (inclusive) to the end.
{,N}: From the first element (inclusive) to the N-th element (exclusive). The same as {0,N}.
{N}: Just retrieve the N-th element. The same as {N,N+1}.
Another way to retrieve more data is to use the depth=N query parameter. This retrieves all the data up to the specified depth. Compare depth=0 and depth=1 and see the difference for yourself. Also note that data created by a smaller depth value is always a subset of the data created by a bigger depth value.
Because of the size of the data, the depth parameter should really only be used to explore what data Jenkins can return. Once you identify the data you want to retrieve, you can then come up with the tree parameter to specify exactly the data you need.
I'm on version 1.509.4, which doesn't support the range specifier.
Source: http://ci.citizensnpcs.co/api/
You can create an XML object containing the build numbers via xpath and parse it yourself by different means:
http://xxx/api/xml?xpath=//build/number&wrapper=meep
Creates an xml that looks like:
<meep>
<number>n</number>
<number>n+1</number>
...
<number>m</number>
</meep>
It will be populated with the build numbers n through m that are currently in Jenkins for the job specified in the URL. You can substitute anything for the word "meep"; it becomes the wrapper element of the newly created XML object.
How are you collecting/manipulating the API XML output once you get it? There is a solution here for How do I select the last N elements with XPath?. I tried some of those XPath manipulations, but I couldn't get them to work when playing with the URL in my browser; they might work if you are doing something else.
When I get the XML object, I happen to manipulate it with shell scripts:
#!/bin/sh
# NOTE: To get the URL to work with curl, you need a valid Jenkins user and API token.
# Put all build numbers in a variable called build_ids.
build_ids="$(curl -sL --user "${_jenkins_api_user}:${_jenkins_api_token}" \
  "${_jenkins_url}/job/${_job_name}/api/xml?xpath=//build/number&wrapper=meep" \
  | sed -e 's/<[^>]*>/ /g' -e 's/  */ /g')"
# Print the last 5 build numbers with awk.
echo "${build_ids}" | awk '{n = 5; for (--n; n >= 0; n--) { printf "%s\t", $(NF-n) } print ""}'
Once you have your xml object you can essentially parse it however you want.
NOTE: I am running Jenkins ver. 2.46.1
Looking at the docs at the raw .../api/ endpoint (on Jenkins 2.60.3), it says:
For array-type properties, a range specifier is supported. For example, tree=jobs[name]{0,10} would retrieve the name of the first 10 jobs. The range specifier has the following variants:
{M,N}: From the M-th element (inclusive) to the N-th element (exclusive).
{M,}: From the M-th element (inclusive) to the end.
{,N}: From the first element (inclusive) to the N-th element (exclusive). The same as {0,N}.
{N}: Just retrieve the N-th element. The same as {N,N+1}.
For the OP's case, you'd append {,5} to the end of the URL; since builds are returned newest first, the first 5 results are the 5 most recent builds:
http://xxx/api/xml?&tree=builds[number,description,result,id,actions[parameters[name,value]]]{,5}
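If it helps, here is a small stdlib-only Python sketch of the same request, asking for JSON instead of XML (the host and job name are placeholders, and a real server will usually also need authentication, as in the shell script above):

import json
from urllib.request import urlopen

# Placeholder host/job; Jenkins returns builds newest first,
# so the {,5} range specifier yields the 5 most recent builds.
url = "http://xxx/job/myjob/api/json?tree=builds[number,result]{,5}"

data = json.load(urlopen(url))
for build in data["builds"]:
    print(build["number"], build["result"])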

Splunk get combined result from 2 events

I am a Splunk noob and have been trying to write a query for a couple of hours, without success so far.
I want to count the number of times the command 'install' was triggered and the exit code was '0'.
Each install command writes its log to a new file named with the format 'install_timestamp', so I am searching for source="install*".
Using 2 source files as example
source1:
event1:command=install
... //a couple of other events
event100:exit_code=0
source2:
event1:command=install -f
... //a couple of other events
event100:exit_code=0
In this case I want the result to be 1: only 1 occurrence of exit_code=0 where the command was 'install' (not 'install -f').
The thing that's confusing me is that the information for command and exit_code is in different events. I can get each of the two events separately, but I am unable to figure out how to get the combined result.
Any tips on how I can achieve the result I want? Thanks!
It's a little crude but you could do something like this...
("One String" NOT "Bad String") OR "Another String" | stats count by source | where count > 1
It will basically give you a list of files that contain events matching both strings. For your example this would be something like...
("command=install" NOT "-f") OR "exit_code=0" | stats count by source | where count > 1
