Exclude measurements using regex in InfluxQL (InfluxDB)

I have an InfluxQL query used in Grafana as follows:
SELECT mean("value") FROM /^concourse\.(worker*|web*).*\.system_mem_percent$/ WHERE $timeFilter GROUP BY time($__interval), * fill(null)
This works as expected: I get only the results of measurements with worker... or web... in them.
I'm now trying to build the inverse of this query, i.e. all measurements that do not have worker... or web... in them.
I'll need to be able to do this with regex on measurements itself. (Assume no tags)
How can this be achieved in InfluxQL (InfluxDB 1.7)?

It looks like you need negative lookahead in your regexp. But InfluxDB uses Go's regular expression syntax (RE2), where negative lookahead is not supported.
You can work around this with a more complicated regexp built from negated character classes; see Negative look-ahead in Go regular expressions.
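A sketch of that workaround, spelling out "the next segment does not start with web or worker" as explicit alternatives (ignoring edge cases such as a segment that is exactly "we" or "worke"):
SELECT mean("value") FROM /^concourse\.([^w]|w[^eo]|we[^b]|wo[^r]|wor[^k]|work[^e]|worke[^r]).*\.system_mem_percent$/ WHERE $timeFilter GROUP BY time($__interval), * fill(null)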
But I would go with a better InfluxDB design and use tags. That is also the recommended approach: https://docs.influxdata.com/influxdb/v1.8/concepts/schema_and_data_layout/#avoid-encoding-data-in-measurement-names
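For example (the measurement and tag names here are illustrative, not from the question), with the component stored as a tag on a single measurement, the inverse query becomes a simple negated regex match on the tag:
SELECT mean("value") FROM "system_mem_percent" WHERE "component" !~ /^(worker|web)/ AND $timeFilter GROUP BY time($__interval), * fill(null)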

Related

Swift 3: most performant way to check many strings with many regular expressions

I have a list of several hundred strings and an array of 10k regular expressions.
I now have to iterate over all strings and check which of the 10k regular expressions match. What's the most performant way to do this?
Currently I'm doing this:
myRegularExpression.firstMatch(in: myString, options: myMatchingOption, range: NSMakeRange(0, myString.characters.count)) == nil
where myRegularExpression is an NSRegularExpression stored for reuse and myMatchingOption is NSRegularExpression.MatchingOptions(rawValue: 0)
Is there a faster, more performant way to check if a string matches one of those 10k regular expressions?
EDIT:
I need to know not only IF one of my 10k regular expressions matches, but also which one. So currently I have a for loop inside a for loop: the outer one iterates over my several hundred strings, and for each of these strings I iterate over my 10k rules and see if one rule matches (of course, if one matches I can stop for that string). So roughly:
outer: for string in stringsToCheck {
    for rule in myRules {
        if string.matches(rule) {
            continue outer // done with this string, move on to the next one
        }
    }
}
Depending on what platform you're running this on, splitting the work across multiple threads may improve response time somewhat, but I believe really dramatic optimization would require some insight into the nature of the regular expressions.
For example, if the expressions don't have a specific precedence order, you could reorder them so that the most likely "matches" come first in the list. This could be done pre-emptively, either by the supplier of the expressions or by some function that estimates their complexity (e.g. length of the expression, presence of optional or combinatory symbols).
Or it could be done statistically, by collecting (and persisting) hit/miss counts for each expression, as sketched below. Of course such an optimization assumes that every string will match at least one expression and that the 80/20 rule applies (i.e. 20% of the expressions match 80% of the strings).
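A sketch of that statistical approach (the names are illustrative, and it uses Swift 4-era conveniences rather than the question's Swift 3 syntax):
import Foundation

// Illustrative sketch: hit counter per rule, keyed by regex pattern.
var hitCounts: [String: Int] = [:]

// Try the rules in order, recording a hit for whichever matches first.
func firstMatchingRule(for string: String, in rules: [NSRegularExpression]) -> NSRegularExpression? {
    let range = NSRange(string.startIndex..., in: string)
    for rule in rules {
        if rule.firstMatch(in: string, options: [], range: range) != nil {
            hitCounts[rule.pattern, default: 0] += 1
            return rule
        }
    }
    return nil
}

// Re-sort occasionally (not on every lookup) so hot rules bubble to the front.
func resorted(_ rules: [NSRegularExpression]) -> [NSRegularExpression] {
    return rules.sorted { hitCounts[$0.pattern, default: 0] > hitCounts[$1.pattern, default: 0] }
}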
If the expressions are very simple and only use letter patterns, then you would get better performance from more "manual" implementations of the matching functions (instead of regex). In the best case, simple letter patterns can be converted into a character tree and yield orders-of-magnitude performance improvements.
Note that these solutions are not mutually exclusive. For example, if a large proportion of the expressions are simple patterns and only a few are complex, you don't have to throw the baby out with the bathwater: you can apply the simple-pattern optimization to that subset of the rules and keep the "brute force" nested loop for the remaining complex ones.
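A sketch of that hybrid (illustrative, assuming the simple rules contain no regex metacharacters and can be checked as plain substrings):
import Foundation

// Illustrative sketch: route plain-literal rules through a cheap substring
// scan and keep NSRegularExpression only for the genuinely complex patterns.
struct SplitRules {
    let literals: [String]              // rules that are plain substrings
    let regexes: [NSRegularExpression]  // everything else

    // Returns the pattern of the first rule that matches, or nil if none does.
    func firstMatch(in string: String) -> String? {
        // Cheap substring scans first...
        for literal in literals where string.contains(literal) {
            return literal
        }
        // ...then the "brute force" regex loop for the complex remainder.
        let range = NSRange(string.startIndex..., in: string)
        for regex in regexes where regex.firstMatch(in: string, options: [], range: range) != nil {
            return regex.pattern
        }
        return nil
    }
}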
I had a similar problem in the past, where thousands of rules needed to be applied to hundreds of thousands of records for processing insurance claims. The traditional "expert system" approach was to create a list of rules and run every record through it. Obviously that would take a ridiculous amount of time (something like 2 months of execution time to process one month of claims). Looking at it with a less than "purist" mindset, I was able to convince my customer that the rules should be defined hierarchically. So we split them into a set of eligibility rules and a set of decision rules. Then we refined the structure further by creating eligibility groups and decision groups. What we ended up with was a coarse tree structure in which the groups let the system narrow down the number of rules that had to be applied to a given record. With this, the 6-week processing time for 250,000 records was cut down to 7 hours (and this was in 1988, mind you).
All this to say that taking a step back into the nature of the problem to solve may provide some optimization opportunities that are not visible when looking merely at the mechanics of one process option.

'Or' operator for Bosun tags in an expression

I am writing a Bosun expression in order to get the number of 2xx responses in a service like:
ungroup(avg(q("sum:metric.name.hrsp_2xx{region=eu-west-1}", "1m", "")))
The above expression gives me the number of 2xx requests for the selected region (eu-west-1) in the last minute, but I would like to get the number of 2xx requests that happened in two regions (eu-west-1 and eu-central-1).
This metric is tagged with region. I have 4 regions available.
I was wondering if it is possible to do an 'or' operation with the tags. Something like:
{region=or(eu-west-1,eu-central-1)}
I've checked the documentation but I'm not able to find anything to achieve this.
Since q() is specific to querying OpenTSDB, it uses OpenTSDB's syntax. The basic syntax for what you want is a pipe symbol between the tag values: ungroup(avg(q("sum:metric.name.hrsp_2xx{region=eu-west-1|eu-central-1}", "1m", ""))).
If you have Bosun configured for OpenTSDB version 2.2, you can also use the more advanced filters documented in the OpenTSDB documentation (e.g. host=literal_or(web01|web02|web03)). The main advantage is that OpenTSDB 2.2 added the ability to aggregate a subset of tag values instead of all or nothing. The Graph page in Bosun also helps you generate the queries for OpenTSDB.
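Applied to the metric in the question, that filter style would look something like this (assuming OpenTSDB 2.2 filters are available):
ungroup(avg(q("sum:metric.name.hrsp_2xx{region=literal_or(eu-west-1|eu-central-1)}", "1m", "")))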

Querying for tag values in a given list

Is there any shortform syntax in InfluxDB to query for membership in a list? I'm thinking of something along the lines of
SELECT * FROM some_measurement WHERE some_tag IN ('a', 'b', 'c')
For now I can string this together using OR'd equality conditions, but that seems very inefficient. Are there better approaches? I looked through the language spec, and I don't see this as a possibility in the expression productions.
Another option I was considering was the regex approach, but that seems worse to me.
InfluxDB 0.9 supports regex for tag matching. It's the correct approach here, although of course regex can be problematic in general. It's not a performance issue for InfluxDB; in fact, it would likely be faster than multiple chained OR conditions. There is no support yet for clauses like IN or HAVING.
For example: SELECT * FROM some_measurement WHERE some_tag =~ /a|b|c/
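Note that /a|b|c/ matches any tag value that merely contains an a, b, or c. If you want exact membership in the list, anchor the alternation:
SELECT * FROM some_measurement WHERE some_tag =~ /^(a|b|c)$/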

Mongoid query with regex and escaping

I want to know whether it is necessary to escape regex input in query calls with Rails/Mongoid.
This is my current query:
#model.where(nice_id_string: /#{params[:nice_id_string]}/i)
I am now unsure whether it is secure enough, because of the regex.
Should I use the code below, or does Mongoid automatically escape the query input?
#model.where(nice_id_string: /#{Regexp.escape(params[:nice_id_string])}/i)
Of course you should escape the input. Suppose params[:nice_id_string] is .*. Your current query would then be:
#model.where(nice_id_string: /.*/i)
whereas your second would be:
#model.where(nice_id_string: /\.\*/i)
Those do very different things, and one of them is probably not what you want. Someone with a sufficiently bad attitude could probably also slip some catastrophic backtracking through your current version, and I'm not sure what MongoDB/V8's regex engine will do with that.
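As an illustration (the value here is hypothetical, not from the question), input like "(a+)+b" is exactly the kind of pattern that can backtrack catastrophically against a long run of a's; Regexp.escape defuses it by turning every metacharacter into a literal:
#model.where(nice_id_string: /\(a\+\)\+b/i) # result of Regexp.escape("(a+)+b")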

What is Cypher's Backus-Naur Form?

I'm wondering if Cypher (the Neo4j query language) has a Backus-Naur Form.
If so, where can I find it? If it doesn't, could you guess one?
There isn't a separate grammar that's published for the language, but you can get what you need from this.
Internally, Neo4j uses a package called Parboiled to parse Cypher. In the Cypher compiler package, generally under /src/main/scala/org/neo4j/cypher/internal/compiler/v2_3/parser/, you'll find a file called Clauses.scala, which essentially implements the Cypher grammar in Scala.
To take a really simple example, here's the definition of the LIMIT clause:
private def Limit: Rule1[ast.Limit] = rule("LIMIT") {
  group(keyword("LIMIT") ~~ (UnsignedIntegerLiteral | Parameter)) ~~>> (ast.Limit(_))
}
Simple enough: a LIMIT clause is the keyword LIMIT followed by an unsigned integer literal or a parameter.
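For instance (illustrative queries, using the {param} parameter syntax of that Cypher version, with a hypothetical parameter name), both of these satisfy the rule:
MATCH (n) RETURN n LIMIT 10
MATCH (n) RETURN n LIMIT {rowLimit}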
Note that one of the more complicated bits of the syntax is in Patterns.scala where you see what constitutes a graph pattern. Other resources like that are included by reference in Clauses.scala.
I don't have a lot of experience with Parboiled, but it's quite possible that, given this definition of the grammar, Parboiled could generate the grammar in whatever syntax you might like.
