Querying for tag values in a given list - influxdb

Is there any shortform syntax in influxdb to query for membership in a list? I'm thinking of something along the lines of
SELECT * FROM some_measurement WHERE some_tag IN ('a', 'b', 'c')
For now I can string this together using ORed =s, but that seems very inefficient. Any better approaches? I looked through the language spec and I don't see this as a possibility in the expression productions.
Another option I considered was the regex approach, but that seems even worse to me.

InfluxDB 0.9 supports regex for tag matching. It's the correct approach, although regex can of course be problematic. It's not a performance issue for InfluxDB; in fact it would likely be faster than multiple chained OR conditions. There is no support yet for clauses like IN or HAVING.
For example: SELECT * FROM some_measurement WHERE some_tag =~ /a|b|c/
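One caveat worth knowing: the alternation above is unanchored, so it matches tag values that merely *contain* a, b, or c. For exact membership you'd anchor it, as in `WHERE some_tag =~ /^(a|b|c)$/`. A small sketch of the difference, using Ruby regexes for illustration (Go's engine, which InfluxDB uses, behaves the same way for this case):

```ruby
# Unanchored alternation matches anywhere in the tag value, so a tag
# like "abc" would also match /a|b|c/. Anchoring restricts the match
# to the exact values.
unanchored = /a|b|c/
anchored   = /^(a|b|c)$/

p "abc".match?(unanchored)  # true  -- probably not what you want
p "abc".match?(anchored)    # false
p "b".match?(anchored)      # true
```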

Related

Understanding 'strictness' in regex grammar

In writing a grammar/parser for regexes, I'm wondering why the following constructions are all syntactically and semantically valid in the regex syntax (at least as far as I can understand it):
Repetitions of a character class, such as:
Repetitions of a zero-width assertion, such as:
Assertions at a bogus position, such as:
To me, this sort of seems like having a syntax that would allow having a construction like SELECT SELECT SELECT SELECT SELECT col FROM tbl. In other words, why isn't the regex syntax defined as more strict than it is in practice?
To start with, that's not a very good analogy. Your statement with multiple SELECT keywords is not, as far as I know, part of the SQL grammar, so it's simply ungrammatical. Repeated elements in a character class are more like the SQL construct:
SELECT * FROM table WHERE Value IN (1, 2, 3, 2, 3, 2, 3)
I think most (if not all) SQL processors would allow that. You could argue that it would be nice if a warning message were issued, but the usual SQL interface (where a query is sent from client to server and a result returned) does not leave space for a warning.
It's certainly the case that repeated characters in a character class are often an indication that the regular expression was written by a novice with only a fuzzy idea of what a character class is. If you hang out in SO's flex-lexer tag long enough, you'll see how often students write regular expressions like [a-z|A-Z|0-9], or even [begin|end]. If flex detected duplicate characters in character classes, those mistakes would receive warnings, which might or might not be useful to the student coder. (Reading and understanding warning messages is not, apparently, an innate skill.) But it needs to be asked: who is the target audience for a tool like flex? I think the answer is not "impatient beginners who won't read documentation". Certainly, a non-novice programmer might also make that kind of mistake, usually as the result of a typo, but it's not common and it will probably be easily detected during debugging.
If you've already started to work on a parser for regular expressions, you should already know why these rules are not usually a feature of regular expression libraries. Every rule you place on a syntax must be:
precisely defined,
documented,
implemented,
acted upon appropriately (which may mean complicating the interface in order to allow for warnings).
and all of that needs to be thoroughly tested.
That's a lot of work to prevent something which is not even technically an error. And catching those errors (if they are errors) will probably make the "grammar" for the regular expressions context-sensitive. (In other words, you can't write a context-free grammar which forbids duplication inside a character class.)
Moreover, practically all of those expressions might show up in the wild, typically in the case that the regular expression was programmatically generated. For example, suppose you have a set of words, and you want to write a regular expression which will match any sequence of characters not in any of the words. The simple solution is [^firstsecondthirdwordandtherest]+. Of course, you could have gone to the trouble of deduping the individual letters in the words (a simple task in some programming languages, more complicated in others), but should it be necessary?
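For what it's worth, the dedup is a one-liner in Ruby; `negated_class` is a made-up helper name, shown only to illustrate how little work the "tidy" version saves the library:

```ruby
# Build a negated character class from a word list, removing
# duplicate letters first.
def negated_class(words)
  chars = words.join.chars.uniq
  Regexp.new("[^#{chars.map { |c| Regexp.escape(c) }.join}]+")
end

re = negated_class(%w[first second third])
p "zzz".match?(re)  # true: no letter from any of the words
p "fir".match?(re)  # false: every character is in the class
```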
With respect to your other two examples, with repeated and non-final $, there are actual regex libraries in which those are interpreted differently. Many older regex libraries (including (f)lex) only treat $ as a zero-length end-of-string assertion if it appears at the very end of the regex; such libraries would treat every $ other than the last one in a$$$$$$$$ as matching a literal $. Others treat $ as an end-of-line assertion rather than an end-of-string assertion. But I don't know of any which treat those as errors.
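Ruby's engine (Onigmo) is an example of the "every $ is an assertion" camp, so stacking them is redundant but perfectly legal:

```ruby
# Each $ is a zero-width end-of-line assertion, so all three assert
# at the same position after the "a".
p "a".match?(/a$$$/)   # true
# The $s are assertions, not literals, so a literal "$" in the
# input does not satisfy them.
p "a$".match?(/a$$$/)  # false
```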
There's a pretty good argument that a didactic tool, such as regex101, might usefully issue warnings for doubtful regular expressions, a task usually called "linting". But for a production regex library, there is little to be gained, and a lot of pain.

Exclude measurements using regex in influxql

I have an influxql query used in grafana as follows:
SELECT mean("value") FROM /^concourse\.(worker*|web*).*\.system_mem_percent$/ WHERE $timeFilter GROUP BY time($__interval), * fill(null)
This works as expected: I get only the results of measurements with worker... or web... in them.
I'm now trying to build the inverse query, i.e. all measurements that do not have worker... or web... in them.
I'll need to be able to do this with regex on measurements itself. (Assume no tags)
How could this be achieved in InfluxQL (InfluxDB 1.7)?
It looks like you need negative lookahead in your regexp, but InfluxDB uses Go's regular expression syntax, which does not support negative lookahead.
As a workaround you can use a more complicated regexp with negated character classes; see Negative look-ahead in Go regular expressions.
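Another pragmatic workaround, if the negated-character-class regex gets unwieldy, is to fetch the measurement names (e.g. via SHOW MEASUREMENTS) and invert the match client-side. A sketch, with made-up measurement names and Ruby standing in for whatever client language you use:

```ruby
# Invert the match in client code rather than in the regex itself,
# since Go's RE2-based engine has no negative lookahead.
measurements = [
  "concourse.worker-1.vm.system_mem_percent",
  "concourse.web-0.vm.system_mem_percent",
  "concourse.db-0.vm.system_mem_percent",
]

wanted = measurements.reject { |m| m.match?(/^concourse\.(worker|web)/) }
p wanted  # ["concourse.db-0.vm.system_mem_percent"]
```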
But I would go with a better InfluxDB design and use tags instead. That is also the recommended approach: https://docs.influxdata.com/influxdb/v1.8/concepts/schema_and_data_layout/#avoid-encoding-data-in-measurement-names

MongoID Query with Regex and escaping

I want to know whether it is necessary to escape regexes in query calls with Rails/MongoID.
This is my current query:
#model.where(nice_id_string: /#{params[:nice_id_string]}/i)
I am now unsure if it is not secure enough, because of the regex.
Should I use the code below, or does MongoID escape query calls automatically?
#model.where(nice_id_string: /#{Regexp.escape(params[:nice_id_string])}/i)
Of course you should escape the input. Consider params[:nice_id_string] being .*; your current query would be:
#model.where(nice_id_string: /.*/i)
whereas your second would be:
#model.where(nice_id_string: /\.\*/i)
Those do very different things, one of which you probably don't want. Someone with a sufficiently bad attitude could probably slip some catastrophic backtracking through your current version and I'm not sure what MongoDB/V8's regex engine will do with that.
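A quick demonstration of what Regexp.escape buys you, with .* as the malicious input:

```ruby
# Regexp.escape turns regex metacharacters into literals.
p Regexp.escape(".*")  # "\\.\\*"

# Unescaped, the attacker-supplied ".*" matches every string:
p "anything".match?(/#{".*"}/i)                 # true
# Escaped, it only matches a literal ".*":
p "anything".match?(/#{Regexp.escape(".*")}/i)  # false
p ".*".match?(/#{Regexp.escape(".*")}/i)        # true
```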

What is cypher's Backus-Naur Form?

I'm wondering if Cypher (Neo4j query language) has a Backus-Naur Form.
If so, where can I find it? If it doesn't, could you guess one?
There isn't a separate grammar that's published for the language, but you can get what you need from this.
Internally, neo4j uses a package called Parboiled to do its parsing of cypher. In the cypher compiler software package, generally in /src/main/scala/org/neo4j/cypher/internal/compiler/v2_3/parser/ you'll find a file called Clauses.scala which essentially implements the cypher grammar in Scala.
To take a really simple example, here's the definition of the LIMIT clause:
private def Limit: Rule1[ast.Limit] = rule("LIMIT") {
  group(keyword("LIMIT") ~~ (UnsignedIntegerLiteral | Parameter)) ~~>> (ast.Limit(_))
}
Simple enough, a LIMIT clause is the keyword LIMIT followed by an unsigned integer literal or parameter.
Note that one of the more complicated bits of the syntax is in Patterns.scala where you see what constitutes a graph pattern. Other resources like that are included by reference in Clauses.scala.
I don't have a lot of experience with Parboiled, but it's quite possible that, given this definition of the grammar, Parboiled could generate a grammar in whatever syntax you might like.
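To make the production concrete, here is a hand-rolled recogniser for the same rule (the keyword LIMIT followed by an unsigned integer literal or a parameter), in Ruby rather than Scala. It's purely illustrative; `parse_limit` is a made-up name, and the `{param}` syntax assumes the old Cypher parameter style of the 2.3-era compiler the answer refers to:

```ruby
# Recognise "LIMIT <unsigned int>" or "LIMIT {param}", mirroring
# what the Parboiled rule expresses.
def parse_limit(input)
  m = /\ALIMIT\s+(?:(\d+)|\{(\w+)\})\z/i.match(input.strip)
  return nil unless m
  m[1] ? { literal: Integer(m[1]) } : { parameter: m[2] }
end

p parse_limit("LIMIT 10")      # {:literal=>10}
p parse_limit("LIMIT {count}") # {:parameter=>"count"}
p parse_limit("LIMIT x")       # nil
```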

idiomatic way to do regular expression searches in rails models?

In my Rails controller, I would like to do a regular expression search of my model. My googling seemed to indicate that I would have to write something like:
Model.find(:all, :conditions => ["field REGEXP ?", regex_str])
which is rather nasty, as it implies MySQL syntax (I'm using Postgres).
Is there a cleaner way of forcing Rails (4 in my case) to do a regexp search on a field?
I also much prefer using where(), as it allows me to map my strong parameters (hash) directly to a query. So what I would like is something like:
Model.where( params, :match_by => { 'field': '~' } )
which would loosely translate to something like (if params['field'] = 'regex_str'):
select * from models where field ~ regex_str
Unfortunately, there is no idiomatic way to do this. There's no built-in support for regular expressions in ActiveRecord. It'd be impossible to do efficiently unless each database adapter had a database-specific implementation, and not all databases support regular expression matches. Those that do don't all support the same syntax (for example, Postgres doesn't have the same regexp syntax as Ruby's Regexp class).
You'll have to roll your own using SQL, as you've noted in your question. There are alternatives, however.
For a Postgres-specific solution, check out pg_search, which uses Postgres's full text search capabilities. This is very fast and supports fuzzy searching and some pattern matching.
Elasticsearch requires more setup but is incredibly fast, with some nice gems to make your life easier. Here's a RailsCasts episode introducing it. It requires running a separate server, but it's not too hard to get started, and it's powerful. Still no regular expressions, but it's worth looking at.
If you're just doing a one-off regexp search against a single field, SQL is probably the way to go.
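Rolling your own can be as small as a parameterized condition array handed to where(). A minimal sketch, assuming Postgres's POSIX regex operator `~` (use `~*` for case-insensitive matching); `regex_condition` is a made-up helper name:

```ruby
# Build a ["sql fragment", value] pair; the ? placeholder keeps the
# pattern safely quoted when ActiveRecord interpolates it.
def regex_condition(field, pattern)
  ["#{field} ~ ?", pattern]
end

cond = regex_condition("field", "^regex_str")
p cond  # ["field ~ ?", "^regex_str"]
# In Rails: Model.where(*cond)
# roughly: SELECT * FROM models WHERE field ~ '^regex_str'
```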
