The MQTT spec explicitly states that
“sport/+” does not match “sport” but it does match “sport/”.
“sport/#” also matches the singular “sport”, since # includes the parent level.
But does “sport/#” also match “sport/”? The spec leaves this totally ambiguous.
As an aside, who else thinks allowing trailing slashes was a really bad design decision?
The # matches zero or more further elements so subscriptions to sport/# will match sport/
This is easily tested with mosquitto_sub/mosquitto_pub
Publish:
$ mosquitto_pub -t "sport/" -m "foo"
Subscription:
$ mosquitto_sub -v -t "sport/#"
sport/ foo
Yes, sport/# matches sport/. The confusing aspect of the spec is that unlike file and directory names, topic levels can be empty strings.
The spec says:
4.7.1.1 Topic level separator
The forward slash (‘/’ U+002F) is used to separate each level within a topic tree and provide a hierarchical structure to the Topic Names... Adjacent Topic level separators indicate a zero length topic level.
This means that sport/ is parsed as not one, but two topic levels -- sport and the empty string -- hence the # in sport/# matches the second empty string topic level. This is the same reason that sport/+ matches sport/ -- the + is matching the empty string level.
Related
I'm trying to do a word search with regex and wonder how to type AND for multiple criteria.
For example, how to type the following:
(Start with a) AND (Contains p) AND (Ends with e), such as the word apple?
Input
apple
pineapple
avocado
Code
grep -E "regex expression here" input.txt
Desired output
apple
What should the regex expression be?
In general you can't implement and in a regexp (but you can implement then with .*) but you can in a multi-regexp condition using a tool that supports it.
To address the case of ands, you should have made your example starts with a and includes p and includes l and ends with e with input including alpine so it wasn't trivial to express in a regexp by just putting .*s in between characters but is trivial in a multi-regexp condition:
$ cat file
apple
pineapple
avocado
alpine
Using &&s will find both words regardless of the order of p and l as desired:
$ awk '/^a/ && /p/ && /l/ && /e$/' file
apple
alpine
but, as you can see, you can't just use .*s to implement and:
$ grep '^a.*p.*l.*e$' file
apple
If you had to use a single regexp then you'd have to do something like:
$ grep -E '^a.*(p.*l|l.*p).*e$' file
apple
alpine
two ways you can do it
all that "&&" is same as negating the totality of a bunch of OR's "||", so you can write the reverse of what you want.
at a single bit-level, AND is same as multiplication of the bits, which means, instead of doing all the && if u think it's overly verbose, you can directly "multiply" the patterns together :
awk '/^a/ * /p/ * /e$/'
so by multiplying them, you're doing the same as performing multiple logical ANDs all at once
(but only use the short hand if inputs aren't too gigantic, or when savings from early exit are known to be negligible.
don't think of them as merely regex patterns - it's easier for one to think of anything not inside an action block, what's typically referred to as pattern, as
any combination and collection of items that could be evaluated for a boolean outcome of TRUE or FALSE in the end
e.g. POSIX-compliant expressions that work in the space include
sprintf()
field assignments, etc
(even decrementing NR - if there's such a need)
but not
statements like next, print, printf(),
delete array etc, or any of the loop structures
surprisingly though, getline is directly doable
in the pattern space area (with some wrapper workaround)
This might be related to me not understanding the Keyword Extraction feature, which from the docs seems to be about avoiding an issue where no space exists between a keyword and the following expression. But say I have a fairly standard identifier regex for variable names, function names, etc.:
/\w*[A-Za-z]\w*/
How do I keep this from matching a reserved keyword like IF or ELSE or something like that? So this expression would produce an error:
int IF = 5;
while this would not:
int x = 5;
There is a pull request pending since 2019 to add an EXCLUDE feature, but this is not currently implemented as of time of writing this (April 2021 - if some time has passed and you're reading this, please do re-check this!). And since treesitter also does not support negative lookbehind in its regular expressions, this has to be handled at the semantic level. One thing you can do to make this check easier is to enumerate all your reserved words then add them as an alternative to your identifier regex:
keyword: $ => choice('IF', 'THEN', 'ELSE'),
name: $ => /\w*[A-Za-z]\w*/,
identifier: $ => choice($.keyword, $.name)
According to rule 4 of treesitter's match rules, in the expression int IF = 5; the IF token would match (identifier keyword) rather than (identifier name) since it is a more specific match. This means you can do an easy query for illegal (identifier keyword) nodes and surface the error to the user in your language server or from wherever it is you're using the treesitter grammar.
Note that this approach does run the risk of creating many conflicts between your (identifier keyword) match and the actual language constructs that use those keywords. If so, you'll have to handle the whole thing at the semantic level: scan all identifiers to check whether they're a reserved word.
I am working on a flex parser using flex 2.6.4 with the -s option specified, a particular start condition has the following patterns (I am trying to read everything to the next unescaped newline):
\\(.|\n)
[^\\\n]+
\n
Yet I get the warning: "-s option given but default rule can be matched"
I don't see any holes in the above pattern set, am I missing something or is this a flex error?
Your set of rules does not match a backslash at the end of the file.
Your first rule requires the backslash to be followed by something and the other ones don't match backslashes at all.
In my data set, we have a multitude of emails that must be parsed (alongside a myriad of other unrelated information like phone numbers and addresses and such.)
I am attempting to look for something that meets the criteria of an email, but does not have the proper format of an email. So, I tried using grep's "AND" function, wherein it fits the second parameter but not the first.
grep -E -c -v "^[a-mA-M][a-zA-Z]*\.#[A-Za-z]+\.[A-Za-z]{2,6}"Data.bash | grep # Data.bash
How should I be implementing this? As this just finds anything with an # in it (as the first parameter returns 0 and the second is just finding everything with an # in it).
In short, How do I AND two conditions together in grep?
EDIT: Sample Data
An email address has a user-id and domain names can consist of letters, numbers,
periods, and dashes.
Matches:
saltypickle#gmail.com
saltypickle#g-mail.com
No Match:
saltypickle#g^mail.com
saltypickle#.
#saltyPickle#
saltyPickle#
grep -P '^\w+#[[:alnum:]-.]+.com' inputfile
saltypickle#gmail.com
saltypickle#g-mail.com
This will allow any alpha ,number, - or . as domain name.
Following will print invalid email addresses:
grep -vP '^\w+#[[:alnum:]-.]+.com' inputfile
saltypickle#g^mail.com
saltypickle#.
#saltyPickle#
saltyPickle#
I have user information coming in from an outside source and I need to check if that user is active. Sometimes I have a User and a Server and other times I have User#Server. The former case is no problem, I just have:
active(User, Server) ->
do whatever.
What I would like to do with the User#Server case is something like:
active([User, "#", Server]) ->
active(User, Server).
Doesn't seem to work. When calling active in the erlang terminal with a#b for example, I get an error that there is no match. Any help would be appreciated!
You can tokenize the string to get the result:
active(UserString) ->
[User,Server] = string:tokens(UserString,"#"),
active(User,Server).
If you need something more elaborate, or with better handling of something like email addresses, it might then be time to delve into using regular expressions with the re module.
active(UserString) ->
RegEx = "^([\\w\\.-]+)#([\\w\\.-]+)$",
{match, [User,Server]} = re:run(UserString,RegEx,[{capture,all_but_first,list}]),
active(User,Server).
Note: The supplied Regex is hardly sufficient for email address validation, it's just an example that allows all alphanumeric characters including underscores (\\w), dots (\\.), and dashes (-) seperated by an at symbol. And it will fail if the match doesn't stretch the whole length of the string: (^ to $).
A note on the pattern matching, for the real solution to your problem I think #chops suggestions should be used.
When matching patterns against strings I think it's useful to keep in mind that erlang strings are really lists of integers. So the string "#" is actually the same as [64] (64 being the ascii code for #)
This means that you match pattern [User, "#", Server] will match lists like: [97,[64],98], but not "a#b" (which in list form is [97,64,98]).
To match the string you need to do [User,$#,Server]. The $ operator gives you the ascii value of the character.
However this match pattern limits the matching string to be 1 character followed by # and then one more character...
It can be improved by doing [User, $# | Server] which allows the server part to have arbitrary length, but the User variable will still only match one single character (and I don't see a way around that).