What is the name of this feature in Cypher that allows depth matching rules in a path match statement? Where is the documentation? - neo4j

I am having a hard time identifying the rules around this syntax, I see it and kind of understand it, but would like to find the documentation. I'm not sure how to google it though.
match (g:Group)<-[*1..2]-(s)
                 ^^^^^^^
                 ^^^^^^^ I would like to know more about this rule that limits path length
I understand that the above says "I only want to find paths that are between 1 and 2 edge traversals long", and if it really is that simple, that's great; but as I understand it, the use of a star makes it a wildcard on which edges it can follow, while [:EdgeTypeIWant1..2] doesn't appear to be correct syntax. I probably have other questions as well that proper documentation (if I could find it) would help with.

They are called variable length patterns; the Neo4j Cypher manual documents them under variable-length relationships.
The star is only an indicator that you're specifying a pattern of variable length, it's not a wildcard.
Your syntax would be: [:EdgeTypeIWant*1..2]
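For example, to restrict both the relationship type and the path length (EdgeTypeIWant is the hypothetical type from your question):

MATCH (g:Group)<-[:EdgeTypeIWant*1..2]-(s)
RETURN s

Omitting the type, as in [*1..2], allows any relationship type; omitting the bounds, as in [:EdgeTypeIWant*], allows paths of any length.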

Related

Understanding 'strictness' in regex grammar

In writing a grammar/parser for a regex, I'm wondering why the following constructions are both syntactically and semantically valid in the regex syntax (at least as far as I can understand it):
Repetitions of a character class, such as: [aaa]
Repetitions of a zero-width assertion, such as: a$$$$$$$$
Assertions at a bogus position, such as: a$b
To me, this sort of seems like having a syntax that would allow a construction like SELECT SELECT SELECT SELECT SELECT col FROM tbl. In other words, why isn't the regex syntax defined more strictly than it is in practice?
To start with, that's not a very good analogy. Your statement with multiple SELECT keywords is not, as far as I know, part of the SQL grammar, so it's simply ungrammatical. Repeated elements in a character class are more like the SQL construct:
SELECT * FROM table WHERE Value IN (1, 2, 3, 2, 3, 2, 3)
I think most (if not all) SQL processors would allow that. You could argue that it would be nice if a warning message were issued, but the usual SQL interface (where a query is sent from client to server and a result returned) does not leave space for a warning.
It's certainly the case that repeated characters in a character class are often an indication that the regular expression was written by a novice with only a fuzzy idea of what a character class is. If you hang out in SO's flex-lexer tag long enough, you'll see how often students write regular expressions like [a-z|A-Z|0-9], or even [begin|end]. If flex detected duplicate characters in character classes, those mistakes would receive warnings, which might or might not be useful to the student coder. (Reading and understanding warning messages is not, apparently, an innate skill.) But it needs to be asked who the target audience for a tool like flex is, and I think the answer is not "impatient beginners who won't read documentation". Certainly, a non-novice programmer might also make that kind of mistake, usually as a result of a typo, but it's not common and it will probably be easily detected during debugging.
If you've already started to work on a parser for regular expressions, you should already know why these rules are not usually a feature of regular expression libraries. Every rule you place on a syntax must be:
precisely defined,
documented,
implemented,
acted upon appropriately (which may mean complicating the interface in order to allow for warnings).
and all of that needs to be thoroughly tested.
That's a lot of work to prevent something which is not even technically an error. And catching those errors (if they are errors) will probably make the "grammar" for the regular expressions context-sensitive. (In other words, you can't write a context-free grammar which forbids duplication inside a character class.)
Moreover, practically all of those expressions might show up in the wild, typically in the case that the regular expression was programmatically generated. For example, suppose you have a set of words, and you want to write a regular expression which will match any sequence of characters not in any of the words. The simple solution is [^firstsecondthirdwordandtherest]+. Of course, you could have gone to the trouble of deduping the individual letters in the words (a simple task in some programming languages, more complicated in others), but should it be necessary?
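A minimal sketch of that kind of programmatic generation (Python for illustration; the word list is made up):

import re

# Duplicate letters across the words are left in place: deduplicating
# them inside a character class would change nothing about the match.
words = ["first", "second", "third"]
pattern = re.compile("[^" + re.escape("".join(words)) + "]+")
print(pattern.findall("first... second!"))  # runs of characters not in any word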
With respect to your other two examples, with repeated and non-terminal $, there are actual regex libraries in which those are interpreted differently. Many older regex libraries (including (f)lex) only treat $ as a zero-length end-of-string assertion if it appears at the very end of the regex; such libraries would treat all of those $s other than the last one in a$$$$$$$$ as matching a literal $. Others treat $ as an end-of-line assertion, rather than an end-of-string assertion. But I don't know of any which treat those as errors.
There's a pretty good argument that a didactic tool, such as regex101, might usefully issue warnings for doubtful regular expressions, a task usually called "linting". But for a production regex library, there is little to be gained, and a lot of pain.

(wx)Maxima: determining the number of parts of an expression

I'd like to use part to handle expressions of different length but have not been able to find anything in the documentation that addresses how to determine the number of parts of an expression.
I do have an upper bound for the number of parts so, in this particular case, I could loop over the terms until I get an error; however, I was wondering if there is a more direct method?
I can't believe this, but it appears length does the trick.
I assumed it was limited to lists because it returns an error for single numbers, which are atoms. Apparently my brain took "doesn't work with atoms" to mean "only works with lists."
However, this does mean that neither part nor length will work if the expression only has one part, so that case has to be handled separately.
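For illustration, a minimal session (exact output formatting aside, this is the behaviour described above):

(%i1) e: a + b*c + sin(x);
(%o1) sin(x) + b c + a
(%i2) length(e);        /* number of top-level parts */
(%o2) 3
(%i3) part(e, 2);       /* the second part, in Maxima's canonical order */
(%o3) b c
(%i4) length(a);        /* a is an atom, so this signals an error */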
I would still be interested in knowing if there is a solution that will work in all cases, so I won't mark this as the answer, just yet.

How to match efficiently against keys in a table in Lua?

Available in my Lua 5.1 environment are obviously the default Lua pattern matching, but also a reasonably recent version of PCRE and LPEG. I don't honestly care which of these is used; as long as my problem is tackled in an efficient manner I'm happy. (My personal knowledge of LPEG especially is next to non-existent, but I hear it has some very good qualities.)
I have a table with certain string patterns as keys; the accompanying values are to be used once a key matches... which means they aren't really important for this matter.
Suppose you have:
tbl = { ["aaa"] = 12, ["aab"] = 452, ["aba"] = -2 }
Now my goal is to find out which one of these matches first in a particular string like "accaccaacaadacaabacdaaba".
In reality, the keys are more numerous and the match string is considerably lengthier. This means that simply matching against all keys one by one and comparing the columns the matches begin at is a very inefficient solution that is not viable for me.
Parts of the match strings can have considerable overlaps, too. In theory, I know one state machine per key pattern would be ideal in this regard: just go through the motions on every pattern, and the moment you have a complete match on one of them, you are done.
But I would be crazy to code something like that myself when there are so many pattern matching libraries in my environment. The only one I know is technically capable is PCRE: just append the keys like "aaa|aab|aba" and you'll get the first feasible match.
But there's also a problem there. For one, I am unsure how intelligently it compiles such a pattern (I think it first tries 'aaa', unwinds completely once it fails, then tries 'aab' from scratch, but I haven't tested this), which wouldn't be too efficient compared to matching it like "a(a[ab]|ba)", where similarities get resolved faster.
Additionally, I'd like to have the capacity to put in some flexibility ("a.ad" where the second character doesn't matter, or matches a number... basic stuff like that). With a pattern like that in such an additive approach, I do not see a way to recover the original pattern that matched, which I need in order to use the value that goes with it.
(Worst case, I could just generate a lot of entries in the table to match every possible wildcard variation and do away with the pattern requirement, but I honestly don't want to.)
Which library is the right tool for the job, and to boot, how to best use said library to achieve above-stated goals without reinventing the wheel?
A comment to your question mentioned the Aho–Corasick algorithm.
If your environment has access to os.execute or io.popen, you can call fgrep -o -f patterns filename, where patterns is the name of a file that contains patterns separated with newlines, and filename is the name of your input. -o means that only matches will be output, one per line. You can replace filename with - so that fgrep reads from standard input: echo "String to match" | fgrep -o -f patterns.
fgrep implements Aho–Corasick algorithm.
However, remember that Aho–Corasick algorithm does not recognise metacharacters.
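A minimal sketch of calling fgrep from Lua 5.1 via io.popen (the patterns file name is an assumption, and the subject string is assumed to be shell-safe):

-- "patterns" holds one fixed string per line, e.g. aaa, aab, aba.
local subject = "accaccaacaadacaabacdaaba"
local pipe = io.popen('echo "' .. subject .. '" | fgrep -o -f patterns')
for match in pipe:lines() do
  print(match)  -- each match, one per line, in the order found
end
pipe:close()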
Just as Alexander Mashin's answer said, the Aho–Corasick algorithm is an efficient algorithm that will solve your problem. In Lua land, cloudflare/lua-aho-corasick is an implementation for LuaJIT using FFI. There's also a pure Lua implementation, jgrahamc/aho-corasick-lua, which might be slower.

How can I determine if two mathematical expressions are 'sort of' the same?

Sorry about the vagueness of this question, and maybe I want a different exchange? Not sure. Anyway, here goes:
I'd like to be able to determine if two mathematical expressions are 'the same'. I don't need full equivalence testing, which I understand is impossible anyway. Basically, I'd like to make a function that looks like this:
areTheSame(expression1, expression2, [testing methods])
where [testing methods] might include: 'exact', 'allow commutativity', 'allow distribution', ...
'exact' would be easy: expression1 == expression2 if their strings are exactly equal
'allow commutativity' would be more difficult. For instance, if expression1 is y=3*x and expression2 is y=x*3, then under 'allow commutativity', they would be the same. Ditto for 'y=x' and 'x=y'.
'allow distribution' would allow 'y=2*(x-3)' to be equal to 'y=2*x-6'.
others?
Ideally, I'd like to be lazy! I'd love to find some library that:
1. supports parsing expressions from latex representations (or MathML, which is xml already, maybe it's easier to parse)
2. supports equivalence testing with some flags or something that govern how exact the comparison should be, as above.
3. is written in c or c++ (or Objective-C - this is for an iOS project)
4. is not GPL.
Requirement 4 rules out SymbolicC++ and GiNaC afaik. Mathomatic is LGPL, which I'm not sure about in the context of Apple's App Store (and I would really not like to have to give out object files).
Any ideas? Thanks!
This may not be an answer, but it's too long to be a comment. :)
My first thought: this is not an easy or common problem, so you probably won't have much luck finding a library to do it for you. Finding a library that does some of the non-critical parts, on the other hand, shouldn't be difficult.
What you probably want to do is parse the expressions to create an abstract syntax tree (you'll probably find lots of libraries for that), then recursively analyze the AST yourself to test whatever definition of sameness you're after.
In iOS, a decent place to start might be (horribly ab)using NSExpression and NSPredicate. These have constructor methods that parse a string and return a structure of expression and predicate objects.
Recursively walk that structure. For each predicate, check to see if the predicateOperatorType matches... if it doesn't, the predicates aren't the same. If it does, look at the first predicate's leftExpression and the second predicate's leftExpression. Each expression has a function that tells you what its operator is (add, subtract, etc). If they don't match, the expressions aren't the same. (Do the same check for the other side.) If they do, recurse: look at each expression's sub-expressions and do a similar check, and so on until you get to expressions that are constant values or variables.
That's a rough sketch of how to see if two predicates (and the expressions they contain) "match". For "sort of the same", just relax each check you perform while recursively walking the tree and/or add more checks; e.g. if you get to an expression whose function is add, check for commutativity by comparing its sub-expressions to the corresponding ones in the other predicate in either order. (Also, there are probably other libraries that'll parse basic math expressions and get you an AST you can walk however you like.)
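A rough Objective-C sketch of that recursive walk (comparison predicates only; compound predicates, the relaxed "sort of the same" checks, and the nil-constant edge case are left out):

#import <Foundation/Foundation.h>

// Recursively compare two NSExpressions: same type, same operator
// (function), and matching sub-expressions.
static BOOL ExpressionsMatch(NSExpression *a, NSExpression *b) {
    if (a.expressionType != b.expressionType) return NO;
    switch (a.expressionType) {
        case NSConstantValueExpressionType:
            return [a.constantValue isEqual:b.constantValue];
        case NSKeyPathExpressionType:
            return [a.keyPath isEqualToString:b.keyPath];
        case NSFunctionExpressionType:
            // a.function is a selector name like "add:to:"; compare it,
            // then recurse into the argument expressions.
            if (![a.function isEqualToString:b.function]) return NO;
            if (a.arguments.count != b.arguments.count) return NO;
            for (NSUInteger i = 0; i < a.arguments.count; i++) {
                if (!ExpressionsMatch(a.arguments[i], b.arguments[i])) return NO;
            }
            return YES;
        default:
            return NO; // other expression types not handled in this sketch
    }
}

static BOOL PredicatesMatch(NSComparisonPredicate *p, NSComparisonPredicate *q) {
    if (p.predicateOperatorType != q.predicateOperatorType) return NO;
    return ExpressionsMatch(p.leftExpression, q.leftExpression)
        && ExpressionsMatch(p.rightExpression, q.rightExpression);
}

To allow commutativity for addition, you'd extend the NSFunctionExpressionType case to also try the arguments in reversed order when the function is "add:to:".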
That still won't get you everything you're after — "allow distribution" gets you into the realm of full-fledged CAS software. Maybe look into whether the likes of Wolfram Alpha have web service APIs?

Simple search alternatives for Ruby

I'm looking for a simple way to generate 'Did you mean ...' style search tips when a search over the title of a record doesn't hit on a substring match because of slightly different punctuation or phrasing for a Rails 3 app.
Most commonly, I want to generate hits for 'Alpha: Beta' when the user searches for 'Alpha Beta', for 'Alpha & Beta' when they search for 'Alpha and Beta', and for 'Alpha Beta' when they search for 'The Alpha Beta'. The same goes in the opposite direction for the first two examples; my current substring searching will catch the last case already. I would prefer to do this without specific logic for each of the above examples though, as there may be other variants I can't think of right now.
I'd also prefer to shy away from a solution that requires me to populate a hidden field of the record with alternate spellings as records are generated, which is then searched over instead of the publicly displayed one.
I'm guessing that a proper full text search like Sphinx/Thinking Sphinx would accomplish this, but I want to check if there's an easier solution for my limited-scope problem. Ideally something that automatically generates this hidden field by stripping out common words like 'the' and 'and' and punctuation like '&' and ':' from both the record title and the search term, and then does the search. The actual order of the remaining words would still have to match ('Alpha Beta Gamma' can match 'Alpha, Beta, Gamma' but not 'Alpha, Gamma, Beta').
This solution doesn't meet all of your requirements, but I believe it's close enough to be worth mentioning - the excellent "scoped_search" gem, available at https://github.com/wvanbergen/scoped_search
It implements a simple query language where a search for 'alpha beta' matches results containing all those words, rather than the exact phrase - see the wiki at https://github.com/wvanbergen/scoped_search/wiki/query-language for more information on what it supports.
It generates SQL queries behind the scenes, so doesn't require a separate search daemon like Sphinx.
However, I don't believe it does anything similar to stripping out common words. Perhaps you could get some mileage by manually stripping out your common words, and then getting scoped_search to search for your revised term?
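As a minimal sketch of that combination (the Record model, title column, and stop-word list here are assumptions, not part of the gem):

# In the model: let scoped_search query the title column.
class Record < ActiveRecord::Base
  scoped_search :on => :title
end

# Before searching: strip punctuation and common words from the user's
# input, so 'The Alpha & Beta' becomes 'alpha beta'.
STOP_WORDS = %w[the and]
def normalize(query)
  query.downcase.gsub(/[[:punct:]]/, ' ').split.reject { |w| STOP_WORDS.include?(w) }.join(' ')
end

Record.search_for(normalize('The Alpha & Beta'))  # matches titles containing both words

Note that search_for matches all the words regardless of order, so this won't enforce the word-order requirement from the question.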
