Cypher/Neo4j: Convert any to string (due to application of REVERSE) - neo4j

I would like to reduce a list into a string to adhere to a specific output format which requires a pipe ( '|' ) between the elements. I do it as follows:
WITH ["three", "two", "one"] AS a RETURN reduce(acc=head(a), s in tail(a) | acc + "|" + s)
My issue is that the array is in the wrong order: you can see that it "counts" down, while I'd like it to count up (in my production environment the array is, of course, an intermediate result of a graph query).
So I thought I would just do
WITH ["three", "two", "one"] AS a RETURN reduce(acc=head(a), s in REVERSE(tail(a)) | acc + "|" + s)
Unfortunately, reverse seems to return a collection of some generic type (any) which is not accepted by the string concatenation operator:
Type mismatch: expected Float, Integer, String or List<String> but was Any (line 1, column 98 (offset: 97))
"WITH ["three", "two", "one"] AS a RETURN reduce(acc=head(a), s in reverse(tail(a)) | acc + "|" + s)"
^
Thus I'd like to convert s to a string via toString. This function, however, only accepts integer, float, or boolean values, not Any.
What can I do? I would also accept a solution without the conversion. I just want to be able to reduce a reversed collection of strings into a single string.
Thank you!

You can avoid using the REVERSE() function by simply reversing the order in which you concatenate (i.e., using s + "|" + acc instead of acc + "|" + s):
WITH ["three", "two", "one"] AS a
RETURN REDUCE(acc=HEAD(a), s in TAIL(a) | s + "|" + acc )
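For a quick sanity check, here is the same fold sketched in Python (illustrative only, not part of the answer); both produce one|two|three:

from functools import reduce

a = ["three", "two", "one"]
# Putting the accumulator on the right of the separator reverses the output
# order, just like the Cypher REDUCE above.
print(reduce(lambda acc, s: s + "|" + acc, a[1:], a[0]))  # one|two|three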

Related

Check that one list contains all items of another list, in the same order, in neo4j?

For example, I have the arrays below:
Array1 = ["d","c","b"]
Array2 = ["d","c","x"]
Array3 = ["a","C"]
Array4 = ["a","d","c","b","x"]
I want to compare Array1, Array2, and Array3 to Array4, and the result should be Array1, because Array4 contains d, c, b in the right order.
The simplest way to do it is to transform your arrays into strings and then use a CONTAINS in the WHERE clause.
Here is an example:
WITH ["d","c","b"] AS test, ["a","d","c","b","x"] AS array
WITH test, reduce(v='|', x in test | v + x + '|') as testAsString, reduce(v='|', x in array | v + x + '|') as arrayAsString
WHERE arrayAsString CONTAINS testAsString
RETURN test
Note that if your array values contain a |, this code can give you false positives.
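The same boundary-delimiter trick, sketched in Python for illustration (not part of the original answer):

def as_delimited(xs):
    # Wrap every element in '|' so element boundaries are explicit:
    # ["d", "c", "b"] -> "|d|c|b|"
    return "|" + "|".join(xs) + "|"

test = ["d", "c", "b"]
array = ["a", "d", "c", "b", "x"]
print(as_delimited(test) in as_delimited(array))  # True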

How to write grammar for an expression when it can have many possible forms

I have some sentences that I need to convert to regex code and I was trying to use Pyparsing for it. The sentences are basically search rules, telling us what to search for.
Examples of sentences -
LINE_CONTAINS this is a phrase
- this is an example search rule saying that the line you are searching should contain the phrase this is a phrase
LINE_STARTSWITH However we - this is an example search rule saying that the line you are searching should start with the phrase However we
The rules can be combined too, like- LINE_CONTAINS phrase one BEFORE {phrase2 AND phrase3} AND LINE_STARTSWITH However we
A list of all actual sentences (if necessary) can be found here.
All rules start with one of the two directives mentioned above (call them line_directives). Now, I am trying to parse these sentences and then convert them to regex code. I started writing a BNF for my grammar, and this is what I came up with -
lpar ::= '{'
rpar ::= '}'
line_directive ::= LINE_CONTAINS | LINE_STARTSWITH
phrase ::= lpar(?) + (word+) + rpar(?) # meaning if a phrase is parenthesized, it's still the same
upto_N_words ::= lpar + 'UPTO' + num + 'WORDS' + rpar
N_words ::= lpar + num + 'WORDS' + rpar
upto_N_characters ::= lpar + 'UPTO' + num + 'CHARACTERS' + rpar
N_characters ::= lpar + num + 'CHARACTERS' + rpar
JOIN_phrase ::= phrase + JOIN + phrase
AND_phrase ::= phrase (+ JOIN + phrase)+
OR_phrase ::= phrase (+ OR + phrase)+
BEFORE_phrase ::= phrase (+ BEFORE + phrase)+
AFTER_phrase ::= phrase (+ AFTER + phrase)+
braced_OR_phrase ::= lpar + OR_phrase + rpar
braced_AND_phrase ::= lpar + AND_phrase + rpar
braced_BEFORE_phrase ::= lpar + BEFORE_phrase + rpar
braced_AFTER_phrase ::= lpar + AFTER_phrase + rpar
braced_JOIN_phrase ::= lpar + JOIN_phrase + rpar
rule ::= line_directive + subrule
final_expr ::= rule (+ AND/OR + rule)+
The problem is the subrule, for which (based on the empirical data I have) I have been able to come up with all of the following expressions -
subrule ::= phrase
::= OR_phrase
::= JOIN_phrase
::= BEFORE_phrase
::= AFTER_phrase
::= AND_phrase
::= phrase + upto_N_words + phrase
::= braced_OR_phrase + phrase
::= phrase + braced_OR_phrase
::= phrase + braced_OR_phrase + phrase
::= phrase + upto_N_words + braced_OR_phrase
::= phrase + upto_N_characters + phrase
::= braced_OR_phrase + phrase + upto_N_words + phrase
::= phrase + braced_OR_phrase + upto_N_words + phrase
To give an example, one sentence I have is LINE_CONTAINS the objective of this study was {to identify OR identifying} genes upregulated. For this the subrule as mentioned above is phrase + braced_OR_phrase + phrase.
So my question is how do I write a simple BNF grammar expression for the subrule so that I would be able to easily code the grammar for it using Pyparsing? Also, any input regarding my present technique is absolutely welcome.
EDIT: After applying the principles elucidated by @Paul in his answer, here is the MCVE version of the code. It takes a list of sentences to be parsed, hrrsents, parses each sentence, converts it to its corresponding regex, and returns a list of regex strings -
from pyparsing import *
import re

def parse_hrr(hrrsents):
    UPTO, AND, OR, WORDS, CHARACTERS = map(Literal, "UPTO AND OR WORDS CHARACTERS".split())
    LBRACE, RBRACE = map(Suppress, "{}")
    integer = pyparsing_common.integer()
    LINE_CONTAINS, PARA_STARTSWITH, LINE_ENDSWITH = map(Literal,
        """LINE_CONTAINS PARA_STARTSWITH LINE_ENDSWITH""".split()) # put option for LINE_ENDSWITH. Users may use, I don't presently
    BEFORE, AFTER, JOIN = map(Literal, "BEFORE AFTER JOIN".split())
    keyword = UPTO | WORDS | AND | OR | BEFORE | AFTER | JOIN | LINE_CONTAINS | PARA_STARTSWITH

    class Node(object):
        def __init__(self, tokens):
            self.tokens = tokens

        def generate(self):
            pass

    class LiteralNode(Node):
        def generate(self):
            return "(%s)" % (re.escape(''.join(self.tokens[0]))) # here, merged the elements, so that re.escape does not have to do an escape for the entire list

    class ConsecutivePhrases(Node):
        def generate(self):
            join_these = []
            tokens = self.tokens[0]
            for t in tokens:
                tg = t.generate()
                join_these.append(tg)
            seq = []
            for word in join_these[:-1]:
                if (r"(([\w]+\s*)" in word) or (r"((\w){0," in word): # or if the first part of the regex in word:
                    seq.append(word + "")
                else:
                    seq.append(word + "\s+")
            seq.append(join_these[-1])
            result = "".join(seq)
            return result

    class AndNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            join_these = []
            for t in tokens[::2]:
                tg = t.generate()
                tg_mod = tg[0] + r'?=.*\b' + tg[1:][:-1] + r'\b)' # to place the regex commands at the right place
                join_these.append(tg_mod)
            joined = ''.join(ele for ele in join_these)
            full = '(' + joined + ')'
            return full

    class OrNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            joined = '|'.join(t.generate() for t in tokens[::2])
            full = '(' + joined + ')'
            return full

    class LineTermNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            ret = ''
            dir_phr_map = {
                'LINE_CONTAINS': lambda a: r"((?:(?<=^)|(?<=[\W_]))" + a + r"(?=[\W_]|$))456",
                'PARA_STARTSWITH':
                    lambda a: (r"(^" + a + r"(?=[\W_]|$))457") if 'gene' in repr(a)
                    else (r"(^" + a + r"(?=[\W_]|$))458")}
            for line_dir, phr_term in zip(tokens[0::2], tokens[1::2]):
                ret = dir_phr_map[line_dir](phr_term.generate())
            return ret

    class LineAndNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            return '&&&'.join(t.generate() for t in tokens[::2])

    class LineOrNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            return '###'.join(t.generate() for t in tokens[::2])

    class UpToWordsNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            ret = ''
            word_re = r"([\w]+\s*)"
            for op, operand in zip(tokens[1::2], tokens[2::2]):
                # op contains the parsed "upto" expression
                ret += "(%s{0,%d})" % (word_re, op)
            return ret

    class UpToCharactersNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            ret = ''
            char_re = r"\w"
            for op, operand in zip(tokens[1::2], tokens[2::2]):
                # op contains the parsed "upto" expression
                ret += "((%s){0,%d})" % (char_re, op)
            return ret

    class BeforeAfterJoinNode(Node):
        def generate(self):
            tokens = self.tokens[0]
            operator_opn_map = {'BEFORE': lambda a, b: a + '.*?' + b, 'AFTER': lambda a, b: b + '.*?' + a, 'JOIN': lambda a, b: a + '[- ]?' + b}
            ret = tokens[0].generate()
            for operator, operand in zip(tokens[1::2], tokens[2::2]):
                ret = operator_opn_map[operator](ret, operand.generate()) # this is basically calling a dict element, and every such element requires 2 variables (a&b), so providing them as ret and op.generate
            return ret

    ## THE GRAMMAR
    word = ~keyword + Word(alphas, alphanums + '-_+/()')
    uptowords_expr = Group(LBRACE + UPTO + integer("numberofwords") + WORDS + RBRACE).setParseAction(UpToWordsNode)
    uptochars_expr = Group(LBRACE + UPTO + integer("numberofchars") + CHARACTERS + RBRACE).setParseAction(UpToCharactersNode)
    some_words = OneOrMore(word).setParseAction(' '.join, LiteralNode)
    phrase_item = some_words | uptowords_expr | uptochars_expr
    phrase_expr = infixNotation(phrase_item,
        [
            ((BEFORE | AFTER | JOIN), 2, opAssoc.LEFT, BeforeAfterJoinNode), # was not working earlier, because BEFORE etc. were not keywords, and hence parsed as words
            (None, 2, opAssoc.LEFT, ConsecutivePhrases),
            (AND, 2, opAssoc.LEFT, AndNode),
            (OR, 2, opAssoc.LEFT, OrNode),
        ],
        lpar=Suppress('{'), rpar=Suppress('}')
    ) # structure of a single phrase with its operators
    line_term = Group((LINE_CONTAINS | PARA_STARTSWITH)("line_directive") +
                      (phrase_expr)("phrases")) # basically giving structure to a single sub-rule having line-term and phrase
    #
    line_contents_expr = infixNotation(line_term.setParseAction(LineTermNode),
        [(AND, 2, opAssoc.LEFT, LineAndNode),
         (OR, 2, opAssoc.LEFT, LineOrNode),
        ]
    ) # grammar for the entire rule/sentence
    ######################################
    mrrlist = []
    for t in hrrsents:
        t = t.strip()
        if not t:
            continue
        try:
            parsed = line_contents_expr.parseString(t)
        except ParseException as pe:
            print(' ' * pe.loc + '^')
            print(pe)
            continue
        temp_regex = parsed[0].generate()
        final_regexes3 = re.sub(r'gene', '%s', temp_regex) # this can be made more precise by putting a condition of [non-word/^/$] around the 'gene'
        mrrlist.append(final_regexes3)
    return(mrrlist)
You have a two-tiered grammar here, so you would do best to focus on one tier at a time, which we have covered in some of your other questions. The lower tier is that of the phrase_expr, which will later serve as the argument to the line_directive_expr. So define examples of phrase expressions first - extract them from your list of complete statement samples. Your finished BNF for phrase_expr will have the lowest level of recursion look like:
phrase_atom ::= <one or more types of terminal items, like words of characters
                 or quoted strings, or *possibly* expressions of numbers of
                 words or characters> | brace + phrase_expr + brace
(Some other questions: Is it possible to have multiple phrase_items one after another with no operator? What does that indicate? How should it be parsed and interpreted? Should this implied operation be its own level of precedence?)
That will be sufficient to loop back the recursion for your phrase expression - you should not need any other braced_xxx elements in your BNF. AND, OR, and JOIN are clearly binary operators - in normal operator precedence, ANDs are evaluated before ORs; you can decide for yourself where JOIN should fall in this. Write some sample phrases with no parentheses, with AND and JOIN, and with OR and JOIN, and think through what order of evaluation makes sense in your domain.
Once that is done, then line_directive_expr should be simple, since it is just:
line_directive_item ::= line_directive phrase_expr | brace line_directive_expr brace
line_directive_and ::= line_directive_item (AND line_directive_item)*
line_directive_or ::= line_directive_and (OR line_directive_and)*
line_directive_expr ::= line_directive_or
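A rough pyparsing sketch of how that second tier can sit on top of a simplified phrase_expr (illustrative only; the names, the reduced operator set, and the sample string here are assumptions, not your full grammar):

from pyparsing import (Literal, Word, alphas, infixNotation, opAssoc,
                       Suppress, Group, OneOrMore)

AND, OR = map(Literal, ["AND", "OR"])
LINE_CONTAINS, PARA_STARTSWITH = map(Literal, ["LINE_CONTAINS", "PARA_STARTSWITH"])
keyword = AND | OR | LINE_CONTAINS | PARA_STARTSWITH

# lower tier: a phrase is one or more non-keyword words, with AND/OR and braces
word = ~keyword + Word(alphas)
phrase_expr = infixNotation(OneOrMore(word),
    [(AND, 2, opAssoc.LEFT),
     (OR, 2, opAssoc.LEFT)],
    lpar=Suppress('{'), rpar=Suppress('}'))

# upper tier: line_directive_item, then AND/OR over whole items
line_directive_item = Group((LINE_CONTAINS | PARA_STARTSWITH) + phrase_expr)
line_directive_expr = infixNotation(line_directive_item,
    [(AND, 2, opAssoc.LEFT),
     (OR, 2, opAssoc.LEFT)])

print(line_directive_expr.parseString(
    "LINE_CONTAINS the objective AND PARA_STARTSWITH However we"))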
Then when you translate to pyparsing, add Groups and results names a little at a time! Don't immediately Group everything or name everything. Ordinarily I recommend using results names liberally, but in infix-notation grammars, lots of results names can just clutter up the results. Let the Groups (and ultimately the node classes) do the structuring, and the behavior in the node classes will guide you to where you want results names. For that matter, the result classes usually get such a simple structure that it is often easier just to do list unpacking in the class init or evaluate methods. Start with simple expressions and work up to complicated ones. (Look at "LINE_STARTSWITH gene" - it is one of your simplest test cases, but you have it as #97?) If you just sort this list by length, that would be a good rough cut. Or sort by increasing number of operators. But if you tackle the complex cases before you have the simple ones working, you will have too many options for where a tweak or refinement should go, and (speaking from personal experience) you are as likely to get it wrong as right - except when you get it wrong, it just makes fixing the next issue more difficult.
And again, as we have discussed elsewhere, the devil in this second tier is doing the actual interpretation of the various line directive items, since there is an implied order to evaluating LINE_STARTSWITH vs LINE_CONTAINS that overrides the order that they may be found in the initial string. That ball is entirely in your court, since you are the language designer for this particular domain.

Neo4j Cypher : replace multiple characters

I need to replace multiple characters with a single character:
RETURN LOWER(REPLACE("ranchod-das-chanchad-240190---Funshuk--Wangdu",'--', '-'))
Is there any regex-based way to do this?
This is for Neo4j 2.2.2.
There's no function similar to REPLACE taking a regex as a parameter.
Since you're using Neo4j 2.2, you can't implement it as a procedure either.
The only way to do it is by splitting and joining (using a combination of reduce and substring):
RETURN substring(reduce(s = '', e IN filter(e IN split('ranchod-das-chanchad-240190---Funshuk--Wangdu', '-') WHERE e <> '') | s + '-' + e), 1);
It can be easier to read if you decompose it:
WITH split('ranchod-das-chanchad-240190---Funshuk--Wangdu', '-') AS elems
WITH filter(e IN elems WHERE e <> '') AS elems
RETURN substring(reduce(s = '', e IN elems | s + '-' + e), 1);
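The same decomposition, sketched in Python for illustration (split on '-', drop the empty pieces produced by repeated dashes, re-join):

s = "ranchod-das-chanchad-240190---Funshuk--Wangdu"
print("-".join(piece for piece in s.split("-") if piece))
# ranchod-das-chanchad-240190-Funshuk-Wangdu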

FSharp.Data: Transform multiple columns to a single column (dictionary result)

I am using FSharp.Data to transform HTML table data, i.e.
type RawResults = HtmlProvider<url>
let results = RawResults.Load(url).Tables
for row in results.Table1.Rows do
    printfn " %A " row
Example output:
("Model: Generic", "Submit Date: July 22, 2016")
("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.")
~~~ hundreds more rows ~~~
I am trying to split those "two column"-based elements into a single column sequence to eventually get to a dictionary result.
Desired dictionary key:value result:
["Model", Generic]
["Submit Date", July 22, 2016]
["Gene", "Sequencing Failed"]
~~~~
How can I iterate over (or split) the two columns (Column1 & Column2) and pipe both of them into a dictionary result?
let summaryDict =
    results.Table1.Rows
    |> Seq.skip 1
    |> Seq.iter (fun x -> x.Column1 ......
    |> ....
Use the built-in string API to split over the :. I usually prefer to wrap String.Split in curried form:
let split (separator : string) (s : string) = s.Split (separator.ToCharArray ())
Additionally, while not required, when working with two-element tuples, I often find it useful to define a helper module with functions related to this particular data structure. You can put various functions in such a module (e.g. curry, uncurry, swap, etcetera), but in this case, a single function is all you need:
module Tuple2 =
    let mapBoth f g (x, y) = f x, g y
With these building blocks, you can easily split each tuple element over :, as shown in this FSI session:
> [
      ("Model: Generic", "Submit Date: July 22, 2016")
      ("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min.") ]
  |> List.map (Tuple2.mapBoth (split ":") (split ":"));;
val it : (string [] * string []) list =
  [([|"Model"; " Generic"|], [|"Submit Date"; " July 22, 2016"|]);
   ([|"Gene"; " Sequencing Failed"|], [|"Exectime"; " 5 hrs. 21 min."|])]
At this point, you still need to strip leading whitespace, as well as convert the arrays into your desired format, but I trust you can take it from here (otherwise, please ask).
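In case it helps, here are those remaining steps (trimming and building the dictionary) sketched in Python rather than F#, purely as an illustration of the transformation:

rows = [
    ("Model: Generic", "Submit Date: July 22, 2016"),
    ("Gene: Sequencing Failed", "Exectime: 5 hrs. 21 min."),
]

summary = {}
for row in rows:
    for cell in row:
        # Split on the first ':' only, then trim the surrounding whitespace.
        key, _, value = cell.partition(":")
        summary[key.strip()] = value.strip()

print(summary)
# {'Model': 'Generic', 'Submit Date': 'July 22, 2016',
#  'Gene': 'Sequencing Failed', 'Exectime': '5 hrs. 21 min.'}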

f# finding the indexes of a string in a char array

Hey all, I am new to F#.
I am trying to find the starting indexes of all occurrences of a string in a char array.
e.g.
char array ['a';'b';'b';'a';'b';'b';'b';'b';'b';'a';'b']
would return 0, 3, and 9 if you were searching for the string "ab"
Here's a solution using recursive functions:
/// Wraps the recursive findMatches function defined inside, so that you don't have to seed it with the "internal" parameters
let findMatches chars str =
    /// Returns whether or not the string matches the beginning of the character array
    let rec isStartMatch chars (str: string) =
        match chars with
        | char :: rest when str.Length > 0 ->
            char = str.[0] && (isStartMatch rest str.[1..(str.Length - 1)])
        | _ -> str.Length = 0
    /// The actual function here
    let rec findMatches matchedIndices i chars str =
        match chars with
        | _ :: rest ->
            if isStartMatch chars str
            then findMatches (i :: matchedIndices) (i + 1) rest str
            else findMatches matchedIndices (i + 1) rest str
        | [] -> matchedIndices
    findMatches [] 0 chars str
Not the most efficient as it iterates over characters twice if they're part of a match, but that's not really a big worry.
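As a cross-check of the expected output, the same search sketched in Python (illustrative only, using a simple slice comparison):

def find_matches(chars, s):
    # All start indexes where s occurs in the character list.
    return [i for i in range(len(chars) - len(s) + 1)
            if "".join(chars[i:i + len(s)]) == s]

print(find_matches(list("abbabbbbbab"), "ab"))  # [0, 3, 9]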
I don't want to do a complete example here, so here is the hint:
let rec matches (l: char seq) i =
    match Seq.tryFindIndex ... (*the part you have already done goes here*) with
    | None -> []
    | Some t -> i + t :: matches (Seq.skip (t + 1) l) (i + t + 1)
Basically, we just repeatedly apply Seq.tryFindIndex until it stops matching.

Resources