Make a path where next node is not the previous node? - neo4j

I have ~1.5M nodes in a graph, structured like this (picture):
I run a Cypher query that performs calculations on each relationship traversed:
WITH 1 AS startVal
MATCH x = (c:Currency)-[r:Arb*2]->(m)
WITH x, REDUCE(s = startVal, e IN r | s * e.rate) AS endVal, startVal
RETURN NODES(x) AS Exchanges,
[e IN relationships(x) | startVal * e.rate] AS Rel,
endVal, endVal - startVal AS Profit
ORDER BY Profit DESC LIMIT 5
The problem is it returns the path ("One")->("hop")->("One"), which is useless for me.
How can I make it avoid choosing the previously walked node as the next node (i.e. "One"->"hop"->any node other than "One")?
I have read that NODE_RECENT uniqueness should address my issue. However, I couldn't find an example of how to specify the number of recent nodes to remember via the REST API or APOC procedures.
Is there a Cypher query for my case?
Thank you.
P.S. I am very new to Neo4j and to coding (less than 2 months), so my apologies if there is an obvious, simple solution.

I don't know if I understood your question completely, but I believe your problem can be solved by adding a WHERE clause to the MATCH that prevents the undesired relationship from being matched, like this:
WITH 1 AS startVal
MATCH x = (c:Currency)-[r:Arb*2]->(m)
WHERE NOT (m)-[:Arb]->(c)
WITH x, REDUCE(s = startVal, e IN r | s * e.rate) AS endVal, startVal
RETURN NODES(x) AS Exchanges,
[e IN relationships(x) | startVal * e.rate] AS Rel,
endVal, endVal - startVal AS Profit
ORDER BY Profit DESC LIMIT 5

Try inserting this clause after your MATCH clause, to filter out cases where c and m are the same:
WHERE c <> m
[EDITED]
That is:
WITH 1 AS startVal
MATCH x = (c:Currency)-[r:Arb*2]->(m)
WHERE c <> m
WITH x, REDUCE(s = startVal, e IN r | s * e.rate) AS endVal, startVal
RETURN NODES(x) AS Exchanges,
[e IN relationships(x) | startVal * e.rate] AS Rel,
endVal, endVal - startVal AS Profit
ORDER BY Profit DESC LIMIT 5;
After using this query to create test data:
CREATE
(c:Currency {name: 'One'})-[:Arb {rate:1}]->(h:Account {name: 'hop'})-[:Arb {rate:2}]->(t:Currency {name: 'Two'}),
(t)-[:Arb {rate:3}]->(h)-[:Arb {rate:4}]->(c)
the above query produces these results:
+-----------------------------------------------------------------------------------------+
| Exchanges                                                     | Rel   | endVal | Profit |
+-----------------------------------------------------------------------------------------+
| [Node[8]{name:"Two"},Node[7]{name:"hop"},Node[6]{name:"One"}] | [3,4] | 12     | 11     |
| [Node[6]{name:"One"},Node[7]{name:"hop"},Node[8]{name:"Two"}] | [1,2] | 2      | 1      |
+-----------------------------------------------------------------------------------------+
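To make the effect of WHERE c <> m concrete, here is a minimal Python sketch that enumerates the same 2-hop paths over an in-memory copy of the test data above. It assumes, as Cypher does, that a single path never reuses the same relationship; the `EDGES` list, the `CURRENCIES` set and the `two_hop_paths` helper are hypothetical stand-ins for the graph, not Neo4j APIs.

```python
# Hypothetical in-memory copy of the test data:
# (One)-[:Arb {rate:1}]->(hop)-[:Arb {rate:2}]->(Two),
# (Two)-[:Arb {rate:3}]->(hop)-[:Arb {rate:4}]->(One)
EDGES = [("One", "hop", 1), ("hop", "Two", 2), ("Two", "hop", 3), ("hop", "One", 4)]
CURRENCIES = {"One", "Two"}  # nodes carrying the :Currency label


def two_hop_paths(exclude_round_trips, start_val=1):
    """Enumerate (c:Currency)-[:Arb*2]->(m) paths with their rate products."""
    rows = []
    for i, (c, mid, rate1) in enumerate(EDGES):
        if c not in CURRENCIES:
            continue
        for j, (src, m, rate2) in enumerate(EDGES):
            # Follow edges out of the middle node; never reuse a relationship
            if src != mid or j == i:
                continue
            # Equivalent of WHERE c <> m
            if exclude_round_trips and c == m:
                continue
            end_val = start_val * rate1 * rate2  # REDUCE(s = startVal, e IN r | s * e.rate)
            rows.append(([c, mid, m], [start_val * rate1, start_val * rate2],
                         end_val, end_val - start_val))
    # ORDER BY Profit DESC LIMIT 5
    return sorted(rows, key=lambda row: row[3], reverse=True)[:5]


for row in two_hop_paths(exclude_round_trips=True):
    print(row)
```

With `exclude_round_trips=True` it yields the same two rows as the table above; with `False` it also returns the round trips One->hop->One and Two->hop->Two.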

Related

extract decorating nodes if it exists but still return path if decorating nodes does not exist

I have the following graph
(y1:Y)
^
|
(a1:A) -> (b1:B) -> (c1:C)
(e1:E)
^
|
(d1:D)
^
|
(a2:A) -> (b2:B) -> (c2:C)
(a3:A) -> (b3:B) -> (c3:C)
I would like to find the paths between nodes labelled A and C. I can use the query
match p=((:A)-[*]->(:C))
return p
But I also want to get the Y node, and the D and E nodes, if these decorating nodes exist. If I try:
match p=((:A)-[*]->(cc:C)), (cc)-->(yy:Y), (cc)-[*]->(dd:D)-[*]->(ee:E)
return p, yy, dd, ee
then it only returns a path if the C node has Y, D and E nodes connected to it.
The output that I need is:
a1->b1->c1, y1, null
a2->b2->c2, null, [[d1, e1]]
a3->b3->c3, null, null
That is, if a decorating node does not exist, just return null. For the array, either null or an empty array is fine. Also, the D and E nodes should be grouped into an array of arrays, since there could be many D-E pairs.
What is the best way to achieve this?
This should do it, returning an empty array for deDecoration if there aren't any D-E decorations:
MATCH p=((:A)-[*]->(c:C))
WITH p,
HEAD([(c)--(y:Y) | y ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d,e]] AS deDecoration
RETURN p, yDecoration, deDecoration
With this graph (multiple D-E pairs), this query
MATCH p=((:A)-[*]->(c:C))
WITH REDUCE(s='' , node IN nodes(p) | s + CASE WHEN s='' THEN '' ELSE '->' END + node.name) AS p,
HEAD([(c)--(y:Y) | y.name ]) AS yDecoration,
[(c)-[*]->(d:D)-[*]->(e:E) | [d.name,e.name]] AS deDecoration
RETURN p, yDecoration, deDecoration
returns
╒════════════╤═════════════╤═════════════════════════╕
│"p" │"yDecoration"│"deDecoration" │
╞════════════╪═════════════╪═════════════════════════╡
│"A2->B2->C2"│null │[] │
├────────────┼─────────────┼─────────────────────────┤
│"A1->B1->C1"│null │[["D2","E2"],["D1","E1"]]│
├────────────┼─────────────┼─────────────────────────┤
│"A3->B3->C3"│"Y1" │[] │
└────────────┴─────────────┴─────────────────────────┘
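The key point is that a pattern comprehension (and HEAD() over one) yields an empty list or null instead of eliminating the row, which is what a plain MATCH does. A minimal Python sketch of that behaviour over the question's first sample graph follows. The `GRAPH` adjacency list and the rule that a node's label is the first letter of its name are assumptions for illustration, and `reachable` treats the decoration subgraph as a tree (one path per node), which a real variable-length Cypher pattern does not require.

```python
# Hypothetical adjacency list for the question's first sample graph
# (directed edges; a node's label is taken to be the first letter of its name)
GRAPH = {
    "a1": ["b1"], "b1": ["c1"], "c1": ["y1"],
    "a2": ["b2"], "b2": ["c2"], "c2": ["d1"], "d1": ["e1"],
    "a3": ["b3"], "b3": ["c3"],
}


def label(node):
    return node[0].upper()


def reachable(start):
    """Nodes reachable from start via one or more outgoing edges, like -[*]->."""
    seen, stack = [], list(GRAPH.get(start, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.append(node)
            stack.extend(GRAPH.get(node, []))
    return seen


def decorations(c):
    # (c)--(y:Y) ignores direction, so look at outgoing and incoming neighbours
    neighbours = GRAPH.get(c, []) + [src for src, dsts in GRAPH.items() if c in dsts]
    ys = [n for n in neighbours if label(n) == "Y"]
    y = ys[0] if ys else None  # HEAD() of an empty list is null
    # [(c)-[*]->(d:D)-[*]->(e:E) | [d, e]] becomes an empty list when nothing matches
    de = [[d, e] for d in reachable(c) if label(d) == "D"
          for e in reachable(d) if label(e) == "E"]
    return y, de


for c in ["c1", "c2", "c3"]:
    print(c, *decorations(c))
```

Every C node produces a row, matching the desired output in the question with an empty list in place of null for missing D-E pairs.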

How to optimize the following neo4j Cypher query

I am new to Cypher and have the query below to find mismatches between 2 source types (for example). I believe the query is syntactically fine, but it takes 1 minute to run on a data set of just 100,000 nodes. I am not using relationships yet. Can someone please help me optimize the query? Thanks.
MATCH (VW_OXSS41:VW_OrderXStatusSummary4{SourceTypeID: "1"})
WHERE apoc.date.parse(VW_OXSS41.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))>=apoc.date.parse("2020-02-10",'s',('yyyy-MM-dd')) AND apoc.date.parse(VW_OXSS41.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))<=apoc.date.parse("2020-02-16",'s',('yyyy-MM-dd'))
WITH VW_OXSS41.IdentifierValue as X
MATCH (VW_OXSS42:VW_OrderXStatusSummary4{SourceTypeID: "2"})
WHERE apoc.date.parse(VW_OXSS42.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))>=apoc.date.parse("2020-02-10",'s',('yyyy-MM-dd')) AND apoc.date.parse(VW_OXSS42.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss'))<=apoc.date.parse("2020-02-16",'s',('yyyy-MM-dd'))
WITH apoc.coll.disjunction(COLLECT(X), COLLECT(VW_OXSS42.IdentifierValue)) as XX
UNWIND (XX) as YY
RETURN YY
The updated query and the error:
WITH apoc.date.parse("2020-02-20",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-25",'s',('yyyy-MM-dd')) AS b
MATCH (x:VW_OrderXStatusSummary4 {SourceTypeID: "2"})
WHERE a <= apoc.date.parse(x.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH a, b, COLLECT(x.IdentifierValue) AS X
MATCH (y:VW_OrderXStatusSummary4 {SourceTypeID: "1"})
WHERE a <= apoc.date.parse(y.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH X, COLLECT(y.IdentifierValue) AS Y
UNWIND apoc.coll.subtract(X,Y) AS XX
MATCH (z:VW_OrderXStatusSummary4 {SourceTypeID: "2"})
WHERE a <= apoc.date.parse(z.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
RETURN XX AS MISMATCHES,MAX(z.TimeStamp);
Variable `a` not defined (line 10, column 7 (offset: 551))
"WHERE a <= apoc.date.parse(z.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b"
I solved the above error like this:
WITH apoc.date.parse("2020-02-21",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-25",'s',('yyyy-MM-dd')) AS b
MATCH (x:VW_OrderXStatusSummary4 {SourceTypeID: "2"})
WHERE a <= apoc.date.parse(x.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH a, b, COLLECT(x.IdentifierValue) AS X
MATCH (y:VW_OrderXStatusSummary4 {SourceTypeID: "1"})
WHERE a <= apoc.date.parse(y.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH X, COLLECT(y.IdentifierValue) AS Y
UNWIND apoc.coll.subtract(X,Y) AS XX
WITH XX, apoc.date.parse("2020-02-20",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-25",'s',('yyyy-MM-dd')) AS b
MATCH (z:VW_OrderXStatusSummary4 {SourceTypeID: "2"})
WHERE a <= apoc.date.parse(z.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
AND XX = z.IdentifierValue
RETURN XX AS MISMATCHES, MAX(z.TimeStamp) AS TIMESTAMP;
With the correct expected output:
+---------------------------------------------+
| MISMATCHES          | TIMESTAMP             |
+---------------------------------------------+
| "W2002201453550218" | "2020-02-21 12:00:16" |
| "W2002201453550222" | "2020-02-21 12:00:16" |
| "W2002201453550223" | "2020-02-21 09:30:36" |
| "W2002201453550224" | "2020-02-21 12:00:16" |
| "W2002201453550226" | "2020-02-21 12:00:16" |
| "W2002201453550227" | "2020-02-21 12:00:16" |
| "W2002201453550237" | "2020-02-21 12:00:16" |
| "3011WOS002978598"  | "2020-02-21 10:00:54" |
| "3011WOS002978595"  | "2020-02-21 13:00:57" |
| "0010000000006183"  | "2020-02-21 16:00:41" |
| "W2002181111547439" | "2020-02-21 04:00:34" |
| "11"                | "2020-02-21 16:00:41" |
| "10112787861P1458"  | "2020-02-21 10:00:54" |
+---------------------------------------------+
Wondering if there's a better approach?
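Note that the updated query switched from apoc.coll.disjunction to apoc.coll.subtract, which changes what counts as a mismatch. As a rough Python analogy using sets (an assumption: it ignores duplicates and ordering, which the APOC functions may treat differently), with hypothetical identifier values:

```python
type2 = ["W1", "W2", "W3"]  # hypothetical IdentifierValues with SourceTypeID "2"
type1 = ["W2", "W4"]        # hypothetical IdentifierValues with SourceTypeID "1"

# Symmetric difference: IDs that appear in exactly one source (like disjunction)
disjunction = sorted(set(type2) ^ set(type1))
print(disjunction)  # ['W1', 'W3', 'W4']

# Difference: IDs from source 2 that are missing from source 1 (like subtract)
subtract = sorted(set(type2) - set(type1))
print(subtract)  # ['W1', 'W3']
```

The results are sorted only to make the output deterministic.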
You need to avoid making a cartesian product between the results of your two MATCH clauses. Let's say the two MATCH clauses would normally return N and M nodes, respectively, when executed in their own queries. Because your query combines those two MATCH clauses in the way that it does, your second MATCH clause is actually performing N*M matches (and producing N*M result rows).
You need to make sure you have created an index on :VW_OrderXStatusSummary4(SourceTypeID). That will optimize the lookups performed by the MATCH clauses.
You can simplify your Cypher code to avoid duplicated function calls.
After creating the index indicated above, try this:
WITH apoc.date.parse("2020-02-10",'s',('yyyy-MM-dd')) AS a, apoc.date.parse("2020-02-16",'s',('yyyy-MM-dd')) AS b
MATCH (x:VW_OrderXStatusSummary4 {SourceTypeID: "1"})
WHERE a <= apoc.date.parse(x.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH a, b, COLLECT(x.IdentifierValue) AS X
MATCH (y:VW_OrderXStatusSummary4 {SourceTypeID: "2"})
WHERE a <= apoc.date.parse(y.TimeStamp,'s',('yyyy-MM-dd HH:mm:ss')) <= b
WITH X, COLLECT(y.IdentifierValue) AS Y
UNWIND apoc.coll.disjunction(X, Y) AS YY
...
Performing the COLLECT(x.IdentifierValue) aggregation in the first WITH clause folds the IdentifierValue of all N x nodes into a single result row (instead of N result rows). This allows the second MATCH to avoid the cartesian product.
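A rough Python model of the row counts shows why collecting first matters. Plain lists stand in for result rows; the helper names and the sizes 1000 and 500 are made up for illustration.

```python
def rows_without_collect(n, m):
    """The second MATCH runs once per row of the first: n * m rows."""
    first = [f"x{i}" for i in range(n)]  # one row per x node
    return [(x, f"y{j}") for x in first for j in range(m)]


def rows_with_collect(n, m):
    """COLLECT() first folds the n x values into a single row, so only m rows."""
    first = [[f"x{i}" for i in range(n)]]  # a single row holding a list
    return [(xs, f"y{j}") for xs in first for j in range(m)]


print(len(rows_without_collect(1000, 500)))  # 500000
print(len(rows_with_collect(1000, 500)))     # 500
```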

How to use average function in neo4j with collection

I want to calculate the covariance of two vectors stored as collections:
A=[1, 2, 3, 4]
B=[5, 6, 7, 8]
Cov(A,B)= Sigma[(ai-AVGa)*(bi-AVGb)] / (n-1)
My problems with computing the covariance are:
1) I cannot nest aggregate functions, which is what happens
when I write:
SUM((ai-avg(a)) * (bi-avg(b)))
2) Alternatively, how can I iterate over two collections in one REDUCE, such as:
REDUCE(x= 0.0, ai IN COLLECT(a) | bi IN COLLECT(b) | x + (ai-avg(a))*(bi-avg(b)))
3) If it is not possible to iterate over two collections in one REDUCE, how can I relate their values to calculate the covariance when they are separated?
REDUCE(x= 0.0, ai IN COLLECT(a) | x + (ai-avg(a)))
REDUCE(y= 0.0, bi IN COLLECT(b) | y + (bi-avg(b)))
That is, can I write a nested REDUCE?
4) Is there any way to do this with UNWIND or EXTRACT?
Thank you in advance for any help.
cybersam's answer is totally fine, but if you want to avoid the n^2 Cartesian product that results from the double UNWIND, you can do this instead:
WITH [1,2,3,4] AS a, [5,6,7,8] AS b
WITH REDUCE(s = 0.0, x IN a | s + x) / SIZE(a) AS e_a,
REDUCE(s = 0.0, x IN b | s + x) / SIZE(b) AS e_b,
SIZE(a) AS n, a, b
RETURN REDUCE(s = 0.0, i IN RANGE(0, n - 1) | s + ((a[i] - e_a) * (b[i] - e_b))) / (n - 1) AS cov;
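The same two-pass idea (compute both means first, then walk the two lists by index) can be written in Python with functools.reduce. This is a sketch for checking the arithmetic, not a replacement for the Cypher query.

```python
from functools import reduce


def covariance(a, b):
    """Sample covariance, dividing by n - 1 as in the question's formula."""
    n = len(a)
    e_a = reduce(lambda s, x: s + x, a, 0.0) / n  # mean of a
    e_b = reduce(lambda s, x: s + x, b, 0.0) / n  # mean of b
    # Walk both lists by index, like REDUCE(... i IN RANGE(0, n - 1) ...)
    total = reduce(lambda s, i: s + (a[i] - e_a) * (b[i] - e_b), range(n), 0.0)
    return total / (n - 1)


print(covariance([1, 2, 3, 4], [5, 6, 7, 8]))  # 1.6666666666666667
```

Python's range(n) stops before n, which matches Cypher's inclusive RANGE(0, n - 1).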
Edit:
Not calling anyone out, but let me elaborate on why you would want to avoid the double UNWIND in https://stackoverflow.com/a/34423783/2848578. As I said below, UNWINDing k collections of length n in Cypher results in n^k rows. So let's take two length-3 collections over which you want to calculate the covariance.
> WITH [1,2,3] AS a, [4,5,6] AS b
UNWIND a AS aa
UNWIND b AS bb
RETURN aa, bb;
   | aa | bb
---+----+----
 1 |  1 |  4
 2 |  1 |  5
 3 |  1 |  6
 4 |  2 |  4
 5 |  2 |  5
 6 |  2 |  6
 7 |  3 |  4
 8 |  3 |  5
 9 |  3 |  6
Now we have n^k = 3^2 = 9 rows. At this point, taking the average of these identifiers means we're taking the average of 9 values.
> WITH [1,2,3] AS a, [4,5,6] AS b
UNWIND a AS aa
UNWIND b AS bb
RETURN AVG(aa), AVG(bb);
   | AVG(aa) | AVG(bb)
---+---------+---------
 1 |     2.0 |     5.0
Also, as I said below, this doesn't affect the answer, because repeating every element of a vector the same number of times does not change its average. For example, the average of {1,2,3} is equal to the average of {1,2,3,1,2,3}. It is likely inconsequential for small values of n, but for larger values of n you'll start seeing a performance decrease.
Let's say you have two vectors of length 1000 (strictly, RANGE(0, 1000) yields 1001 elements, since Cypher's range() includes its end). Calculating the average of each with a double UNWIND:
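The row blow-up and the unchanged averages can be reproduced outside Neo4j with itertools.product, which enumerates the same combinations the double UNWIND produces (a Python analogy, not Cypher's execution model):

```python
from itertools import product

a = [1, 2, 3]
b = [4, 5, 6]

# UNWIND a AS aa UNWIND b AS bb: one row per (aa, bb) combination
rows = list(product(a, b))
print(len(rows))  # 9

# Every element of a appears len(b) times, so its mean is unchanged
avg_aa = sum(aa for aa, _ in rows) / len(rows)
avg_bb = sum(bb for _, bb in rows) / len(rows)
print(avg_aa, avg_bb)  # 2.0 5.0
```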
> WITH RANGE(0, 1000) AS a, RANGE(1000, 2000) AS b
UNWIND a AS aa
UNWIND b AS bb
RETURN AVG(aa), AVG(bb);
   | AVG(aa) | AVG(bb)
---+---------+---------
 1 |   500.0 |  1500.0
714 ms
Is significantly slower than using REDUCE:
> WITH RANGE(0, 1000) AS a, RANGE(1000, 2000) AS b
RETURN REDUCE(s = 0.0, x IN a | s + x) / SIZE(a) AS e_a,
REDUCE(s = 0.0, x IN b | s + x) / SIZE(b) AS e_b;
   | e_a   | e_b
---+-------+--------
 1 | 500.0 | 1500.0
4 ms
To bring it all together, I'll compare the two queries in full on length-1000 vectors:
> WITH RANGE(0, 1000) AS aa, RANGE(1000, 2000) AS bb
UNWIND aa AS a
UNWIND bb AS b
WITH aa, bb, SIZE(aa) AS n, AVG(a) AS avgA, AVG(b) AS avgB
RETURN REDUCE(s = 0, i IN RANGE(0,n-1)| s +((aa[i]-avgA)*(bb[i]-avgB)))/(n-1) AS covariance;
   | covariance
---+------------
 1 |    83583.5
9105 ms
> WITH RANGE(0, 1000) AS a, RANGE(1000, 2000) AS b
WITH REDUCE(s = 0.0, x IN a | s + x) / SIZE(a) AS e_a,
REDUCE(s = 0.0, x IN b | s + x) / SIZE(b) AS e_b,
SIZE(a) AS n, a, b
RETURN REDUCE(s = 0.0, i IN RANGE(0, n - 1) | s + ((a[i] - e_a) * (b[i] - e_b))) / (n - 1) AS cov;
   | cov
---+---------
 1 | 83583.5
33 ms
[EDITED]
This should calculate the covariance (according to your formula), given your sample inputs:
WITH [1,2,3,4] AS aa, [5,6,7,8] AS bb
UNWIND aa AS a
UNWIND bb AS b
WITH aa, bb, SIZE(aa) AS n, AVG(a) AS avgA, AVG(b) AS avgB
RETURN REDUCE(s = 0, i IN RANGE(0,n-1)| s +((aa[i]-avgA)*(bb[i]-avgB)))/(n-1) AS covariance;
This approach is OK when n is small, as is the case with the original sample data.
However, as @NicoleWhite and @jjaderberg point out, when n is not small, this approach will be inefficient. The answer by @NicoleWhite is an elegant general solution.
How do you arrive at collections A and B? The avg function is an aggregating function and cannot be used in the REDUCE context, nor can it be applied to collections. You should calculate your average before you get to that point, but exactly how to do that best depends on how you arrive at the two collections of values. If you are at a point where you have individual result items that you then collect to get A and B, that's the point when you could use avg. For example:
WITH [1, 2, 3, 4] AS aa UNWIND aa AS a
WITH collect(a) AS aa, avg(a) AS aAvg
RETURN aa, aAvg
and for both collections
WITH [1, 2, 3, 4] AS aColl UNWIND aColl AS a
WITH collect(a) AS aColl, avg(a) AS aAvg
WITH aColl, aAvg,[5, 6, 7, 8] AS bColl UNWIND bColl AS b
WITH aColl, aAvg, collect(b) AS bColl, avg(b) AS bAvg
RETURN aColl, aAvg, bColl, bAvg
Once you have the two averages, let's call them aAvg and bAvg, and the two collections, aColl and bColl, you can do
RETURN REDUCE(x = 0.0, i IN range(0, size(aColl) - 1) | x + ((aColl[i] - aAvg) * (bColl[i] - bAvg))) / (size(aColl) - 1) AS covariance
Thank you all so much; however, I wonder which one is the most efficient:
1) Nested UNWIND and RANGE inside REDUCE -> @cybersam
2) Nested REDUCE -> @NicoleWhite
3) Nested WITH (resetting the query with WITH) -> @jjaderberg
BUT the important issue is:
Why is there a difference between your computations and the actual result?
I mean, your covariance equals 1.6666666666666667,
but in the real world the covariance equals 1.25.
please check: https://www.easycalculation.com/statistics/covariance.php
Vector X: [1, 2, 3, 4]
Vector Y: [5, 6, 7, 8]
I think this difference arises because some calculators do not use (n-1) as the divisor and use n instead. Therefore, when the divisor grows from n-1 to n, the result shrinks from 1.67 to 1.25.
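A small Python check of that explanation; the `sample` flag is an illustrative parameter, not part of any library.

```python
def cov(a, b, sample=True):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    total = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    # Sample covariance divides by n - 1; population covariance divides by n
    return total / (n - 1 if sample else n)


A = [1, 2, 3, 4]
B = [5, 6, 7, 8]
print(cov(A, B, sample=True))   # 1.6666666666666667
print(cov(A, B, sample=False))  # 1.25
```

For these vectors the sum of products of deviations is 5.0, so the two divisors give 5/3 and 5/4. All three Cypher answers implement the n - 1 (sample) form given in the question.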

number of connected nodes to specific nodes in a path

I have a Cypher query (below).
It works but I was wondering if there's a more elegant way to write this.
Based on a given starting node, the query tries to:
1) Find the following pattern/motif: (inputko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko).
2) For each motif/pattern found, find the connected nodes labelled contigs for each of the following nodes in the pattern: [inputko, ko2, ko3].
3) Return a summary of the 3 nodes and their connected contigs, i.e. the name property .ko of each of the 3 nodes and the number of connected :contigs nodes, for each (inputko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko) motif found.
+--------------------------------------------------------------------------+
| KO1         | KO1count | KO2         | KO2count | KO3         | KO3count |
+--------------------------------------------------------------------------+
| "ko:K00001" | 102      | "ko:K14029" | 512      | "ko:K03736" | 15       |
| "ko:K00001" | 102      | "ko:K00128" | 792      | "ko:K12972" | 7        |
| "ko:K00001" | 102      | "ko:K00128" | 396      | "ko:K01624" | 265      |
| "ko:K00001" | 102      | "ko:K03735" | 448      | "ko:K00138" | 33       |
| "ko:K00001" | 102      | "ko:K14029" | 512      | "ko:K15228" | 24       |
+--------------------------------------------------------------------------+
I'm puzzled about the syntax for operating on each match.
From the documentation, the FOREACH clause doesn't seem to be what I need.
Any ideas guys?
The FOREACH clause is used to update data within a collection, whether
components of a path, or result of aggregation.
Collections and paths are key concepts in Cypher. To use them for
updating data, you can use the FOREACH construct. It allows you to do
updating commands on elements in a collection — a path, or a
collection created by aggregation.
START
inputko=node:koid('ko:\"ko:K00001\"')
MATCH
(inputko)--(c1:contigs)
WITH
count(c1) as KO1count, inputko
MATCH
(inputko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
WITH
inputko.ko as KO1,
KO1count,
ko2,
ko3
MATCH
(ko2)--(c2:contigs)
WITH
KO1,
KO1count,
ko2.ko as KO2,
count(c2) as KO2count,
ko3
MATCH
(ko3)--(c3:contigs)
RETURN
KO1,
KO1count,
KO2,
KO2count,
ko3.ko AS KO3,
count(c3) AS KO3count
LIMIT
5;
I realised that I have to use count(DISTINCT cX) to get an accurate count. I don't know why.
I am not sure how elegant this is, but I think it gives you some notion of how you could extend your query to n ko nodes in a path and still return the data as you have laid it out above. It should also demonstrate the power of combining the WITH clause and collections.
// match the ko/cpd node paths starting with K00001
match p=(ko1:ko {name:'K00001' } )-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
// remove the cpd nodes from each path and name the collection row
with collect([n in nodes(p) where labels(n)[0] = 'ko' | n]) as row
// create a range for the number of rows and number of ko nodes per row
with row
, range(0, size(row)-1, 1) as idx
, range(0, 2, 1) as idx2
// iterate over each row and node in the order it was collected
unwind idx as i
unwind idx2 as j
with i, j, row[i][j] as ko_n
// find all of the contigs nodes atttached to each ko node
match (ko_n)--(:contigs)
// group the ko node data together in a collection preserving the order and the count
with i, j, ko_n, count(*) as ct
with i, [j, ko_n.name, ct] as ko_set
order by i, ko_set[0]
// re-collect the ko node sets as ko rows
with i, collect(ko_set) as ko_row
order by i
//return the original paths in the ko node order with the counts
return reduce( ko_str = "", ko in ko_row |
case
when ko_str = "" then ko_str + ko[1] + ", " + ko[2]
else ko_str + ", " + ko[1] + ", " + ko[2]
end) as `KO-Contigs Counts`
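Stripped of the graph access, the reshaping above is: keep the ko nodes of each path in order, then emit name/count pairs per row. A Python sketch with made-up paths and counts (loosely based on the question's sample table); `PATHS`, `CONTIG_COUNTS` and `summarise` are hypothetical.

```python
# Hypothetical paths of (label, name) pairs, shaped like
# (ko1:ko)-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
PATHS = [
    [("ko", "K00001"), ("cpd", "C1"), ("ko", "K14029"), ("cpd", "C2"), ("ko", "K03736")],
    [("ko", "K00001"), ("cpd", "C1"), ("ko", "K00128"), ("cpd", "C3"), ("ko", "K12972")],
]
# Hypothetical number of :contigs neighbours per ko node
CONTIG_COUNTS = {"K00001": 102, "K14029": 512, "K03736": 15, "K00128": 792, "K12972": 7}


def summarise(paths, counts):
    lines = []
    for path in paths:
        # Keep only the ko nodes, in path order (drops the cpd nodes)
        ko_names = [name for lbl, name in path if lbl == "ko"]
        lines.append(", ".join(f"{name}, {counts[name]}" for name in ko_names))
    return lines


for line in summarise(PATHS, CONTIG_COUNTS):
    print(line)
```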
The FOREACH clause in Cypher is strictly for mutating data. For instance, you could use one query to collect the contigs count per ko node and store it on each node.
This is a bit convoluted, and you would never update the number of contigs on a ko node like this, but it illustrates the use of FOREACH in Cypher.
match (ko:ko)-->(:contigs)
with ko,count(*) as ct
with collect(ko) as ko_nodes, collect(ct) as ko_counts
with ko_nodes, ko_counts, range(0, size(ko_nodes)-1, 1) as idx
foreach ( i in idx |
set (ko_nodes[i]).num_contigs = ko_counts[i] )
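The index-based pattern in that FOREACH, walking two parallel lists with a shared index, looks like this in a Python sketch (the node dicts and counts are hypothetical):

```python
# Hypothetical ko nodes and their aggregated contig counts (parallel lists)
ko_nodes = [{"name": "K00001"}, {"name": "K14029"}, {"name": "K03736"}]
ko_counts = [102, 512, 15]

# Same shape as FOREACH (i IN range(0, size(ko_nodes) - 1) | SET ...):
# Cypher's range() includes its end, Python's range(n) stops before n
for i in range(len(ko_nodes)):
    ko_nodes[i]["num_contigs"] = ko_counts[i]

print(ko_nodes)
```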
A simpler way to perform the above update task on each ko node would be to do something like this...
match (ko:ko)-->(:contigs)
with ko, count(*) as ct
set ko.num_contigs = ct
If you were to carry the number of contigs on each ko node, you could then perform a query like this to return the number of contigs for each ko node in each path:
// match all the paths starting with K00001
match p=(ko1:ko {name:'K00001' } )-->(:cpd)-->(ko2:ko)-->(:cpd)-->(ko3:ko)
// build a csv line per path
return reduce( ko_str = "", ko in nodes(p) | ko_str +
// using just the ko nodes in the path
// exclude the cpd nodes
case
when labels(ko)[0] = "ko" then ko.name + ", " + toString(ko.num_contigs) + ", "
else ""
end
) as `KO-Contigs Counts`

Neo4j browser interface stops working or reconnecting

The problem is the same no matter if I am using Safari or Chrome.
After running the same query (shown below) several times, I get the error: "Disconnected from Neo4j. Please check if the cord is unplugged."
I am able to SSH to the server and run the query from the shell.
This query was the subject of another question I opened earlier, and someone optimized it into the form presented below. So it is not a matter of an unoptimized query; it seems to be something about the browser interface.
What is wrong here?
MATCH (p:Publisher)-[r:PUBLISHED]->(w:Woka)<-[s:AUTHORED]-(a:Author)
MATCH (l:Language)-[t:USED]->(w)-[u:INCLUDED]->(b:Bisac)
WHERE (a.author_name = 'Camus, Albert')
WITH p,r,w,s,a,l,t,u,b
OPTIONAL MATCH (d:Description)-[v:HAS_DESCRIPTION]-(w)
RETURN w, p, a, l, b, d, r, s, t, u, v;
More details: when the browser dies on one computer, it also dies on a second computer connecting to the same database.
Also, other commands, e.g.
$ rails console
or
$ rails s -d
(to start the Rails server), no longer work.
If I restart the Neo4j database server, everything works for a little while and then freezes.
Below is the execution plan of the query:
neo4j-sh (?)$ EXPLAIN MATCH (p:Publisher)-[r:PUBLISHED]->(w:Woka)<-[s:AUTHORED]-(a:Author{author_name: 'Camus, Albert'}), (l:Language)-[t:USED]->(w)-[u:INCLUDED]->(b:Bisac)
WITH p,r,w,s,a,l,t,u,b
OPTIONAL MATCH (d:Description)-[v:HAS_DESCRIPTION]-(w)
RETURN w, p, a, l, b, d, r, s, t, u, v;
+--------------------------------------------+
| No data returned, and nothing was changed. |
+--------------------------------------------+
73 ms
Compiler CYPHER 2.2
Planner COST
OptionalExpand(All)
|
+Filter(0)
|
+Expand(All)(0)
|
+Filter(1)
|
+Expand(All)(1)
|
+Filter(2)
|
+Expand(All)(2)
|
+Filter(3)
|
+Expand(All)(3)
|
+NodeUniqueIndexSeek
+---------------------+---------------+---------------------------------+-----------------------------+
| Operator | EstimatedRows | Identifiers | Other |
+---------------------+---------------+---------------------------------+-----------------------------+
| OptionalExpand(All) | 5 | a, b, d, l, p, r, s, t, u, v, w | (w)-[v:HAS_DESCRIPTION]-(d) |
| Filter(0) | 5 | a, b, l, p, r, s, t, u, w | b:Bisac |
| Expand(All)(0) | 5 | a, b, l, p, r, s, t, u, w | (w)-[u:INCLUDED]->(b) |
| Filter(1) | 4 | a, l, p, r, s, t, w | l:Language |
| Expand(All)(1) | 4 | a, l, p, r, s, t, w | (w)<-[t:USED]-(l) |
| Filter(2) | 4 | a, p, r, s, w | p:Publisher |
| Expand(All)(2) | 4 | a, p, r, s, w | (w)<-[r:PUBLISHED]-(p) |
| Filter(3) | 4 | a, s, w | w:Woka |
| Expand(All)(3) | 4 | a, s, w | (a)-[s:AUTHORED]->(w) |
| NodeUniqueIndexSeek | 1 | a | :Author(author_name) |
+---------------------+---------------+---------------------------------+-----------------------------+
Total database accesses: ?
neo4j-sh (?)$
Here is a snapshot from top (taken before the browser froze):
top - 14:59:36 up 46 days, 17:03, 2 users, load average: 2.66, 4.58, 3.75
Tasks: 116 total, 2 running, 114 sleeping, 0 stopped, 0 zombie
%Cpu(s): 97.5 us, 0.8 sy, 0.0 ni, 1.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.2 st
KiB Mem: 15666128 total, 3858028 used, 11808100 free, 169612 buffers
KiB Swap: 0 total, 0 used, 0 free. 2144784 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
10260 neo4j 20 0 14.348g 1.388g 195316 S 196.9 9.3 1:57.55 java
9879 ubuntu 20 0 23680 1656 1116 R 0.3 0.0 0:00.88 top
1 root 20 0 33508 2236 860 S 0.0 0.0 0:12.25 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:00.55 ksoftirqd/0
4 root 20 0 0 0 0 S 0.0 0.0 0:30.10 kworker/0:0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 0:39.08 rcu_sched
8 root 20 0 0 0 0 R 0.0 0.0 0:47.50 rcuos/0
9 root 20 0 0 0 0 S 0.0 0.0 1:00.72 rcuos/1
What is the spec of your computer?
How much data does your query return?
MATCH (p:Publisher)-[r:PUBLISHED]->(w:Woka)<-[s:AUTHORED]-(a:Author)
WHERE (a.author_name = 'Camus, Albert')
MATCH (l:Language)-[t:USED]->(w)-[u:INCLUDED]->(b:Bisac)
WITH p,r,w,s,a,l,t,u,b
OPTIONAL MATCH (d:Description)-[v:HAS_DESCRIPTION]-(w)
RETURN w, p, a, l, b, d, r, s, t, u, v;
Also, what does your visual query plan look like? Please prefix your query with PROFILE, save the plan as a PNG, and share it.
The Neo4j browser interface needs a major overhaul. There is no way to use this interface for serious design, modelling and development.
I executed the following stress tests from a Ruby on Rails console. There were no disconnect, network or other errors; all of them ran successfully, whereas any of these queries froze the browser after 5, 6 or 7 executions, even with the result set limited to 25 records. What's more, I executed all of them while the browser interface was still frozen and showing that network disconnect error.
(1..1000).each do |n|
q = "MATCH (p:Publisher)-[r:PUBLISHED]->(w:Woka)<-[s:AUTHORED]-(a:Author)
WHERE (a.author_name = 'Freud, Sigmund')
MATCH (l:Language)-[t:USED]->(w)-[u:INCLUDED]->(b:Bisac)
WITH p,r,w,s,a,l,t,u,b
OPTIONAL MATCH (d:Description)-[v:HAS_DESCRIPTION]-(w)
RETURN w, p, a, l, b, d, r, s, t, u, v;"
r = Neo4j::Session.current.query(q)
print n, "\t", r.count, "\t", Time.now, "\n"
end
(1..1000).each do |n|
q = "MATCH (p:Publisher)-[r:PUBLISHED]->(w:Woka)<-[s:AUTHORED]-(a:Author)
WHERE (a.author_name = 'Einstein, Albert')
MATCH (l:Language)-[t:USED]->(w)-[u:INCLUDED]->(b:Bisac)
WITH p,r,w,s,a,l,t,u,b
OPTIONAL MATCH (d:Description)-[v:HAS_DESCRIPTION]-(w)
RETURN w, p, a, l, b, d, r, s, t, u, v;"
r = Neo4j::Session.current.query(q)
print n, "\t", r.count, "\t", Time.now, "\n"
end
(1..1000).each do |n|
q = "MATCH (p:Publisher)-[r:PUBLISHED]->(w:Woka)<-[s:AUTHORED]-(a:Author)
WHERE (a.author_name = 'Freud, Sigmund')
MATCH (l:Language)-[t:USED]->(w)-[u:INCLUDED]->(b:Bisac)
WITH p,r,w,s,a,l,t,u,b
OPTIONAL MATCH (d:Description)-[v:HAS_DESCRIPTION]-(w)
RETURN w, p, a, l, b, d, r, s, t, u, v;"
r = Neo4j::Session.current.query(q)
print n, "\t", r.count, "\t", Time.now, "\n"
end
