Natural sorting in neo4j

We have a bunch of nodes with properties that are converted from BigDecimals to strings during insert, and back again during load.
This leads to the typical sorting problem: the values 1, 2, 3, 10 get sorted as 1, 10, 2, 3.
Does Cypher have any means of doing natural sorting on strings? Or do we have to convert these properties to doubles or something like that?

I guess the best way is to store them as integers in your DB. Also, in the current milestone release there's a toInt() function which you could use to sort:
START n=node(*)
// keep n in scope alongside the converted value so it can still be returned
WITH n, toInt(n.stringValue) AS nbr
RETURN n
ORDER BY nbr
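In more recent Neo4j versions START has been removed and the conversion function is called toInteger(); a roughly equivalent sketch (the property name stringValue is just carried over from the snippet above) would be:
// Sort nodes by the numeric value of a string property (newer Cypher syntax)
MATCH (n)
WHERE n.stringValue IS NOT NULL
RETURN n
ORDER BY toInteger(n.stringValue)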

Can you add a primary sort on string length?
CREATE ({val:"3"}),({val:"6"}),({val:"9"}),({val:"12"}),({val:"15"}),({val:"18"}),({val:"21"})
MATCH (n) RETURN n.val ORDER BY n.val
// 12, 15, 18, 21, 3, 6, 9
MATCH (n) RETURN n.val ORDER BY length(n.val), n.val
// 3, 6, 9, 12, 15, 18, 21
http://console.neo4j.org/r/kb0obm
If you keep converting them back and forth it sounds like it would be better to store them as their proper types in the database.
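As a sketch of that migration (assuming the property is called stringValue and always holds a whole number), a one-off conversion lets a plain ORDER BY sort numerically afterwards:
// One-off migration sketch: convert the stored strings to integers in place.
// Assumes every stringValue is a whole number; adjust the property name as needed.
MATCH (n)
WHERE n.stringValue IS NOT NULL
SET n.stringValue = toInteger(n.stringValue)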

Related

Neo4j - Get certain nodes and relations

I have an application where nodes and relations are shown. After a result is shown, nodes and relations can be added through the GUI. When the user is done, I would like to get all the data from the database again (because I don't have all the data in the front-end by this point), based on the Neo4j IDs of all nodes and links. The difficult part for me is that there are "floating" nodes that don't have a relation in the result of the GUI (they will have relations in the database, but I don't want those). Worth mentioning is that my relations carry the start and end node ID. I was thinking of starting from there, but then I don't get these floating nodes.
Let's take a look at this poorly drawn example image:
As you can see:
node 1 is linked (no direction) to node 2.
node 2 is linked to node 3 (from 2 to 3)
node 3 is linked to node 4 (from 3 to 4)
node 3 is also linked to node 5 (no direction)
node 6 is a floating node, without relations
Let's assume that:
id(relation between 1 and 2) = 11
id(relation between 2 and 3) = 12
id(relation between 3 and 4) = 13
id(relation between 3 and 5) = 14
Keeping in mind that in the real data there are way more relations between all these nodes, how can I recreate this exact picture via Neo4j? I have tried something like:
match path=(n)-[rels*]-(m)
where id(n) in [1, 2, 3, 4, 5]
and all(rel in rels where id in [11, 12, 13, 14])
and id(m) in [1, 2, 3, 4, 5]
return path
However, this doesn't work properly, for multiple reasons. Also, just matching on all the nodes doesn't get me the relations. Do I need to union multiple queries? Can this be done in one query? Do I need to write my own plugin?
I'm using Neo4j 3.3.5.
You don't need to keep a list of node IDs. Every relationship points to its 2 end nodes. Since you always want both end nodes, you get them for free using just the relationship ID list.
This query will return every single-relationship path from a relationship ID list. If you are using the neo4j Browser, its visualization should knit together these short paths and display your original full paths.
MATCH p=()-[r]-()
WHERE ID(r) IN [11, 12, 13, 14]
RETURN p
By the way, all neo4j relationships have a direction. You may choose not to specify the direction when you create one (using MERGE) and/or query for one, but it still has a direction. And the neo4j Browser visualization will always show the direction.
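As a small illustration (the Demo label and LINKED type below are made up for this sketch), even a MERGE written without an arrow ends up stored with a direction:
// Hypothetical sketch: MERGE without a direction still stores a directed relationship;
// startNode()/endNode() reveal which direction was picked.
MERGE (a:Demo {name: 'A'})
MERGE (b:Demo {name: 'B'})
MERGE (a)-[r:LINKED]-(b)
RETURN startNode(r).name, endNode(r).name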
[UPDATED]
If you also want to include "floating" nodes that are not attached to a relationship in your relationship list, then you could just use a separate floating node ID list. For example:
MATCH p=()-[r]-()
WHERE ID(r) IN [11, 12, 13, 14]
RETURN p
UNION
MATCH p=(n)
WHERE ID(n) IN [6]
RETURN p

Does Flatten have any effects other than flattening collections element-wise?

Specifically, does the Flatten PTransform in Beam perform any sort of:
Deduplication
Filtering
Purging of existing elements
Or does it just "merge" two different PCollections?
The Flatten transform does not do any deduplication or filtering of any kind. As mentioned, it simply merges the multiple PCollections into one that contains the elements of each of the inputs.
This means that:
import apache_beam as beam

with beam.Pipeline() as p:
    c1 = p | "Branch1" >> beam.Create([1, 2, 3, 4])
    c2 = p | "Branch2" >> beam.Create([4, 4, 5, 6])
    result = (c1, c2) | beam.Flatten()
In this case, the result PCollection contains the following elements: [1, 2, 3, 4, 4, 4, 5, 6].
Note how the element 4 appears once in c1, and twice in c2. This is not deduplicated, filtered or removed in any way.
As a curious fact about Flatten, some runners optimize it away, and simply add the downstream transform in both branches. So, in short, no special filtering or dedups. Simply merging of PCollections.

Finding largest path in huge cyclic neo4j directed graph with infinite depth using cypher query

I have a neo4j graph in which different nodes are connected through directed relationships. This graph contains cycles. I want to find all entities on the largest path over this relationship from a set of given entities to a set of target entities. The query I am using is provided below:
NOTE: the sample graph has 1000 nodes and 2500 relationships, and the depth is unbounded. Our final graph may contain up to 25,000 nodes.
match (n:dataEntity) where id(n) in
[28, 4, 27, 151, 34, 36, 57, 59, 71, 73, 75, 119, 121, 140, 142, 144]
match (d:dataEntity) where NOT (d)-[:dependsOn]->(:dataEntity)
with distinct d ,n
match res =(n)-[:dependsOn*]->(d)
with d,n,nodes(res) as x
return x
The problem with this query is that it works fine up to depth 5, but with unbounded depth it takes too much time, i.e. more than 20 minutes.
Thanks in advance, and please let me know if you need any further information!
The basic problem is that you are trying to perform an unreasonably expensive query.
Based on your data characteristics:
a dataEntity node has an average of about 2.5 outgoing relationships
the path from any node n to any node d can have a length of up to 2500
Let's say, for example, that a particular path (out of probably a very large number of possible paths) from one specific n to one specific d has a length of 500. To find that single path, the number of operations would be about 2.5^500 = 10^(500 * log10(2.5)) ≈ 10^199.
You need to reconsider what you are trying to do, and see if there is a more clever way to do what you want. Perhaps changing the data model will help, but it all depends on your use cases.
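If a bounded search is acceptable, one hedged way to keep the query tractable is to cap the variable-length expansion and keep only the longest path found per pair; the depth cap of 8 below is purely illustrative:
// Illustrative sketch: cap the expansion at *..8 so the search space stays bounded,
// then keep only the longest path found for each (n, d) pair.
MATCH (n:dataEntity) WHERE id(n) IN [28, 4, 27, 151, 34, 36, 57, 59, 71, 73, 75, 119, 121, 140, 142, 144]
MATCH (d:dataEntity) WHERE NOT (d)-[:dependsOn]->(:dataEntity)
MATCH res = (n)-[:dependsOn*..8]->(d)
WITH n, d, res
ORDER BY length(res) DESC
WITH n, d, collect(res)[0] AS longest
RETURN n, d, nodes(longest) AS x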

Query to return nodes that have no specific relationship within an already matched set of nodes

The following statement creates the data I am trying to work with:
CREATE (p:P2 {id: '1', name: 'Arthur'})<-[:EXPANDS {recorded: 1, date:1}]-(:P2Data {wage: 1000})
CREATE (d2:P2Data {wage: 1100})-[:EXPANDS {recorded: 2, date:4}]->(p)
CREATE (d3:P2Data {wage: 1150})-[:EXPANDS {recorded: 3, date:3}]->(p)
CREATE (d3)-[:CANCELS]->(d2)
So, Arthur is created and initially has a wage of 1000. On day 2 we add the info that the Wage will be 1100 from day 4 onwards. On day 3 we state that the wage will be increased to 1150, which cancels the entry from day 2.
Now, if I look at the history as it was valid for a given point in time, when the point in time is 2, the following history is correct:
day 1 - wage 1000
day 4 - wage 1100
when the point in time is 3, the following history is correct:
day 1 - wage 1000
day 3 - wage 1150
Expressed in graph terms: when I match the P2Data nodes based on the :EXPANDS relationship, I need those that are not cancelled by any other P2Data node that has also been matched.
This is my attempt so far:
MATCH p=(:P2 {id: '1'})<-[x1:EXPANDS]-(d1:P2Data)
WHERE x1.recorded <= 3
WITH x1.date as date,
FILTER(n in nodes(p)
WHERE n:P2Data AND
SIZE(FILTER(n2 IN nodes(p) WHERE (n2:P2Data)-[:CANCELS]->(n))) = 0) AS result
RETURN date, result
The idea was to only get those n in nodes(p) where there are no paths pointing to it via the :CANCELS relationship.
Since I am still new to this and somehow cypher hasn't clicked yet for me, feel free to discard that query completely.
If you modify your data model by removing the CANCELS relationship, and instead add an optional canceled date to the EXPANDS relationship type, you can greatly simplify the required query.
For example, create the test data:
CREATE (p:P2 {id: '1', name: 'Arthur'})<-[:EXPANDS {recorded: 1, date:1}]-(:P2Data {wage: 1000})
CREATE (d2:P2Data {wage: 1100})-[:EXPANDS {recorded: 2, date:4, canceled: 3}]->(p)
CREATE (d3:P2Data {wage: 1150})-[:EXPANDS {recorded: 3, date:3}]->(p)
Perform simple query:
MATCH p=(:P2 {id: '1'})<-[x1:EXPANDS]-(d1:P2Data)
WHERE x1.recorded <= 3 AND (x1.canceled IS NULL OR x1.canceled > 3)
RETURN x1.date AS date, d1
ORDER BY date;
Alternatively, a more involved query can derive the same history using only the recorded and date properties on the EXPANDS relationships (so it needs neither the canceled property nor the CANCELS relationship):
MATCH (:P2 {id: '1'})<-[x1:EXPANDS]-(d1:P2Data)
WHERE x1.recorded <= 3
WITH x1.date AS valid_date, x1.recorded AS transaction_date, d1.wage AS wage
ORDER BY valid_date
WITH COLLECT({v: valid_date, t: transaction_date, w:wage}) AS dates
WITH REDUCE(x = [HEAD(dates)], date IN TAIL(dates)|
CASE
WHEN date.v = LAST(x).v AND date.t > LAST(x).t THEN x[..-1] + [date]
WHEN date.t > LAST(x).t THEN x + [date]
ELSE x
END) AS results
UNWIND results AS result
RETURN result.v, result.w
I'm trying to think of a way to model this better, but I'm honestly pretty stumped.
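For completeness, if the CANCELS relationship has to stay, a hedged sketch against the original CREATE data above is to drop any entry that is cancelled by another entry recorded by the chosen point in time (3 here, matching the example):
// Sketch only: keep each P2Data entry unless a canceling entry was also recorded by day 3.
MATCH (p:P2 {id: '1'})<-[x1:EXPANDS]-(d1:P2Data)
WHERE x1.recorded <= 3
OPTIONAL MATCH (d1)<-[:CANCELS]-(d2:P2Data)-[x2:EXPANDS]->(p)
WHERE x2.recorded <= 3
WITH x1, d1, d2
WHERE d2 IS NULL
RETURN x1.date AS date, d1.wage AS wage
ORDER BY date
// On the sample data this should return (1, 1000) and (3, 1150).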

CREATE combined with FOREACH in CYPHER gives unexpected results

I have a graph in which versioning information is stored as [:ADD] or [:REMOVE] relations between nodes. I want to replace those rels with another model, based on [:UPDATE] rels carrying a type property and a timestamp.
Currently
MATCH (n:tocversion)-[r:ADD]->(m)
RETURN n.version,id(m)
returns this (as expected)
n.version,id(m)
1,13
1,14
2,15
2,16
3,17
3,18
3,19
3,20
4,21
4,22
Now I thought I could collect the versions and the m nodes and use them as a basis for creating rels in the new model, like this:
MATCH (n:tocversion)-[r:ADD]->(m),(t:toc)
WITH t,COLLECT(n.version) AS versions, COLLECT(m) AS ms
FOREACH(i IN versions |
FOREACH(m1 IN [ms[i]]|
CREATE (t)-[r1:UPDATE {type:"ADD", version:versions[i]}]->(m1)))
However, the rels are created in a way I don't understand, because
MATCH (t:toc)-[r:`UPDATE`]->(b) RETURN r.version,r.type,id(b)
returns
r.version,r.type,id(b)
1, ADD, 14
1, ADD, 14
2, ADD, 15
2, ADD, 15
2, ADD, 16
2, ADD, 16
2, ADD, 16
2, ADD, 16
3, ADD, 17
3, ADD, 17
instead of the expected
r.version,r.type,id(b)
1, ADD, 13
1, ADD, 14
2, ADD, 15
2, ADD, 16
3, ADD, 17
3, ADD, 18
3, ADD, 19
3, ADD, 20
4, ADD, 21
4, ADD, 22
Found it. I had to use RANGE:
match (n:tocversion)-[r:ADD]->(m),(t:toc)
with t,collect(n.version) as versions, collect(m) as ms
foreach(i in RANGE(0, LENGTH(versions)-1) |
foreach(m1 in [ms[i]]|
create (t)-[r1:UPDATE5 {type:"ADD", version:versions[i]}]->(m1)))
Likely because of this:
FOREACH(i IN versions |
FOREACH(m1 IN [ms[i]] |
Your "i" is going to be: 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, as expected.
But if you're using those as indices into the ms[] collection (which is 0-based), you're going to be looking at ms[] = {13, 14, 15, 16, 17, .. , 22}, and so ms[1] will always be 14, ms[2] will always be 15, ms[3] will always be 16, and ms[4] will always be 17.
Your "foreach" loops need to be rethought as "i" shouldn't be used as a lookup into "ms".
In fact I'm also not certain "i" should be used as an index into "versions" which you do in your CREATE statement as you'll likely have a similar issue as above (e.g. versions[3] will always be 2).
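If the goal is simply one :UPDATE relationship per existing :ADD relationship, a hedged sketch that avoids the index bookkeeping altogether (it assumes a single :toc node, as in the original query) would be:
// Sketch: create the new-model relationship directly from each matched row,
// so no collections or indices are needed.
MATCH (n:tocversion)-[:ADD]->(m), (t:toc)
CREATE (t)-[:UPDATE {type: "ADD", version: n.version}]->(m)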
