How can I load json and use it inside apoc.periodic.iterate - neo4j

I am trying to add some test data to all of my nodes of type A. I want to use a JSON file which contains that dummy data:
[
{
"a": "x",
"b": 1,
"c": 3
},
{
"a": "y",
"b": 5,
"c": 4
},
...
]
For every node of type A, each of these objects should become a node of type B, connected to that A node by a relationship.
So I tried it like this using apoc.periodic.iterate (because I have millions of nodes) and apoc.load.json:
CALL apoc.periodic.iterate(
'MATCH (n:A) RETURN n',
'CALL apoc.load.json("file:///values.json") YIELD value
WITH value
FOREACH (m in value |
CREATE (v:B { a: m.a, b: m.b, c: m.c })
MERGE (n)-[r:REL]->(v)',
{batchSize:500})
YIELD batches, total
RETURN batches, total
However this did nothing at all.
I also tried to load the JSON outside of iterate, so I don't have to read it again for each node, but then I could not find a way to use it inside iterate.
How could I get this to work?
Edit:
This is another query I tried:
CALL apoc.load.json("file:///values.json") YIELD value
WITH value AS value
CALL apoc.periodic.iterate('MATCH (n:A) RETURN n', 'FOREACH(m in $value | CREATE (v:B {a:m.a, b: m.b, c: m.c}) MERGE (n)-[r:REL]->(v))', {batchSize:500}) YIELD batches, total
RETURN batches, total
While it was running, I checked whether anything was happening using MATCH (n:A)-[]-(v:B) RETURN n, v LIMIT 100, but it seems that nothing is being updated.
This is what it should look like (red node = type A): [image not available]
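A possible approach, offered as a sketch and not tested against the asker's data: apoc.load.json emits each element of a top-level JSON array as a separate row, so the file can be read once, collected into a single list, and passed to the inner statement through the `params` option of apoc.periodic.iterate. The statement strings passed to iterate do not see variables of the surrounding query, which is why `$value` was not available in the second attempt (and because apoc.load.json produced one row per element, iterate was also called once per element). In the first attempt, `WITH value` drops `n` from scope and the FOREACH is never closed; iterate usually reports such failures in its `errorMessages` output instead of raising them, which fits the "did nothing" symptom. The sketch assumes a recent APOC version, where variables returned by the outer statement (here `n`) can be used directly in the inner statement, and that file import is enabled in apoc.conf (`apoc.import.file.enabled=true`).

```cypher
// Read values.json once; each element of the top-level array arrives as one row,
// so collect() gathers all objects into a single list.
CALL apoc.load.json("file:///values.json") YIELD value
WITH collect(value) AS items
// For every :A node, create one :B node per JSON object and link it with :REL.
// n comes from the outer statement; $items is supplied through the params option.
CALL apoc.periodic.iterate(
  'MATCH (n:A) RETURN n',
  'UNWIND $items AS m
   CREATE (n)-[:REL]->(:B {a: m.a, b: m.b, c: m.c})',
  {batchSize: 500, params: {items: items}}
) YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```

Each batch creates batchSize multiplied by the number of JSON objects :B nodes, so a smaller batchSize may be needed if the file is large. Checking `errorMessages` after the run shows whether any batch failed.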

Related

Mass delete empty properties in a Neo4j database

I have a Neo4j database with 100M nodes. A lot of those nodes contain empty properties and I would like to remove these properties.
I have tried the following query:
:auto MATCH (n)
WITH n
call { with n
UNWIND keys(n) as k
WITH n, k
WHERE n[k] = ''
WITH n, collect(k) as propertyKeys
CALL apoc.create.removeProperties(n, propertyKeys)
YIELD node
RETURN node
} in transactions of 50000 rows;
I get the following error message:
Query cannot conclude with CALL (must be a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD) (line 3, column 1 (offset: 19))
"call { with n"
^
Can someone tell me what I'm doing wrong and how to fix that?
Thanks for your help !
I propose an alternative to your query. The query below uses apoc.periodic.iterate to find the nodes with empty properties and remove those properties in batches of 50,000, executed in parallel.
CALL apoc.periodic.iterate(
"MATCH (n) UNWIND keys(n) as k WITH n, k WHERE n[k] = '' RETURN n, k",
"WITH n, collect(k) as propertyKeys
CALL apoc.create.removeProperties(n, propertyKeys) YIELD node
RETURN node",
{batchSize:50000, parallel:true})
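A possible refinement, sketched as an assumption rather than a tested fix: the driving query above returns one row per (node, key) pair, so keys of the same node can end up in different batches, and with `parallel:true` two batches may then contend for the lock on the same node. Collecting the empty-valued keys per node in the driving query keeps each node in exactly one row.

```cypher
// Driving query: one row per node, carrying the list of keys whose value is ''.
// Inner statement: n and propertyKeys are available as variables there.
CALL apoc.periodic.iterate(
  "MATCH (n)
   WITH n, [k IN keys(n) WHERE n[k] = ''] AS propertyKeys
   WHERE size(propertyKeys) > 0
   RETURN n, propertyKeys",
  "CALL apoc.create.removeProperties(n, propertyKeys) YIELD node
   RETURN count(*)",
  {batchSize: 50000, parallel: true})
YIELD batches, total, errorMessages
RETURN batches, total, errorMessages
```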
To explain the error you are getting: it concerns the outer query, which ends with a CALL { ... } subquery that returns rows (the subquery ends with RETURN node). As the error message says, a query cannot conclude with such a call; it must end with a RETURN clause, an update clause, a unit subquery call, or a procedure call with no YIELD.
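Taking the error message at its word, the original query can probably also be kept and simply given a final RETURN after the subquery. A minimal sketch, assuming a Neo4j version that accepts a row-returning `CALL { ... } IN TRANSACTIONS` subquery; `:auto` is the Neo4j Browser prefix for implicit transactions, and `nodesUpdated` is just an illustrative name:

```cypher
:auto MATCH (n)
CALL {
  WITH n
  UNWIND keys(n) AS k
  WITH n, k
  WHERE n[k] = ''
  WITH n, collect(k) AS propertyKeys
  CALL apoc.create.removeProperties(n, propertyKeys) YIELD node
  RETURN node
} IN TRANSACTIONS OF 50000 ROWS
// Ending with RETURN (instead of the subquery call) addresses the reported error.
RETURN count(node) AS nodesUpdated
```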

string aggregation in Cypher

I need an equivalent of Postgres string_agg, Oracle listagg or MySQL group_concat in Cypher. My Cypher query is a stored procedure returning a stream of strings (replaced in the following minimized example by a collection unwind). As a result, I want to get a single string containing the concatenation of all strings.
Example:
UNWIND ['first','second','third'] as c
RETURN collect(c)
Actual result (of list type):
["first", "second", "third"]
Expected result (of String type):
"firstsecondthird"
or (nice-to-have):
"firstCUSTOMSEPARATORsecondCUSTOMSEPARATORthird"
(I just wanted to quickly build an ad hoc query to verify some postprocessing actually performed by a Neo4j consumer, and I thought it would be simple, but I cannot find any solution. Unlike this answer, I want a string, not a collection, since my issue has to do with the length of the concatenated string.)
How about using APOC?
UNWIND ['first','second','third'] as c
RETURN apoc.text.join(collect(c), '-') AS concat
Where - is your custom separator.
Result :
╒════════════════════╕
│"concat"            │
╞════════════════════╡
│"first-second-third"│
└────────────────────┘
--
NO APOC
Take into account the case where the collection has only one element:
UNWIND ['first','second','third'] as c
WITH collect(c) AS elts
RETURN CASE WHEN size(elts) > 1
THEN elts[0] + reduce(x = '', z IN tail(elts) | x + '-' + z)
ELSE elts[0]
END
You can simplify the query further, as below. There is no need to special-case an empty list either: if l is empty, HEAD(l) is null, so the query returns null.
WITH ['first','second','third'] as l
RETURN REDUCE(s = HEAD(l), n IN TAIL(l) | s + n) AS result
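The same REDUCE approach also covers the nice-to-have custom separator. A small sketch combining it with the UNWIND/collect from the question, where '-' is only a placeholder separator:

```cypher
UNWIND ['first', 'second', 'third'] AS c
WITH collect(c) AS l
// Start from the first element and prepend the separator to every later one.
RETURN reduce(s = head(l), n IN tail(l) | s + '-' + n) AS result
```

This returns "first-second-third"; as with the query above, an empty input list yields null.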

Neo4j if a node has no outgoing edges delete that node otherwise return its next nodes

I'm trying to use Neo4j Cypher to implement the following function: given a node, check whether it has any outgoing edges with a specific relationship type. If so, return the nodes it can reach through those edges; otherwise, delete this node. My code looks like this:
MATCH (m:Node{Properties})
WITH (size((m)-[:type]->(:Node))) AS c,m
WHERE c=0
DETACH DELETE m
However I don't know how to apply the if/else condition here, and this code only implements part of what I need. I'd really appreciate your help and suggestions!
For example the database is like this:
A-[type]->B
A-[type]->C
If the original node is A and it has two edges with that type to B and C, then I want the query to return B and C as result.
If the original node is B, it should be deleted because there's no such outgoing edge from B.
[UPDATED]
The following query uses a FOREACH hack to conditionally delete m, and returns either the found n nodes, or NULL if there were none.
MATCH (m:Node {...Properties...})
OPTIONAL MATCH (m)-[:type]->(n:Node)
FOREACH(x IN CASE WHEN n IS NULL THEN [1] ELSE [] END | DETACH DELETE m)
RETURN n
You could also use the APOC procedure apoc.do.when instead of the FOREACH hack:
MATCH (m:Node {...Properties...})
OPTIONAL MATCH (m)-[:type]->(n:Node)
CALL apoc.do.when(n IS NULL, 'DETACH DELETE m', '', {m: m}) YIELD value
RETURN n
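On recent Neo4j versions (unit subqueries were, to my knowledge, added in 4.1), a CALL { ... } subquery without RETURN is another way to make the delete conditional without APOC or FOREACH. A sketch, in which `{name: $name}` is a hypothetical stand-in for the identifying properties; unlike the queries above, it returns no rows (rather than NULL) when the node is deleted:

```cypher
// {name: $name} is a hypothetical placeholder for the node's identifying properties.
MATCH (m:Node {name: $name})
OPTIONAL MATCH (m)-[:type]->(n:Node)
// collect() skips nulls, so targets is empty when m has no outgoing :type edges.
WITH m, collect(n) AS targets
// Unit subquery: runs once per row and does not change the number of rows.
CALL {
  WITH m, targets
  WITH m, targets
  WHERE size(targets) = 0
  DETACH DELETE m
}
UNWIND targets AS target
RETURN target
```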

How to avoid cycle in neo4j cypher queries

I have a friend-friend data model which has two relationships between any two friend nodes, based on how each friend defines the other.
For example, user "A" can define user "B" as 'FRIEND' and "B" can define "A" as 'BUDDY'.
The problem is that when I try to get the 3rd-degree relationships of user "A", the query returns user "B", whereas the actual result should be "D" only.
MATCH(a:Users {first_name : "A"}) -[:BUDDY|FRIEND*3] -> (b)
RETURN a,b
OR
MATCH (a)-[]-(b)-[]-(c)-[]-(d)
WHERE a.first_name="A"
RETURN a,d
Alternatively, you can do this:
MATCH p=((a:Users {first_name : "A"})-[:BUDDY|FRIEND*3]->(b))
WITH DISTINCT a, b, nodes(p) as nodes
UNWIND nodes AS node
WITH a, b, nodes, COLLECT(DISTINCT node) as distinct_nodes
WITH a, b, nodes, distinct_nodes WHERE SIZE(nodes)=SIZE(distinct_nodes)
RETURN DISTINCT a, b
or a bit easier with an APOC call:
MATCH p=((a:Users {first_name : "A"})-[:BUDDY|FRIEND*3]->(b))
WHERE SIZE(nodes(p)) = SIZE(apoc.coll.toSet(nodes(p)))
RETURN DISTINCT a, b
I'd suggest the APOC Path Expander procedures. They use an expansion method that only ever considers a single path to each node, allow you to specify the minimum and maximum depth, accept relationship filters, and let you set whether visiting a node more than once is permitted. Specifically, the apoc.path.expandConfig() procedure should meet your needs.
MATCH (a:Users {first_name: "A"})
CALL apoc.path.expandConfig(a, {relationshipFilter:"BUDDY|FRIEND",minLevel:3,maxLevel:3, bfs:true,uniqueness:"NODE_GLOBAL"}) YIELD path
RETURN a, path
The uniqueness:"NODE_GLOBAL" parameter makes sure no node is visited more than once.

Neo4j (Cypher) - Is it possible to use non-implicit aggregation?

My question is fairly straightforward. I've been trying to write a Cypher query which uses an aggregation function - min().
I am trying to obtain the closest node to a particular node using the new Spatial functions offered in Neo4j 3.4. My query currently looks like this:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH distance(a.location, b.location) AS dist, a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id
RETURN orig_stop_id,min(dist)
The location property is a point property and this query does actually do what I want it to do, except for one thing: I'd like to also include the dest_stop_id field in the result so that I can actually know which other node corresponds to this minimal distance, however Neo4j seems to aggregate implicitly all fields in the RETURN clause that are not inside an aggregate function and the result is I just get a list of all pairs (orig_stop_id, dest_stop_id) and their distance versus getting just the minimum and the corresponding dest_stop_id. Is there any way to specify which fields should be grouped in the result set?
In SQL, GROUP BY allows you to specify this but I haven't been able to find a similar function in Cypher.
Thanks in advance, please let me know if you need any extra information.
This should work:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
RETURN
a.stop_id AS orig_stop_id,
REDUCE(
s = NULL,
d IN COLLECT({dist: distance(a.location, b.location), sid: b.stop_id}) |
CASE WHEN s.dist < d.dist THEN s ELSE {dist: d.dist, dest_stop_id: d.sid} END
) AS min_data
This query uses REDUCE to get the minimum distance and also the corresponding dest_stop_id at the same time.
The tricky part is that the first time the CASE expression is evaluated, s will be NULL; afterwards, s will be a map. The s.dist < d.dist test handles the NULL case on its own: when s is NULL, s.dist is also NULL, so the comparison evaluates to NULL (not true), the ELSE branch runs, and s is initialized to a map.
NOTE: Ideally, you should use the labels for your nodes in your query, so that the query does not have to scan every node in the DB to find each node. Also, you may want to add the appropriate indexes to further speed up the query.
Seems like you could skip the aggregation function and just order the distance and take the top:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH distance(a.location, b.location) AS dist, a, b
ORDER BY dist ASC
LIMIT 1
RETURN a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id, dist
As others here have mentioned, you really should use labels here (otherwise the query does an all-nodes scan to find your starting points, which is probably the main performance bottleneck), and have indexes in place so you're using index lookups for both a and b.
EDIT
If you need the nearest when you have multiple starting nodes, you can take the head of the collected elements like so:
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH distance(a.location, b.location) AS dist, a, b
ORDER BY dist ASC
WITH a, head(collect(b {.stop_id, dist})) as b
RETURN a.stop_id as orig_stop_id, b.stop_id AS dest_stop_id, b.dist as dist
We do need to include dist into the map projection from b, otherwise it would be used as a grouping key along with a.
Alternately you could just collect b instead of the map projection and then recalculate with the distance() function per remaining row.
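The alternative described in that last sentence could look roughly like this (a sketch, not tested): collect the b nodes in ascending distance order, keep the first one per a, and recompute the distance for that single row. `distance()` is the 3.4/4.x spelling; in Neo4j 5 the function is `point.distance()`.

```cypher
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH a, b, distance(a.location, b.location) AS dist
// Ascending order puts the nearest b first for each a.
ORDER BY dist ASC
WITH a, head(collect(b)) AS nearest
RETURN a.stop_id AS orig_stop_id,
       nearest.stop_id AS dest_stop_id,
       distance(a.location, nearest.location) AS dist
```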
You can use COLLECT for aggregation (note this query isn't checked):
MATCH (a { agency: "Bus", stop_id: "1234" }), (b { agency: "Train" })
WITH COLLECT(distance(a.location, b.location)) AS distances, a.stop_id AS stopId
UNWIND distances AS d
WITH min(d) AS minDist, stopId
MATCH (bus { agency: "Bus", stop_id: stopId }), (train { agency: "Train" })
WHERE distance(bus.location, train.location) = minDist
RETURN bus, train, minDist AS distance
Hope this will help you.
