How to write cypher query to count number of nodes in graph based on levenshtein similarity - neo4j

Hello everyone, I need to write a Cypher query for the scenario below.
Given a list of strings, count the number of nodes in the graph where the Levenshtein similarity between the node's name property and a string from the list is above a certain threshold.
I was able to write the query for a single string, but I am not sure how to write it for multiple strings, e.g. ['string 1', 'string 2', 'string 3'].
MATCH (n:Node)
UNWIND (n.name) as name_lst
RETURN SUM(toInteger(apoc.text.levenshteinSimilarity(name_lst, 'string 1') > 0.6))
Any thoughts on how to transform the above query to handle multiple strings?

There is no need to UNWIND n.name as name_lst; you can pass n.name directly to the APOC function.
ANY returns true if at least one string in the list ['string 1', 'string 2', 'string 3'] has a Levenshtein similarity to n.name greater than 0.6, and toInteger converts true to 1.
Summing those 1s therefore gives the number of nodes whose name property has a similarity greater than 0.6 to at least one string in the list.
MATCH (n:Node)
RETURN SUM(toInteger(ANY(s in ['string 1', 'string 2', 'string 3']
WHERE apoc.text.levenshteinSimilarity(n.name, s ) > 0.6)))

One option is to use reduce:
MATCH (n:Node)
WITH toInteger(reduce(maxValSoFar = 0,
s IN ['string 1', 'string 2', 'string 3'] |
apoc.coll.max([maxValSoFar, apoc.text.levenshteinSimilarity(n.name, s)])) >
0.6) AS nodes
RETURN SUM(nodes)
For sample data:
MERGE (a1:Node {name:'string 1'})
MERGE (a2:Node {name:'asdss'})
MERGE (a3:Node {name:'string 2'})
MERGE (a4:Node {name:'afffs'})
MERGE (a5:Node {name:'efwetreyy'})
MERGE (a6:Node {name:'ffuumxt'})
The result is:
╒════════════╕
│"sum(nodes)"│
╞════════════╡
│2 │
└────────────┘
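To sanity-check the count outside the database, the logic of both queries can be modelled in a few lines of Python. This is only a sketch: the helper names are hypothetical, and the similarity is assumed to be 1 - distance / length of the longer string, which may not match APOC's implementation exactly.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    # Assumed definition: 1 - distance / length of the longer string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein_distance(a, b) / longest


def count_similar(names, targets, threshold=0.6):
    # A name counts once if it is close enough to ANY target string.
    return sum(
        any(levenshtein_similarity(name, t) > threshold for t in targets)
        for name in names
    )


names = ['string 1', 'asdss', 'string 2', 'afffs', 'efwetreyy', 'ffuumxt']
targets = ['string 1', 'string 2', 'string 3']
print(count_similar(names, targets))
```

For the sample data this prints 2, the same count the query returns. Note that 'string 1' is similar to all three targets but is still counted only once, which is the behaviour ANY gives in Cypher.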

Related

Division operator in Neo4j

When I execute the following cypher:
CREATE (n:Person {name: 'Andy',sal:600/3, title: 'Developer'})
The salary value will be 200.
But when I execute the following:
CREATE (n:Person {name: 'Andy',sal:1/2, title: 'Developer'})
The salary value will be 0.
What should I do to retrieve the correct answer?
Cypher performs integer division when both operands are integers, so the fractional part is dropped. An easy way to fix this is to convert one operand with toFloat:
CREATE (n:Person {name: 'Andy', sal: toFloat(1)/2, title: 'Developer'})
WITH n
MATCH (n:Person)
RETURN n
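The same arithmetic can be reproduced in Python as a quick illustration. This is an analogy, not Cypher: Python's `//` operator stands in for integer division here, and the two may differ for negative operands.

```python
# Integer operands: the fractional part is discarded.
print(600 // 3)   # 200
print(1 // 2)     # 0

# Converting one operand to a float first keeps the fraction,
# which is what toFloat(1)/2 does in the query above.
salary = float(1) / 2
print(salary)     # 0.5
```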

Matching a path with values from an array in Neo4j

Given that I have the node type Component and the relationship HAS_CHILD_COMPONENT, with a repeating relationship like the following example:
(a:Component {value: 'a'})-[:HAS_CHILD_COMPONENT]->(b:Component {value: 'b'})-[:HAS_CHILD_COMPONENT]->(c:Component {value: 'c'})
I would like to query and return the (:Component {value: 'c'}) node in a way where I specify the full path.
It could be written literally like above, but in my use case I'd also like to be able to query for a potential node (:Component {value: 'd'}), three relationships away from (:Component {value: 'a'}).
Is there a way to write a query by supplying it an array with the matching values along the path?
If the query intended to find (:Component {value: 'c'}), the query would be supplied with the array parameter:
['a', 'b', 'c'].
To find the node (:Component {value: 'd'}), the supplied array would be:
['a', 'b', 'c', 'd'].
Does something like this get you going? Supply an array of values to the query, use the first and last values in the supplied array to anchor the path, and then use reduce to ensure the path contains the components supplied in the array.
WITH ['a','b','c','d'] AS components
MATCH path=(start:Component)-[:HAS_CHILD_COMPONENT*..4]->(end:Component)
WHERE start.value = components[0]
AND end.value = components[size(components)-1]
AND reduce(values = [], n IN nodes(path) | values + [n.value]) = components
RETURN path
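The matching rule behind this query (the values collected along the path must equal the supplied array, in order) can be sketched in Python. The `values` and `children` dictionaries are a hypothetical in-memory stand-in for the Component nodes and HAS_CHILD_COMPONENT relationships, not part of the original data.

```python
# Hypothetical stand-in for the graph: node id -> value, node id -> child ids.
values = {1: 'a', 2: 'b', 3: 'c', 4: 'd', 5: 'x'}
children = {1: [2, 5], 2: [3], 3: [4], 4: [], 5: []}


def find_paths(components):
    """Return every path (list of node ids) whose values equal `components`."""
    results = []
    if not components:
        return results

    def walk(path):
        depth = len(path)
        # Stop as soon as the path deviates from the requested values.
        if values[path[-1]] != components[depth - 1]:
            return
        if depth == len(components):
            results.append(path)
            return
        for child in children[path[-1]]:
            walk(path + [child])

    for node_id in values:
        walk([node_id])
    return results


print(find_paths(['a', 'b', 'c']))
print(find_paths(['a', 'b', 'c', 'd']))
```

Unlike the Cypher query, this sketch prunes a branch at the first mismatching value instead of filtering complete paths afterwards; the set of matching paths is the same.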

How to display all nodes and relationships' names in sequence from a Neo4j path

I would like to know how to display the names of every node and relationship generated from a path p in Neo4j Cypher.
I have this query:
MATCH (m { name: 'porsche' }),(n { name: 'vehicle' }), p =(m)-[r*]->(n)
return collect(p);
╒══════════════════════════════╕
│"collect(p)" │
╞══════════════════════════════╡
│[[{"name":"porsche"},{"name":"│
│is a"},{"name":"car","type":"l│
│abel"},{"name":"car","type":"l│
│abel"},{"name":"is a subtype o│
│f"},{"name":"vehicle","type":"│
│label"}],[{"name":"porsche"},{│
│"name":"is a"},{"name":"car","│
│type":"label"},{"name":"car","│
│type":"label"},{},{"name":"veh│
│icle","type":"label"}]] │
└──────────────────────────────┘
But I want it to display each node's name and each relationship's name in sequence, like this:
'porsche' 'is a' 'car'
'car' 'is a subtype of' 'vehicle'
The output shown in your question indicates that your data is not well formed. For example, not all relationships actually have a name. This answer assumes well-formed data, but can be tweaked to handle missing name properties if needed.
Since your question was not clear about what you wanted, here are some options.
Option 1
This query will return a name triplet (in a list) for each relationship of each path:
MATCH ({ name: 'porsche' })-[rels*]->({ name: 'vehicle' })
WITH [r IN rels | [STARTNODE(r).name, r.name, ENDNODE(r).name]] AS steps
UNWIND steps AS step
RETURN step;
Sample output (for one path):
["porsche", "is a", "car"]
["car", "is a subtype of", "vehicle"]
Option 2
If you want to keep the results for each path separate, you can replace UNWIND steps AS step RETURN step; with RETURN steps;. That would produce something like this for each path:
[["porsche", "is a", "car"], ["car", "is a subtype of", "vehicle"]]
Option 3
If you want to get the distinct "steps" from all paths, you can do this:
MATCH ({ name: 'porsche' })-[rels*]->({ name: 'vehicle' })
WITH [r IN rels | [STARTNODE(r).name, r.name, ENDNODE(r).name]] AS steps
UNWIND steps AS step
RETURN DISTINCT step;
The result will look like the result for Option 1, except each "step" will be distinct.
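The difference between the three options is easiest to see on a small example. The Python sketch below models each path as a list of (start name, relationship name, end name) tuples; the second path and its 'is a kind of' relationship are hypothetical, added only so that the paths share one step.

```python
# Each path is a list of steps; each step is a
# (start node name, relationship name, end node name) tuple.
paths = [
    [('porsche', 'is a', 'car'), ('car', 'is a subtype of', 'vehicle')],
    [('porsche', 'is a', 'car'), ('car', 'is a kind of', 'vehicle')],  # hypothetical
]

# Option 1: one row per step, across all paths (UNWIND steps AS step).
option1 = [step for path in paths for step in path]

# Option 2: one row per path, each holding its list of steps.
option2 = paths

# Option 3: like option 1, but with duplicate steps removed (DISTINCT).
option3 = list(dict.fromkeys(option1))

print(len(option1), len(option2), len(option3))
```

Here option 1 yields four rows, option 2 yields two, and option 3 yields three because the shared ('porsche', 'is a', 'car') step appears only once.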

Neo4j apoc dijkstra procedure

To find the shortest path in Neo4j I am using Dijkstra's algorithm from the APOC library. The issue is that the request returns just 1 result. Is it possible to get the 5 or 10 shortest paths? Or can I set conditions on the total weight of the edges, for instance a total length of more than 500?
MATCH (start:Point {title: 'Some Point 1'}), (end:Point {title: 'Some Point 5'}) CALL apoc.algo.dijkstra(start, end, 'distance', 'value') YIELD path, weight RETURN path, weight
If you want more control, I would go for a pure Cypher solution instead of the APOC procedure.
Top 10:
MATCH p=(start:Point {title: 'Some Point 1'})-[rels:distance*]->(end:Point {title: 'Some Point 5'})
WITH p, REDUCE(weight=0, rel in rels | weight + rel.value) as length
RETURN p, length
ORDER BY length ASC
LIMIT 10
Path where length is more than 500:
MATCH p=(start:Point {title: 'Some Point 1'})-[rels:distance*]->(end:Point {title: 'Some Point 5'})
WITH p, REDUCE(weight=0, rel in rels | weight + rel.value) as length
WITH p, length WHERE length > 500
RETURN p, length
LIMIT 10
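Both queries follow the same recipe: enumerate every path between the two points, sum the weights along each path, then sort or filter on the total. A rough Python model of that recipe is shown below. The graph in `edges` is hypothetical, and the sketch forbids revisiting nodes, whereas a Cypher variable-length pattern only forbids reusing the same relationship.

```python
# Hypothetical weighted, directed graph: edges[node] -> [(neighbour, value), ...]
edges = {
    'P1': [('P2', 100), ('P3', 300)],
    'P2': [('P3', 150), ('P5', 600)],
    'P3': [('P5', 250)],
    'P5': [],
}


def all_paths(start, end):
    """Yield (path, total weight) for every simple path from start to end."""
    stack = [([start], 0)]
    while stack:
        path, weight = stack.pop()
        node = path[-1]
        if node == end:
            yield path, weight
            continue
        for neighbour, value in edges[node]:
            if neighbour not in path:  # never revisit a node
                stack.append((path + [neighbour], weight + value))


# Equivalent of ORDER BY length ASC LIMIT 10.
ranked = sorted(all_paths('P1', 'P5'), key=lambda item: item[1])
top_10 = ranked[:10]

# Equivalent of WHERE length > 500.
longer_than_500 = [item for item in ranked if item[1] > 500]

print(top_10)
print(longer_than_500)
```

Exhaustive enumeration grows quickly with graph size, so on large graphs it is probably worth bounding the path length, as the `*..4` pattern does in the earlier answer.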

Neo4j is there a way to create a variable number of relationships in one cypher query?

I would like to create N nodes with a sequential relationship between each of them.
Think of my requirement as creating a workflow for a user. On the UI end it can send an array of json objects that must relate to each other sequentially. For example:
{steps: [ {name: 'step 1'}, {name: 'step2'}, {name: 'step3'}] }
What I want from the above json is to create 3 nodes and have them sequentially linked
(step 1)-[:has_next_step]->(step 2)-[:has_next_step]->(step 3)
Is there a quick way of doing this? Keep in mind my example has 3 nodes, but in reality I may have anywhere from 5-15 steps so the cypher query must be able to handle this variable input. Note that I have control over the input as well so if there is an easier json params variable I can use that as well.
You can. The only difficulty is that, while iterating over the collection of steps, you cannot refer back to the node created for the previous element.
As a workaround, you can use a timestamp taken at the beginning of the query as an identifier:
WITH {steps: [ {name: 'step 1'}, {name: 'step2'}, {name: 'step3'}] } AS object
WITH object.steps AS steps, timestamp() AS identifier
UNWIND range(1, size(steps)-1) AS i
MERGE (s:Step {id: identifier + "_" + (i-1)}) SET s.name = (steps[i-1]).name
MERGE (s2:Step {id: identifier + "_" + (i)}) SET s2.name = (steps[i]).name
MERGE (s)-[:NEXT]->(s2)
Explanation:
I iterate over the collection of steps with UNWIND. To recognize the node representing an already iterated step, I use a dummy identifier made of the transaction timestamp + "_" + the sequence cursor.
At large scale you would be better off using your own identifiers (such as a UUID generated on the client side) with an index or unique constraint on them.
More advanced:
You have a User node and want to attach steps to it (context: the user did not have any steps connected to it before).
Create a dummy user:
CREATE (u:User {login:"me"})
Create the list of steps and attach it to the user:
WITH {steps: [ {name: 'step 1'}, {name: 'step2'}, {name: 'step3'}] } AS object
WITH object.steps AS steps, timestamp() AS identifier
UNWIND range(1, size(steps)-1) AS i
MERGE (s:Step {id: identifier + "_" + (i-1)}) SET s.name = (steps[i-1]).name
MERGE (s2:Step {id: identifier + "_" + (i)}) SET s2.name = (steps[i]).name
MERGE (s)-[:NEXT]->(s2)
WITH identifier + "_" + (size(steps)-1) AS lastStepId, identifier + "_0" AS firstStepId
MATCH (user:User {login:"me"})
OPTIONAL MATCH (user)-[r:LAST_STEP]->(oldStep)
DELETE r
WITH firstStepId, lastStepId, oldStep, user
MATCH (s:Step {id: firstStepId})
MATCH (s2:Step {id: lastStepId})
MERGE (user)-[:LAST_STEP]->(s)
WITH s2, collect(oldStep) AS old
FOREACH (x IN old | MERGE (s2)-[:NEXT]->(x))
Context: the user already has steps attached (run the same query with different step names to see the difference visually):
WITH {steps: [ {name: 'second 1'}, {name: 'second 2'}, {name: 'second 3'}] } AS object
WITH object.steps AS steps, timestamp() AS identifier
UNWIND range(1, size(steps)-1) AS i
MERGE (s:Step {id: identifier + "_" + (i-1)}) SET s.name = (steps[i-1]).name
MERGE (s2:Step {id: identifier + "_" + (i)}) SET s2.name = (steps[i]).name
MERGE (s)-[:NEXT]->(s2)
WITH identifier + "_" + (size(steps)-1) AS lastStepId, identifier + "_0" AS firstStepId
MATCH (user:User {login:"me"})
OPTIONAL MATCH (user)-[r:LAST_STEP]->(oldStep)
DELETE r
WITH firstStepId, lastStepId, oldStep, user
MATCH (s:Step {id: firstStepId})
MATCH (s2:Step {id: lastStepId})
MERGE (user)-[:LAST_STEP]->(s)
WITH s2, collect(oldStep) AS old
FOREACH (x IN old | MERGE (s2)-[:NEXT]->(x))
You can use a couple of APOC procedures to create the nodes and then link them together:
apoc.create.nodes can be used to create multiple nodes with the same label(s).
apoc.nodes.link can be used to chain the nodes together with relationships of the same type.
For example, the query below will create your 3 sample nodes (with a Step label) and then link them together, in order, with has_next_step relationships:
CALL apoc.create.nodes(['Step'],[{name:'step1'},{name:'step2'},{name: 'step3'}]) YIELD node
WITH COLLECT(node) AS nodes
CALL apoc.nodes.link(nodes, 'has_next_step')
RETURN SIZE(nodes)
The apoc.nodes.link procedure does not return anything, so the above query just returns the number of nodes that were created and linked together.
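Both answers come down to pairing each step with its successor: the UNWIND version does it by index (i-1, i), and apoc.nodes.link does it over the collected list. A minimal Python sketch of that pairing, using the payload from the question, looks like this; it only builds the list of links and does not talk to Neo4j.

```python
payload = {'steps': [{'name': 'step 1'}, {'name': 'step2'}, {'name': 'step3'}]}
steps = payload['steps']

# Pair every step with its successor; N steps give N - 1 links.
links = [
    (current['name'], following['name'])
    for current, following in zip(steps, steps[1:])
]

print(links)
```

A list with a single step produces no links at all, which mirrors the UNWIND query: range(1, size(steps)-1) is empty in that case, so no nodes are created either.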
