Cypher query returns undesirable result - neo4j

I need to get texts and save them to Neo4j. After that, I split each text into words and create a [:NEXT] relationship between consecutive words, indicating which word comes after another, and a [:CONTAINS] relationship indicating that the text contains that word.
Finally, I try to find the word in the text that has the most [:NEXT] relationships, counting only the given text and not the whole database.
Unfortunately, I just get the sum over the whole database.
The query is:
query = '''
WITH split("%s"," ") as words
MERGE (p:Post {id: '%s', text: '%s'})
WITH p, words
UNWIND range(0,size(words)-2) as idx
MERGE (w1:Word {name:words[idx]})
MERGE (w2:Word {name:words[idx+1]})
MERGE (w1)-[:NEXT]->(w2)
MERGE (p)-[:CONTAINS]->(w2)
MERGE (p)-[:CONTAINS]->(w1)
WITH p
MATCH (p)-[c:CONTAINS]->(w:Word)
MATCH ()-[n1:NEXT]->(:Word {name: w.name})<-[:CONTAINS]-(p)
MATCH (p)-[:CONTAINS]-(:Word {name: w.name})-[n2:NEXT]->()
WITH COUNT(n1) + COUNT(n2)AS score, w.name AS word, p.text AS post, p.id AS _id
RETURN post, word, score, _id;
''' %(text, id, text)
I just can't find out the problem here.
Thanks!

Well, you may have a data modeling problem here.
You're using MERGE when creating your word nodes, so if a word was already added by a prior query, the same node is reused. Your more common word nodes (a, the, and, I, etc.) will therefore likely have many [:NEXT] relationships, and that number will keep growing with each query.
Is this how you mean this to behave, or will you only be asking your db questions about words used in the text given in the query?
EDIT
The problem is the merging of the :Word nodes. This will match any :Word node created by a previous query, and the node will in turn be matched by any future query. It's not enough to merge the :Word node itself; to make your words local to each associated post, you have to merge the relationship from your post to the word at the same time.
We can also clean up the patterns used to calculate the word score, as all we need is the number of [:NEXT] relationships in either direction for each word.
query = '''
WITH split("%s"," ") as words
MERGE (p:Post {id: '%s', text: '%s'})
WITH p, words
UNWIND range(0,size(words)-2) as idx
MERGE (p)-[:CONTAINS]->(w1:Word {name:words[idx]})
MERGE (p)-[:CONTAINS]->(w2:Word {name:words[idx+1]})
MERGE (w1)-[:NEXT]->(w2)
WITH p
MATCH (p)-[:CONTAINS]->(w:Word)
WITH size( ()-[:NEXT]-(w) ) AS score, w.name AS word, p.text AS post, p.id AS _id
RETURN post, word, score, _id;
''' %(text, id, text)
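To sanity-check the scores this query should return, here is a minimal Python sketch of the same count for a single text, computed outside the database. The function name next_scores is hypothetical, and the sketch assumes that a word directly followed by itself contributes one relationship to that word.

```python
def next_scores(text):
    """Count, per word, the distinct NEXT relationships touching it in one text."""
    words = text.split(" ")
    # Distinct directed pairs, mirroring MERGE (w1)-[:NEXT]->(w2) within one post.
    pairs = {(words[i], words[i + 1]) for i in range(len(words) - 1)}
    scores = {}
    for first, second in pairs:
        scores[first] = scores.get(first, 0) + 1
        if second != first:  # assumption: a self-pair counts once
            scores[second] = scores.get(second, 0) + 1
    return scores


print(next_scores("a b a c"))
```

For the text "a b a c" the pairs are (a, b), (b, a) and (a, c), so a scores 3, b scores 2 and c scores 1.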

My solution is:
query = '''
WITH split("%s"," ") AS words
MERGE (p:Post {id: "%s", text:"%s"})
WITH p, words
UNWIND range(0,size(words)-2) as idx
MERGE (w1:Word {name:words[idx]})
MERGE (w2:Word {name:words[idx+1]})
MERGE (w1)-[n:NEXT]->(w2)
ON MATCH SET n.count = n.count + 1
ON CREATE SET n.count = 1
MERGE (p)-[:CONTAINS]->(w2)
MERGE (p)-[:CONTAINS]->(w1)
''' %(text, id, text)
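As a rough illustration of what n.count accumulates, this Python sketch counts how often each consecutive word pair occurs across several texts. The helper name pair_counts is made up, and it only mirrors the ON CREATE / ON MATCH logic; it does not talk to Neo4j.

```python
from collections import Counter


def pair_counts(texts):
    """Count occurrences of each consecutive (word, next_word) pair over all texts."""
    counts = Counter()
    for text in texts:
        words = text.split(" ")
        # First sighting of a pair acts like ON CREATE (count = 1),
        # every later sighting like ON MATCH (count + 1).
        counts.update(zip(words, words[1:]))
    return counts


counts = pair_counts(["the cat sat", "the cat ran"])
print(counts[("the", "cat")])
```

Note that the counter spans every text passed in, just as the :Word nodes in this query are shared across posts.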

Related

How to do this in a single Cypher Query?

So this is a very basic question. I am trying to make a cypher query that creates a node and connects it to multiple nodes.
As an example, let's say I have a database with towns and cars. I want to create a query that:
creates people, and
connects them with the town they live in and any cars they may own.
So here goes:
Here's one way I tried this query (I have WHERE clauses that specify which town and which cars, but to simplify):
MATCH (t: Town)
OPTIONAL MATCH (c: Car)
MERGE a = ((c) <-[:OWNS_CAR]- (p:Person {name: "John"}) -[:LIVES_IN]-> (t))
RETURN a
But this returns multiple people named John - one for each car he owns!
In two queries:
MATCH (t:Town)
MERGE a = ((p:Person {name: "John"}) -[:LIVES_IN]-> (t))
MATCH (p:Person {name: "John"})
OPTIONAL MATCH (c:Car)
MERGE a = ((p) -[:OWNS_CAR]-> (c))
This gives me the result I want, but I was wondering if I could do this in 1 query. I don't like the idea that I have to find John again! Any suggestions?
It took me a bit to wrap my head around why MERGE sometimes creates duplicate nodes when I didn't intend that. This article helped me.
The basic insight is that it would be best to merge the Person node first before you match the towns and cars. That way you won't get a new Person node for each relationship pattern.
If Person nodes are uniquely identified by their name properties, a unique constraint would prevent you from creating duplicates even if you run a mistaken query.
If a person can have multiple cars and residences in multiple towns, you also want to avoid a cartesian product of cars and towns in your result set before you do the merge. Try using the table output in Neo4j Browser to see how many rows are getting returned before you do the MERGE to create relationships.
Here's how I would approach your query.
MERGE (p:Person {name:"John"})
WITH p
OPTIONAL MATCH (c:Car)
WHERE c.licensePlate in ["xyz123", "999aaa"]
WITH p, COLLECT(c) as cars
OPTIONAL MATCH (t:Town)
WHERE t.name in ["Lexington", "Concord"]
WITH p, cars, COLLECT(t) as towns
FOREACH(car in cars | MERGE (p)-[:OWNS]->(car))
FOREACH(town in towns | MERGE (p)-[:LIVES_IN]->(town))
RETURN p, towns, cars
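To see why collecting matters, here is a small Python sketch of the row counts involved. It is only an analogy for how Cypher rows multiply, using the license plates and town names from the query above as plain strings.

```python
from itertools import product

cars = ["xyz123", "999aaa"]
towns = ["Lexington", "Concord"]

# Matching cars and towns without collecting: one row per combination.
rows_uncollected = list(product(cars, towns))

# Collecting each list first: a single row that carries both lists.
rows_collected = [(cars, towns)]

print(len(rows_uncollected))  # 4
print(len(rows_collected))    # 1
```

With the uncollected rows, every MERGE after the matches would run four times; with the collected lists, each FOREACH runs once per car and once per town.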

How do you do a for each to create nodes for each element of a node property?

I have nodes representing algorithms with the author property. I want to create nodes for people who are in the author of the algorithms and create WORKED_ON relationships between these people and the algorithms. So I tried:
FOREACH (p:author IN al:Algorithm | CREATE (p:PERSON).
(p)-[:WORKED_ON]->(al:Algorithm))
But it returns:
Invalid input ':': expected "IN" (line 2, column 11 (offset: 156)).
"FOREACH (p:author IN al:Algorithm | CREATE (p:PERSON)"
Assuming that the author property contains a list of names separated by commas, you could do something like this:
MATCH (a:Algorithm)
FOREACH( authorName IN SPLIT(a.author,',') |
MERGE (p:Person {name:authorName})
MERGE (p)-[:WORKED_ON]->(a)
)
NOTE: for the MERGE to work fast, you should create a uniqueness constraint:
CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE
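One thing to watch for: splitting on ',' keeps the whitespace around each name, so a value like "Ada Lovelace, Alan Turing" yields a second name with a leading space. Below is a small Python sketch of the split-and-trim step, assuming author is a single comma-separated string; the helper name author_names and the sample names are hypothetical. In Cypher, wrapping authorName in trim() should have the same effect.

```python
def author_names(author):
    """Split a comma-separated author string into clean, non-empty names."""
    return [name.strip() for name in author.split(",") if name.strip()]


raw = "Ada Lovelace, Alan Turing"
print(raw.split(","))     # the second entry keeps its leading space
print(author_names(raw))  # names with surrounding whitespace removed
```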
FOREACH works on a collection, so you would have to collect the Algorithm nodes first. Something like,
MATCH (n:Algorithm) with collect(n) as algos
FOREACH(a in algos | CREATE (p:Person {name: a.author})-[:WORKED_ON]->(a))
However, there may be a simpler way to create those Person nodes:
MATCH (a:Algorithm)
CREATE (a)<-[:WORKED_ON]-(:Person {name: a.author})

Cypher merge nodes with same property and collected the other property

I have nodes with this structure
(g:Giocatore { nome, match, nazionale})
(nome:'Del Piero', match:'45343', nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
(nome:'Del Piero', match:'18235', nazionale:'ITA')
The property 'match' is unique (it is the ID of a match), while several nodes share the same 'nome'.
I want to merge all the nodes with the same 'nome' and collect their different 'match' values, like this:
(nome:'Del Piero', match:[45343,18235], nazionale:'ITA')
(nome:'Messi', match:'65324', nazionale:'ARG')
I tried with the APOC library too, but nothing works.
Any idea?
Can you try this query:
MATCH (n:Giocatore)
WITH n.nome AS nome, collect(n) AS node2Merge
WITH node2Merge, extract(x IN node2Merge | x.match) AS matches
CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches
Here I'm using APOC to merge the nodes, but first I map the node list to an array of match values, and then I set that array on the merged node.
I don't know how many Giocatore nodes you have, so this query might throw an OutOfMemory exception, in which case you will have to batch it. You can, for example, replace the first line with MATCH (n:Giocatore) WHERE n.nome STARTS WITH 'A' and repeat it for each letter, or you can use the apoc.periodic.iterate procedure:
CALL apoc.periodic.iterate(
'MATCH (n:Giocatore) WITH n.nome AS nome, collect(n) AS node2Merge RETURN node2Merge, extract(x IN node2Merge | x.match) AS matches',
'CALL apoc.refactor.mergeNodes(node2Merge) YIELD node
SET node.match = matches',
{batchSize:1000,parallel:true,retries:3,iterateList:true}
) YIELD batches, total
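For reference, the grouping the query aims for can be sketched in plain Python with the sample data from the question. The function name merge_by_nome is hypothetical, and the sketch assumes all nodes with the same nome share the same nazionale.

```python
players = [
    {"nome": "Del Piero", "match": "45343", "nazionale": "ITA"},
    {"nome": "Messi", "match": "65324", "nazionale": "ARG"},
    {"nome": "Del Piero", "match": "18235", "nazionale": "ITA"},
]


def merge_by_nome(rows):
    """Group rows by 'nome' and collect their 'match' values into a list."""
    merged = {}
    for row in rows:
        entry = merged.setdefault(
            row["nome"],
            # Assumption: nazionale is identical for every row with this nome.
            {"nome": row["nome"], "match": [], "nazionale": row["nazionale"]},
        )
        entry["match"].append(row["match"])
    return list(merged.values())


for player in merge_by_nome(players):
    print(player)
```

As with collect() in Cypher, a player with a single match ends up with a one-element list rather than a bare value.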

neo4j match node in array order by input

I am trying to implement the pointer functionality from https://neo4j.com/blog/moving-relationships-neo4j/ to use it as a team ordering machine. See http://imgur.com/a/MViF0 for a model. I am using this Cypher query:
MERGE (list:LIST)
WITH list
MATCH (u) WHERE ID(u) IN [421, 419, 420]
MERGE (team:TEAM{name: u.name})
MERGE (team)-[:PARTOF]->(list)
WITH collect(team) AS elems, list
FOREACH (n IN RANGE(0, LENGTH(elems)-2) |
FOREACH (prec IN [elems[n]] |
FOREACH (next IN [elems[n+1]] |
MERGE (prec)-[:NEXT]->(next))))
with list
MATCH (elem:TEAM) WHERE NOT (elem)<-[:NEXT]-()
MERGE (list)-[:POINTER]->(elem)
Now this works quite nicely, but I have only one problem. This line:
MATCH (u) WHERE ID(u) IN [421, 419, 420]
returns my original teams ordered by id, but I would like the order to follow the input array [421, 419, 420], something like a function that does
return * order by my array input.
Keep in mind that it should work for any number of teams; this is just an example. Also, my original team node isn't labeled as a team but as something else, so we make a duplicate every time. Any input appreciated, thanks.
Try using UNWIND:
MERGE (list:LIST)
WITH list
UNWIND [421, 419, 420] as uid
MATCH (u) WHERE id(u) = uid
MERGE (team:TEAM{name: u.name})
...
[Update] Of course, it is also possible to compute the position of each node in the input array explicitly:
MERGE (list:LIST)
WITH list, [3871013, 3871011, 3871012] as ids
MATCH (u) WHERE ID(u) IN ids
WITH list, u,
FILTER(x in RANGE(0,size(ids)-1) WHERE ids[x] = id(u)) as orderIndex
ORDER BY orderIndex[0] // Sort by node position in the array of identifiers
MERGE (team:TEAM{name: u.name})
...
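The idea behind the second variant, sorting each matched node by its position in the input array, can be sketched in Python as follows. The function name order_by_input and the node dictionaries are made up for illustration, and the sketch assumes every node id appears in the input list.

```python
def order_by_input(ids, nodes):
    """Return nodes sorted by the position of their id in the ids list."""
    # Map each id to its index in the input array, e.g. {421: 0, 419: 1, 420: 2}.
    position = {node_id: index for index, node_id in enumerate(ids)}
    return sorted(nodes, key=lambda node: position[node["id"]])


# Nodes as they might come back from the database, ordered by id.
nodes = [
    {"id": 419, "name": "B"},
    {"id": 420, "name": "C"},
    {"id": 421, "name": "A"},
]
ordered = order_by_input([421, 419, 420], nodes)
print([node["name"] for node in ordered])  # ['A', 'B', 'C']
```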

Order list without scanning every node

When using LIMIT with ORDER BY, every node with the selected label still gets scanned (even with an index).
For example, let's say I have the following:
MERGE (:Test {name:'b'})
MERGE (:Test {name:'c'})
MERGE (:Test {name:'a'})
MERGE (:Test {name:'d'})
Running the following gets us :Test {name: 'a'}; however, using PROFILE we can see the entire list gets scanned, which obviously will not scale well.
MATCH (n:Test)
RETURN n
ORDER BY n.name
LIMIT 1
I have a few sorting options available for this label. The order of nodes within these sorts should not change often; however, I can't cache these lists because each list is personalized for a user, e.g. a user may have hidden :Test {name:'b'}.
Is there a golden rule for something like this? Would creating pointers from node to node for each sort option be a good option here? Something like
(n {name:'a'})-[:ABC_NEXT]->(n {name:'b'})-[:ABC_NEXT]->(n {name:'c'})-...
Would I be able to have multiple sort pointers? Would that be overkill?
Ref:
https://neo4j.com/blog/moving-relationships-neo4j/
http://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-relationships-between-a-collection-of-nodes-invalid-input/
Here's what I ended up doing for anyone interested:
// connect nodes
MATCH (n:Test)
WITH n
ORDER BY n.name
WITH COLLECT(n) AS nodes
FOREACH(i in RANGE(0, length(nodes)-2) |
FOREACH(node1 in [nodes[i]] |
FOREACH(node2 in [nodes[i+1]] |
CREATE UNIQUE (node1)-[:IN_ORDER_NAME]->(node2))))
// create list, point first item to list
CREATE (l:List { name: 'name' })
WITH l
MATCH (n:Test) WHERE NOT (n)<-[:IN_ORDER_NAME]-()
MERGE (l)-[:IN_ORDER_NAME]->(n)
// getting 10 nodes sorted alphabetically
MATCH (:List { name: 'name' })-[:IN_ORDER_NAME*]->(n)
RETURN n
LIMIT 10
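The same pointer idea can be sketched in Python: sort once, store a "next" pointer per item, then read the first few items by walking the chain instead of sorting again. The names build_chain and first_n are hypothetical, the sketch assumes names are unique, and the hidden parameter stands in for the per-user hidden nodes mentioned in the question.

```python
def build_chain(names):
    """Sort names once and return (head, next_of), a singly linked order."""
    ordered = sorted(names)
    next_of = {a: b for a, b in zip(ordered, ordered[1:])}
    head = ordered[0] if ordered else None
    return head, next_of


def first_n(head, next_of, n, hidden=()):
    """Walk the chain from head and return the first n names not in hidden."""
    result = []
    current = head
    while current is not None and len(result) < n:
        if current not in hidden:
            result.append(current)
        current = next_of.get(current)
    return result


head, next_of = build_chain(["b", "c", "a", "d"])
print(first_n(head, next_of, 2))                # ['a', 'b']
print(first_n(head, next_of, 2, hidden={"b"}))  # ['a', 'c']
```

The walk touches only as many items as it needs to return, plus any hidden ones it skips along the way.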
