Due to a unforeseen shutdown (and /or and error in my code) I have relationships in the database without nodes attached to it.
I think what happened was that used a MATCH statement to look up a node and subsequently a MERGE to create a relationship. For some reasons however the Match did not return a results, but the MERGE did create a relationship (apparently with a non existing node). See example below:
MATCH (image:Image {id:{param}.id})
FOREACH (tagName in {param}.tags | MERGE (tag:Tag {tag:tagName}) MERGE (image)-[:IS_TAGGED_AS]->(tag) // Here it creates a relationship even if no matching image is found.
)
When I run a simple query I receive the following message:
While loading relationships for Node[xx] a Relationship[xx] was encountered that had startNode: 0 and endNode: 0, i.e. which had neither start nor end node as the node we're loading relationships for
I can reference the node and the relationship by Id (although the relationship does not return results) but can't Delete them.
Is there anything I can do to fix this?
Ideally I create a query to select all 'bad' relationships and delete them.
I am working on Neo4j 2.3.0.
Are you sure that relationships can be created without nodes? That doesn't make sense to me. The way MERGE works is that it will match a node or relationship if it exists, otherwise it will try create a new one.
If you add PROFILE to the start of your query in the Neo4j Browser, you can see the execution plan for a query. You will see that nodes are matched first and then the relationships are matched and/or created afterwards.
Related
I have a simple model of a chess tournament. It has 5 players playing each other. The graph looks like this:
The graph is generally fine, but upon further inspection, you can see that both sets
Guy1 vs Guy2,
and
Guy4 vs Guy5
have a redundant relationship each.
The problem is obviously in the data, where there is a extraneous complementary row for each of these matches (so in a sense this is a data quality issue in the underlying csv):
I could clean these rows by hand, but the real dataset has millions of rows. So I'm wondering how I could remove these relationships in either of 2 ways, using CQL:
1) Don't read in the extra relationship in the first place
2) Go ahead and create the extra relationship, but then remove it later.
Thanks in advance for any advice on this.
The code I'm using is this:
/ Here, we load and create nodes
LOAD CSV WITH HEADERS FROM
'file:///.../chess_nodes.csv' AS line
WITH line
MERGE (p:Player {
player_id: line.player_id
})
ON CREATE SET p.name = line.name
ON MATCH SET p.name = line.name
ON CREATE SET p.residence = line.residence
ON MATCH SET p.residence = line.residence
// Here create the edges
LOAD CSV WITH HEADERS FROM
'file:///.../chess_edges.csv' AS line
WITH line
MATCH (p1:Player {player_id: line.player1_id})
WITH p1, line
OPTIONAL MATCH (p2:Player {player_id: line.player2_id})
WITH p1, p2, line
MERGE (p1)-[:VERSUS]->(p2)
It is obvious that you don't need this extra relationship as it doesn't add any value nor weight to the graph.
There is something that few people are aware of, despite being in the documentation.
MERGE can be used on undirected relationships, neo4j will pick one direction for you (as realtionships MUST be directed in the graph).
Documentation reference : http://neo4j.com/docs/stable/query-merge.html#merge-merge-on-an-undirected-relationship
An example with the following statement, if you run it for the first time :
MATCH (a:User {name:'A'}), (b:User {name:'B'})
MERGE (a)-[:VERSUS]-(b)
It will create the relationship as it doesn't exist. However if you run it a second time, nothing will be changed nor created.
I guess it would solve your problem as you will not have to worry about cleaning the data in upfront nor run scripts afterwards for cleaning your graph.
I'd suggest creating a "match" node like so
(x:Player)-[:MATCH]->(m:Match)<-[:MATCH]-(y:Player)
to enable tracking details about the match separate from the players.
If you need to track player matchups distinct from the matches themselves, then
(x:Player)-[:HAS_PLAYED]->(pair:HasPlayed)<-[:HAS_PLAYED]-(y:Player)
would do the trick.
If the schema has to stay as-is and the only requirement is to remove redundant relationships, then
MATCH (p1:Player)-[r1:VERSUS]->(p2:Player)-[r2:VERSUS]->(p1)
DELETE r2
should do the trick. This finds all p1, p2 nodes with bi-directional VERSUS relationships and removes one of them.
You need to use UNWIND to do the trick.
MATCH (p1:Player)-[r:VERSUS]-(p2:Player)
WITH p1,p2,collect(r) AS rels
UNWIND tail(rels) as rel
DELETE rel;
THe previous code will find the direct connections of type VERSUS between p1 and p2 using match (note that this is not directed). Then will get the collection of relationships and finally the last of those relations, which is deleted.
Of course you can add a check to see whether the length of the collection is 2.
I have been trying to match together two different nodes (process) and (process framework) that have the same id. I want to set a relationship between those (called:SAME).
The query works but it creates 3 times the amount of processes that I wanted and I can´t figure out why:
MATCH (p:Process)
MATCH (pcf:ProcessFramework)
WHERE HAS (p.id) AND HAS (pcf.pcf_id) AND p.id=pcf.pcf_id
MERGE (p)<-[:same]-(pf)
return p,pcf
I also tried it with CREATE UNIQUE instead of MERGE but its the same result. I am not a developer so maybe I am doing a very obvious mistake but I really can´t see it!
Thanks!
You have a typo in your query, you use pf where you should use pcf in the MERGE.
I'd also change it to this and make sure you have an index on :ProcessFramework(pcf_id)
MATCH (p:Process)
MATCH (pcf:ProcessFramework)
WHERE p.id=pcf.pcf_id
MERGE (p)<-[:same]-(pcf)
return p,pcf
I have come across a very strange problem with my data and cypher.
I run this query
match (sm:Small)-->(a:Activity) return distinct a
and it returns all 7 a nodes that exist that are connected to Small's (which is also all of them).
I run this query
match (a:Activity) return distinct a
And it returns only 1 of the a nodes that exists
If I run (same as first one except no direction)
match (sm:Small)--(a:Activity) return distinct a
I also only get one single Activity node
The one activity node was that always shows up was initially added to the database first. The other ones that are missing, were added later in a separate query.
If I run a query in the browser that returns all of them and there outgoing relationships to Small nodes the visualization of them all looks the same and proper.
Is there anyway to further interrogate this?
I am trying to create a graph using some nodes and relationships using py2noe. say, I'm creating a family tree.
I am creating nodes using get_or_create() so that my script doesn't create duplicates if I supply the same value again.
How can I do the same for a relationship? I can't find any reference to a get_or_create() like function for a relationship.
I want to publish (Joe)-[:son]->(John)
the first time it creates 2 nodes joe and john and a link between them.
if I re run my script, as the nodes are unique, they aren't published but a new relationship is created.
This gives me a graph with 2 nodes and n relationships where n is the number of times I run the script.
I also tried using cypher and I get the same issue. It keeps on creating relationships.
Can anyone suggest me a way to solve this problem?
Thanks.
I don't know py2neo (so there may be a wrapper for this function), but the way to achieve that would be using a Cypher MERGE which it looks like you have to run using the raw Cypher statement:
cypher_merge_result = neo4j.CypherQuery(graph_db,
"MERGE (s:Person{name:Joe})-[:SON]->(f:Person{name:John})")
That will create 2 Person nodes and 1 SON relationship, no matter how many times that you run it.
Neo4J documentation is here and it is important to understand how it works as partial matches in the MERGE statement will cause the whole pattern to be created. So if your Person nodes already exist you should match them in advance to avoid duplicate Persons being created. i.e
cypher_merge_result = neo4j.CypherQuery(graph_db,
"MATCH (s:Person{name:Joe}), (f:Person{name:John}) MERGE (s)-[:SON]->(f)")
My database contains about 300k nodes and 350k relationships.
My current query is:
start n=node(3) match p=(n)-[r:move*1..2]->(m) where all(r2 in relationships(p) where r2.GameID = STR(id(n))) return m;
The nodes touched in this query are all of the same kind, they are different positions in a game. Each of the relationships contains a property "GameID", which is used to identify the right relationship if you want to pass the graph via a path. So if you start traversing the graph at a node and follow the relationship with the right GameID, there won't be another path starting at the first node with a relationship that fits the GameID.
There are nodes that have hundreds of in and outgoing relationships, some others only have a few.
The problem is, that I don't know how to tell Cypher how to do this. The above query works for a depth of 1 or 2, but it should look like [r:move*] to return the whole path, which is about 20-200 hops.
But if i raise the values, the querys won't finish. I think that Cypher looks at each outgoing relationship at every single path depth relating to the start node, but as I already explained, there is only one right path. So it should do some kind of a DFS search instead of a BFS search. Is there a way to do so?
I would consider configuring a relationship index for the GameID property. See http://docs.neo4j.org/chunked/milestone/auto-indexing.html#auto-indexing-config.
Once you have done that, you can try a query like the following (I have not tested this):
START n=node(3), r=relationship:rels(GameID = 3)
MATCH (n)-[r*1..]->(m)
RETURN m;
Such a query would limit the relationships considered by the MATCH cause to just the ones with the GameID you care about. And getting that initial collection of relationships would be fast, because of the indexing.
As an aside: since neo4j reuses its internally-generated IDs (for nodes that are deleted), storing those IDs as GameIDs will make your data unreliable (unless you never delete any such nodes). You may want to generate and use you own unique IDs, and store them in your nodes and use them for your GameIDs; and, if you do this, then you should also create a uniqueness constraint for your own IDs -- this will, as a nice side effect, automatically create an index for your IDs.