Creating unique relationships without creating unique nodes in neo4j - foreach

I am breaking out a question I asked elsewhere into a second part.
For a given node which has an id_str that is known to be in the graph, I have a list of new id_str that may or may not be in the graph. If they /are/ in the graph, I would like to create unique relationships to them. (If they are not, I want to ignore them.)
My current method is quite slow. I am doing the looping part outside of Neo, using py2neo and writing the entries one at a time using a very slow filter.
Originally, I was using...
fids = get_fids(record) # [100001, 100002, 100003, ... etc]
ids_in_my_graph = filter(id_is_in_graph, fids) # [100002]
def id_is_in_graph(id):
val = False
query = """MATCH (user:User {{id_str:"{}"}})
RETURN user
""".format(id)
n=neo4j.CypherQuery(graph_db,query).execute_one()
if n:
val = True
return(val)
for i in ids_in_my_graph:
"""MATCH (user:User {{id_str:"{}"}}),(friend:User {{id_str:"{}"}})
WHERE has(user.id_str) AND has(friend.id_str)
CREATE UNIQUE (user)-[:FRIENDS]->(friend)""".format(record.id, i)
And while I want new /unique/ [:FRIENDS] relationships, I do not want to create new users or new friends if a node does not already exist with a valid id_str.
So, I am trying to rewrite this using the FOREACH with collections. I think the actual syntax would be...
MATCH (user:User {id_str:"200001"}), (friends:User)
WHERE friends.id_str IN ["100001", "100002", "100003", "JUNK", "DOESNTMATCH", "IGNORED"]
FOREACH(friend in friends :
CREATE UNIQUE user -[:FRIENDS]-> friend)
But my error is
py2neo.neo4j.SyntaxException: Invalid input 'U': expected whitespace, comment, NodeLabel, MapLiteral, a parameter, a relationship pattern, '.', node labels, '[', "=~", IN, IS, '*', '/', '%', '^', '+', '-', '<', '>', "<=", ">=", '=', "<>", "!=", AND, XOR, OR or '|' (line 3, column 48)
" FOREACH(friend in friends : CREATE UNIQUE user -[:FRIENDS]-> friend)"
Create Unique does not seem to be supported for the FOREACH construct, even though this answer suggests this has been fixed.
And again, I cannot use the syntax suggested here in 11.2.2 because I do not want additional nodes to be created, only new relationships to already-existing nodes.

Couple of problems:
First, it will want parenthesises around the user and friend node in the CREATE UNIQUE pattern.
Second, the ":" separator inside FOREACH has been changed to "|", because there were readability clashes with the ":" used for types and labels.
Third, you should use MERGE instead of create unique. It's faster, more predictable and it replaces CREATE UNIQUE.
Finally:
Conceptually, the "friends" identifier points to one friend "at a time", so to speak, it isn't a collection of all the friends. You can turn it into such by doing:
WITH user, COLLECT(friends) AS friends
Of course, as you might've guessed, that actually means you don't need the FOREACH at all, so your final query could be:
MATCH (user:User {id_str:"200001"}), (friend:User)
WHERE friend.id_str IN ["100001", "100002", "100003", "JUNK", "DOESNTMATCH", "IGNORED"]
MERGE (user) -[:FRIENDS]-> (friend)
Make sure you have an index defined on friend.id_str, otherwise this will be very slow :)

Related

how show graph with relationship filtrered?

I have a node called "COG1476" which has different relationships with other nodes but I would like to get only those relationships that have a score> = 700 and I would also like to get the graph.
MATCH (cog1 {name: 'COG1497'})-[rel:coexpression|cooccurence|database|experimental|fusion|neighborhood|score|textmining]->(cog2)
WHERE toInteger(rel.score)>=700, toInteger(rel.cooccurence)>=700, toInteger(rel.coexpression)>=700, toInteger(rel.database)>=700, toInteger(rel.experimental)>=700, toInteger(rel.fusion)>=700, toInteger(rel.neighborhood)>=700,toInteger(rel.textmining)>=700
RETURN cog1, cog2, rel.score>=700, rel.cooccurence>=700, rel.coexpression>=700, rel.fusion>=700, rel.database=700, rel.experimental>=700, rel.neighborhood>=700, rel.textmining>=700
Neo.ClientError.Statement.SyntaxError: Invalid input ',': expected 0..9, '.', 'e', 'E', an identifier character, whitespace, node labels, '[', "=~", IN, STARTS, ENDS, CONTAINS, IS, '^', '*', '/', '%', '+', '-', '=', '~', "<>", "!=", '<', '>', "<=", ">=", AND, XOR, OR, FROM GRAPH, CONSTRUCT, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE UNIQUE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, CALL, RETURN, UNION, ';' or end of input (line 2, column 32 (offset: 160))
"WHERE toInteger(rel.score)>=700, toInteger(rel.cooccurence)>=700, toInteger(rel.coexpression)>=700, toInteger(rel.database)>=700, toInteger(rel.experimental)>=700, toInteger(rel.fusion)>=700, toInteger(rel.neighborhood)>=700,toInteger(rel.textmining)>=700"
^
Based on your comments, I think two things are wrong:
You've got a syntax error in your WHERE clause, which we fix by replacing the commas with ORs
You need to configure the Neo4j Browser app to only show matched relationships (or use the Table view)
First let's fix the query:
MATCH (cog1 {name: 'COG1497'})-[rel:coexpression|cooccurence|database|experimental|fusion|neighborhood|score|textmining]->(cog2)
WHERE toInteger(rel.score)>=700 OR toInteger(rel.cooccurence)>=700 OR toInteger(rel.coexpression)>=700 OR toInteger(rel.database)>=700 OR toInteger(rel.experimental)>=700 OR toInteger(rel.fusion)>=700 OR toInteger(rel.neighborhood)>=700 OR toInteger(rel.textmining)>=700
RETURN cog1, cog2, rel
That should return data and not blow up with an error. However, in the Browser you'll still see all the relationships between the nodes even though some of those relationships don't match our WHERE clause - that's just the default behaviour of the Neo4j Browser when visualising a graph.
To fix that, hit the Settings cog icon at the bottom left of the screen and untick the checkbox marked 'Connect result nodes' at the end of the configuration options. You'll now only see connections between nodes that you've explicitly selected - you may want to toggle this back on after you're done.
You can also check your results by using the Table view, which will show only those relationships that matched the criteria in your WHERE clause.

Sub queries in Cypher Query Language

I want to write a CREATE relationship statement for a project i'm working on. The statement has to be something like this
CREATE (match (p:Halt) where p.name="Ananda College" return p)-[:next_halt {route:['103'],dist:1.45}]->(MATCH (p:Halt) where p.name="Borella" return p)
As you can see i want the start node and end node to have values coming from another CQL statement.
But when i run this query there seems to be a syntax error. I have gone through some tutorials to see where my query is wrong but being a beginner i can't really tell.
Invalid input '(': expected whitespace, comment, node labels, MapLiteral, a parameter, ')' or a relationship pattern (line 1, column 15 (offset: 14))
"CREATE (match (p:Halt) where p.name="Ananda College" return p)-[:next_halt {route:['103'],dist:1.45}]->(MATCH (p:Halt) where p.name="Borella" return p)"
Your syntax is rather mixed up here. Please reread the dev documentation and maybe look at the Cypher cheat sheet.
As for proper syntax, you don't even need nesting to get what you want. First, you match on your start and end nodes, then you can use the bound variables in other parts of your query, such as to create a relationship:
MATCH (start:Halt), (stop:Halt)
WHERE start.name = "Ananda College" AND stop.name="Borella"
CREATE (start)-[:next_halt {route:['103'], dist:1.45}]->(stop)
If you aren't sure if nodes (or the relationship) exist or not, you can use MERGE instead, which will MATCH on existing nodes (or relationships) or create them if they do not exist.

How to MATCH and CREATE a node and relationship?

Learning Neo4j and need help with getting the basics right. I am trying to find a Matching candidate then create a company and create a relationship between candidate and the newly created company. So, my query is
MATCH (b:Candidate {name:'Bala'}), CREATE (e:Employer {name:'Yahoo'}),
CREATE (b)-[:WORKED_IN]->(e)
RETURN b,e;
Invalid input '(': expected whitespace, comment, '=', node labels, MapLiteral, a parameter, a relationship pattern, ',', USING, WHERE, LOAD CSV, START, MATCH, UNWIND, MERGE, CREATE, SET, DELETE, REMOVE, FOREACH, WITH, RETURN, UNION, ';' or end of input...
I am using 2.2.5 console.
Remove the two commas before CREATE. The clauses in Cypher are not comma separated, only elements within a clause are. Your query will read
MATCH (b:Candidate {name:'Bala'})
CREATE (e:Employer {name:'Yahoo'})
CREATE (b)-[:WORKED_IN]->(e)
RETURN b,e;

Can I create and relate two nodes with the same name but different ids in neo4j

I have created two nodes in neo4j with the same name and label but with different ids:
CREATE (P:names {id:"1"})
CREATE (P:names{id:"2"})
My question is if I can create a relationship between these two nodes like this:
MATCH (P:names),(P:names)
WHERE P.id = "1" AND P.id = "2"
CREATE (P)-[r:is_connected_with]->(P) RETURN r"
I try it but it doesn't work.
Is it that I shouldn't create nodes with the same name or there is a workaround?
How about the following?
First run the create statements:
CREATE (p1:Node {id:"1"}) // note, named p1 here
CREATE (p2:Node {id:"2"})
Then, do the matching:
MATCH (pFirst:Node {id:"1"}), (pSecond:Node {id:"2"}) // and here we can call it something else
CREATE pFirst-[r:is_connected_with]->(pSecond)
RETURN r
Basically, you are matching two nodes (with the label Node). In your match you call them p1 and p2 but you can change these identifiers if you wish. Then, simply create the relationship between them.
You should not create identifiers with the same name. Also note that p1 and p2 are not the name of the node, it is the name of the identifier in this particular query.
EDIT: After input from the OP I have created a small Gist that illustrates some basics regarding Cypher.
#wassgren has the right answer about how to fix your query but I might be able to fill in some details about why and it's too long to leave in a comment.
The character before the colon when describing a node or relationship is referred to as an identifier, it's just a variable representing a node/rel within a Cypher query. Neo4j has some naming conventions that you are not following and as a result, it makes your query harder to read and will be harder for you to get help in the future. Best practices are:
Identifiers start lowercase: person instead of Person1, p instead of P
Labels are singular and have their first character capitalized: (p1:Name), not (p1:Names) or (p1:names) or (p1:name)
Relationships are all caps, [r:IS_CONNECTED_WITH], not [r:is_connected_with], though this one gets broken all the time ;-)
Back to your query, it both won't work and it doesn't follow conventions.
Won't work:
MATCH (P:names),(P:names)
WHERE P.id = "1" AND P.id = "2"
CREATE (P)-[r:is_connected_with]->(P) RETURN r
Will work, looks so much better(!):
MATCH (p1:Name),(p2:Name)
WHERE p1.id = "1" AND p2.id = "2"
CREATE (p1)-[r:IS_CONNECTED_WITH]->(p2) RETURN r
The reason your query doesn't work, though, is that by writing MATCH (P:names),(P:names) WHERE P.id = "1" AND P.id = "2", you are essentially saying "find a node, call it 'P', with an ID of both 1 and 2." That's not what you want and it obviously won't work!
If you're trying to create many nodes, you would rerun this query for each pair of nodes you want to create, changing the ID you assign each time. You can create the nodes and their relationship in one query, too:
CREATE (p1:Name {id:"1"})-[r:IS_CONNECTED_WITH]->(p2:Name {id:"2"}) RETURN r
In the app, just change the ID you want to assign to the nodes before you run the query. The identifiers are instance variables, they disappear when the query is complete.
EDIT #1!
One more thing, setting the id property within your app and assigning it to the database instead of relying on the Neo4j-created internal ID is a best practice. I suggest avoiding sequential IDs and instead using something to create a unique ID. In Ruby, many people use SecureRandom::uuid for this, I'm sure there's a parallel in whatever language(s) you are using.
EDIT #2!
Neo4j supports integer properties. {id:"1"} != {id: 1}. If your field is supposed to be an integer, use an integer.

Neo4j: implementing soft delete with optional relationships

I'm trying to implement a soft delete in Neo4j. The graph described in Cypher from Alice's viewpoint is as such:
(clyde:User)<-[:FOLLOWS]-(alice:User)-[:LIKES]->(bob:User)
Instead of actually deleting a node and its relationships, I'm
changing its label so it can no longer be looked up directly, i.e. dropping its User label and adding a _User label (notice the underscore)
replacing its relationships so it can't be reached anymore by my normal queries, e.g. deleting its :FOLLOWS relationships and replacing it with :_FOLLOWS relationships.
So this is basically the equivalent of moving a row to an archiving table in a relational database. I figured this is a pretty efficient approach because you're effectively never visiting the parts of the graph that have been soft-deleted. Also, you don't have to modify any of your existing queries.
The result of soft-deleting Alice should be this:
(clyde:User)<-[:_FOLLOWS]-(alice:_User)-[:_LIKES]->(bob:User)
My first attempt at the query was this:
match (user:User {Id: 1})
optional match (user)-[follows:FOLLOWS]->(subject)
remove user:User set user:_User
delete follows
create (user)-[:_FOLLOWS]->(subject);
The problem is that when this user is not following anyone, the query tries to create a relationship between user and null because the second match is optional, so it gives me this error: Other node is null.
My second attempt was this:
match (user:User {Id: 1})
remove user:User set user:_User
optional match (user)-[follows:FOLLOWS]->(subject)
foreach (f in filter(f in collect({r: follows, n: subject}) where f.r is not null) | delete f.r create (user)-[:_FOLLOWS]->(f.n));
So I'm putting the relationship and the subject into a map, collecting these maps in a collection, throwing every "empty" map away and looping over the collection. But this query gives me this error:
SyntaxException: Invalid input '.': expected an identifier character, node labels, a property map, whitespace or ')' (line 1, column 238)
Does anyone know how I can fix this?
Thanks,
Jan
Could you change the label first and then match for relationships? Then you should be able to use 'non-optional' match, and not have to deal with the cases where there are no follows relationships, something like
MATCH (user:User {Id: 1})
REMOVE user:User SET user:_User
WITH user
MATCH (user)-[follows:FOLLOWS]->(subject)
DELETE follows
CREATE (user)-[:_FOLLOWS]->(subject)
Or you could carry the user, follows and subject and filter on where subject is not null. Something like
MATCH (user:User {Id: 1})
OPTIONAL MATCH (user)-[follows:FOLLOWS]->(subject)
REMOVE user:User SET user:_User
WITH user, follows, subject
WHERE subject IS NOT NULL
DELETE follows
CREATE (user)-[:_FOLLOWS]->(subject)
Edit:
If the problem is that you want to do this for more than one kind of relationship, then you could try
MATCH (user:User {Id: 1})
REMOVE user:User SET user:_User
WITH user
MATCH (user)-[f:FOLLOWS]->(other)
DELETE f
CREATE (user)-[:_FOLLOWS]->(other)
WITH user LIMIT 1
MATCH (user)-[l:LIKES]->(other)
DELETE l
CREATE user-[:_LIKES]->(other)
You can keep extending it with other relationship types, just be sure to limit user when you carry, since multiple matches (user)-[r]->(other) means there are multiple results for user, or you'll run the next query part multiple times.
I don't think there is a generic way to do it in cypher since you can't dynamically build the relationship type (i.e. CREATE (a)-[newRel:"_"+type(oldRel)]->(b) doesn't work)
Is something like that what you are looking for or am I misunderstanding your question?

Resources