Enforce relationship uniqueness with Neo4j - neo4j

I'm using Spring Data Neo4j 2.1.0.BUILD-SNAPSHOT and a Neo4j 1.6.1 server.
I have a Friendship relationship between two User nodes, and I want to ensure that only one relationship will be created for every user1, user2 pair (the order doesn't count).
The common suggestion is to check at application level whether a relationship already exists before creating another one, but I think that doesn't avoid concurrency problems: the constraint should be managed at the database level.
The best solution I can think of is to use the @Indexed annotation with the unique property introduced in Neo4j 1.6 and create a unique constraint based on the user1 and user2 ids, something like:
@Indexed(unique = true)
private String uniqueConstraint;

public String getUniqueConstraint() {
    // Put the larger id first so that both orderings of the pair give the same key
    if (user1.id > user2.id) {
        return user1.id + "|" + user2.id;
    }
    return user2.id + "|" + user1.id;
}
BTW, I know that the latest release of Spring Data Neo4j supports this check on nodes with Neo4jTemplate.getOrCreateNode(), but I'm not sure it works with relationships. The REST API should support it, though (see "Unique relationship").
So I have two questions:
1. Is there a better alternative?
2. Should I be bothered by this concurrency problem, or is it very unlikely that something bad happens even on a high-traffic site, so that the check at application level is sufficient? I ask because this seems to me a very common problem, but there's little written about it for Neo4j. Maybe the embedded version suffers less from this.
Thanks

The usual approach in SDN for relationships between two nodes already ensures that there is only one relationship of a given type between them (by checking up front).
It doesn't yet leverage the uniqueness support in Neo4j for that.
And yes, with the REST server this approach might run into race conditions.
The embedded version supports locking (e.g. on one of the two nodes, or both) and then creating the relationship with that lock in place, so that no second thread can do the same thing at the same time.
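As a rough illustration of the lock-then-check-then-create idea, here is a minimal Java sketch. It is an in-memory stand-in, not Neo4j or SDN API: FriendshipStore, lockFor and befriend are hypothetical names, and the "graph" is just a set of order-independent pair keys (lower id first, which is an arbitrary choice).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// In-memory stand-in for the graph; none of this is Neo4j or SDN API.
final class FriendshipStore {
    // One lock per user id, created on demand.
    private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();
    // Friendships stored as order-independent keys such as "3|7".
    private final Set<String> friendships = ConcurrentHashMap.newKeySet();

    private ReentrantLock lockFor(long userId) {
        return locks.computeIfAbsent(userId, id -> new ReentrantLock());
    }

    /** Returns true if a new friendship was created, false if the pair was already connected. */
    boolean befriend(long a, long b) {
        long low = Math.min(a, b);
        long high = Math.max(a, b);
        String key = low + "|" + high;
        // Always lock the lower id first so that two threads cannot deadlock on the same pair.
        ReentrantLock first = lockFor(low);
        ReentrantLock second = lockFor(high);
        first.lock();
        try {
            second.lock();
            try {
                // The check and the create happen while both "nodes" are locked.
                if (friendships.contains(key)) {
                    return false;
                }
                friendships.add(key);
                return true;
            } finally {
                second.unlock();
            }
        } finally {
            first.unlock();
        }
    }

    int count() {
        return friendships.size();
    }
}
```

Calling befriend(1, 2) and then befriend(2, 1) creates a single entry; the second call finds the existing key and returns false.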
It might be OK if you do it optimistically, i.e. check after creation and delete the duplicate afterwards. You can also use the REST API directly to get that behaviour. We will probably add support for that in SDN 2.1; could you raise an issue (linking to this post) at http://spring.neo4j.org/issues ?

Related

neo4j - Relationship between three nodes

I'm totally new to Neo4j and I've been testing it these days.
One issue I have with it is how to correctly implement a relationship which involves 3 different nodes using Spring Data. Suppose, for example, that I have 3 @NodeEntity classes: User, Tag and TaggableObject.
As you can guess, a User can add a Tag to a TaggableObject; I model this operation with a @RelationshipEntity TaggingOperation.
However, I can't find a simple way to glue the 3 entities together inside the relationship. I mean, the obvious choice is to set @StartNode User tagger and @EndNode TaggedObject taggedObject; but how can I also add the Tag to the relationship?
This is called a "hyperedge", I believe, and it's not something that Neo4j supports directly. You can create an additional node to support it, though. So you could have a TagEvent node with a schema like so:
(:User)-[:PERFORMED]->(:TagEvent)
(:Tag)<-[:USED]-(:TagEvent)
(:TagObject)<-[:TAGGED]-(:TagEvent)
Another alternative is to store a foreign key as a property on a relationship or a node. Obviously that's not very graphy, but if you just need it for reference that might not be a bad solution. Just remember not to use the internal Neo4j ID, as it may not be dependable in future versions. You should create your own ID for this purpose.
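To make the intermediate-node idea concrete, here is a small plain-Java sketch of the TagEvent schema. It deliberately leaves out all Spring Data annotations; TaggingModel and tagsOn are hypothetical names used only for illustration, and each field of TagEvent corresponds to one relationship in the schema above.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java picture of the TagEvent schema; no Spring Data annotations are used.
final class TaggingModel {
    private TaggingModel() {}

    static final class User {
        final String name;
        User(String name) { this.name = name; }
    }

    static final class Tag {
        final String label;
        Tag(String label) { this.label = label; }
    }

    static final class TaggableObject {
        final String title;
        TaggableObject(String title) { this.title = title; }
    }

    /** The extra node: one instance per tagging operation, linking all three participants. */
    static final class TagEvent {
        final User performedBy;      // (:User)-[:PERFORMED]->(:TagEvent)
        final Tag used;              // (:TagEvent)-[:USED]->(:Tag)
        final TaggableObject tagged; // (:TagEvent)-[:TAGGED]->(:TagObject)

        TagEvent(User performedBy, Tag used, TaggableObject tagged) {
            this.performedBy = performedBy;
            this.used = used;
            this.tagged = tagged;
        }
    }

    /** Collects the tags applied to one object by walking the events. */
    static List<Tag> tagsOn(List<TagEvent> events, TaggableObject target) {
        List<Tag> result = new ArrayList<>();
        for (TagEvent event : events) {
            if (event.tagged == target) {
                result.add(event.used);
            }
        }
        return result;
    }
}
```

The point of the design is that the three-way fact lives on the event, so each ordinary relationship still connects exactly two nodes.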

Auto increment property in Neo4j

As far as I understand it the IDs given by Neo4j (ID(node)) are unstable and behave somewhat like row numbers in SQL. Since IDs are mostly used for relations in SQL and these are easily modeled in Neo4j, there doesn't seem to be much use for IDs, but then how do you solve retrieval of specific nodes? Having a REST API which is supposed to have unique routes for each node (e.g. /api/concept/23) seems like a pretty standard case for web applications.
But despite it being so fundamental, the only viable ways I found were either via language-specific frameworks or via an unconnected node which maintains the increments:
// get unique id
MERGE (id:UniqueId{name:'Person'})
ON CREATE SET id.count = 1
ON MATCH SET id.count = id.count + 1
WITH id.count AS uid
// create Person node
CREATE (p:Person{id:uid,firstName:'Gabriel',lastName:'Smith'})
RETURN p AS person
Source: http://www.neo4j.org/graphgist?8012859
Is there really not a simpler way and if not, is there a particular reason for it? Is my approach an anti-pattern in the context of Neo4j?
Neo4j internal ids are a bit more stable than SQL row ids; for example, they will never change during a transaction.
And indeed, exposing them for external usage is not recommended. I know there are some intentions inside Neo to implement such a feature.
Basically, people tend to use one of two solutions for this:
Using a UUID generator at the application level, like this one for PHP: https://packagist.org/packages/rhumsaa/uuid , and adding a label/uuid unique constraint on all nodes.
Using a very handy Neo4j plugin like https://github.com/graphaware/neo4j-uuid that will add uuid properties on the fly, so it removes the burden of handling this at the application level and makes it easier to manage the persistence state of your node objects.
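For the first option, the application-level part can be very small. Here is a minimal Java sketch using the standard java.util.UUID class; NodeIds and newId are hypothetical names, and the sketch assumes you store the returned string in a property that has a unique constraint.

```java
import java.util.UUID;

// Application-level id generator; the result is meant to be stored in a
// uniquely constrained node property (for example "uuid").
final class NodeIds {
    private NodeIds() {}

    /** Returns a random (version 4) UUID in its canonical 36-character string form. */
    static String newId() {
        return UUID.randomUUID().toString();
    }
}
```

Because the id is generated without asking the database, it can be assigned before the node is persisted, which is what makes routes such as /api/concept/{uuid} stable.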
I agree with Pavel Niedoba.
I came up with this, without a UniqueId node:
MATCH (a:Person)
WITH a ORDER BY a.id DESC LIMIT 1
CREATE (n:Person {id: a.id+1})
RETURN n
It requires a first Node with an id field though.

Do Neo4j nodes or relationships support TTL?

I am learning Neo4j, and I want to know whether there is any way to create a relationship or a node that will be deleted automatically after a certain period of time.
As pointed out by @Scott in the comments, you can specify a TTL on nodes by using APOC as shown here. Append the following to your neo4j.conf:
apoc.ttl.enabled=true
Then you can either set the appropriate label and property yourself:
SET n:TTL
SET n.ttl = timestamp() + 3600
or utilize one of the following procedures:
// Expires in
CALL apoc.date.expire.in(node,time,'time-unit')
// Expires at
CALL apoc.date.expire(node,time,'time-unit')
There's nothing that I know of like this. Neo4j is just a database like *SQL or MongoDB (though let me know if they can do something like this).
The best suggestion that I would have is to put a delete_after property (or something similar) on the relationships and then have a job which queries on a regular basis to clean them up. Note that you can't query for relationships directly (that is, nodes always need to be involved in your query) so depending on how big your database is, you may need to think through what sort of index you need. I'm a bit vague here because I don't know what your domain model would look like.
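A rough Java sketch of that cleanup job follows. It is an in-memory stand-in, not a Neo4j query: ExpiringRelationships, put and purge are hypothetical names, the map plays the role of the delete_after property, and the class is not thread-safe as written.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// In-memory stand-in: relationship id -> delete_after timestamp (epoch milliseconds).
final class ExpiringRelationships {
    private final Map<Long, Long> deleteAfter = new HashMap<>();

    /** Records a relationship together with the time after which it should be removed. */
    void put(long relationshipId, long deleteAfterMillis) {
        deleteAfter.put(relationshipId, deleteAfterMillis);
    }

    /** Removes every entry whose delete_after is at or before nowMillis; returns how many were removed. */
    int purge(long nowMillis) {
        int removed = 0;
        Iterator<Map.Entry<Long, Long>> it = deleteAfter.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() <= nowMillis) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return deleteAfter.size();
    }
}
```

In a real application the purge step would be a delete query against the database, run on a schedule, for example from a java.util.concurrent.ScheduledExecutorService with System.currentTimeMillis() as the current time.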
If you are like me and stumbled upon this article, note that this has recently been updated.
Ref: https://neo4j.com/labs/apoc/4.3/overview/apoc.ttl/apoc.ttl.expireIn/
MATCH (person:person {id: 100})
CALL apoc.ttl.expireIn(person, 10, 's')
RETURN person;
Another option for Neo4j is using a Neo4j extension by GraphAware: neo4j-expire
One disadvantage of using such extensions is that their maintainers sometimes stop supporting them for newer versions of Neo4j, and it also takes some time for them to support the latest version. If these things are not a problem for you, you should have no problem with the extension.

Is there any way to ensure that a node is only connected to one instance of a particular relationship type

To clarify, let's assume that we have nodes representing people and the following relationships: "BIOLOGICAL_MOTHER" and "BIOLOGICAL_FATHER".
Then, for any person node, said node can only have one "BIOLOGICAL_MOTHER" and one "BIOLOGICAL_FATHER". How can we ensure that this is the case?
No. Neo4j currently only supports uniqueness constraints.
I believe several people are working on different schema constructs for Neo4j that would permit you to constrain graphs in any number of different ways. What you're asking for boils down to a database constraint saying that, if a person already has a relationship of type BIOLOGICAL_FATHER, the DB must not accept the creation of another relationship of that same type for that person. In other words, relationship cardinality constraints, by relationship type.
At the moment, I think the best you can do is verify in your application code that such a relationship doesn't exist before creating it, but the DB won't do this checking for you.
The particular constraint you're looking for sounds easy enough; hopefully a Neo4j dev will jump in here and say, "Oh, no worries, that's planned for release XYZ", but I'm not sure about that.
More broadly, there are a number of issues with graphs that make constraints very tricky. In my personal graph domain, I'd like to make it impossible to create new relationships such that they would introduce cycles in the graph over a particular relationship type. (E.g. (a)-[:owns]->(b)-[:owns]->(a) is extremely undesirable for me). This would be a very costly constraint to actually enforce in the general case, since verifying whether a new relationship was OK could potentially involve traversing a huge graph.
Over the long run, it seems reasonable that neo4j might implement local constraints, but still shy away from anything that implied non-local constraint checking.
Steve,
In terms of Cypher, if I am given two names of people - say Sam and Dave, and wish to make Sam the father of Dave, but only if Dave doesn't yet have a father, I could do something like this:
MATCH (f {name : 'Sam'}), (s {name : 'Dave'})
WHERE NOT (s)<-[:FATHER]-()
CREATE (f)-[:FATHER]->(s)
If Dave already has a father the WHERE clause filters Dave out, which means no relationship will be created.
Grace and peace,
Jim
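The same guard (create the relationship only if the child has no father yet) can be pictured in application code. This is a minimal Java sketch over an in-memory map, not a Neo4j API: FatherRegistry and setFather are hypothetical names. It relies on Map.putIfAbsent being atomic for a ConcurrentHashMap, so the check and the create cannot be interleaved within one JVM.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory stand-in: child name -> father name, at most one entry per child.
final class FatherRegistry {
    private final Map<String, String> fatherByChild = new ConcurrentHashMap<>();

    /** Records the father only if the child has none yet; returns true if the link was created. */
    boolean setFather(String father, String child) {
        // putIfAbsent returns null when no previous mapping existed.
        return fatherByChild.putIfAbsent(child, father) == null;
    }

    /** Returns the recorded father, or null if the child has none. */
    String fatherOf(String child) {
        return fatherByChild.get(child);
    }
}
```

With this sketch, setFather("Sam", "Dave") succeeds, and a later setFather("Bob", "Dave") returns false and leaves Sam in place, just as the WHERE clause filters Dave out in the query above.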

Uniqueness in BatchInserter of Neo4J

I am using a "BatchInserter" to build a graph (in a single thread). I want to make sure nodes (and possibly relationships) are unique. My current solution is to check whether the node exists in the following manner:
String name = (String) nodeProperties.get(IndexKeys.CATEGORY_KEY);
if (index.get(IndexKeys.CATEGORY_KEY, name).size() > 0)
    return index.get(IndexKeys.CATEGORY_KEY, name).getSingle();
Long nodeID = inserter.createNode(nodeProperties, categoryLabel);
index.add(nodeID, nodeProperties);
index.flush();
It seems to be working fine, but as you can see it is IO expensive (flushing on every new addition, which I believe is a Lucene "commit" command). This is slowing down my code considerably.
I am aware of putIfAbsent and UniqueFactory. As documented:
"By using put-if-absent functionality, entity uniqueness can be guaranteed using an index. Here the index acts as the lock and will only lock the smallest part needed to guarantee uniqueness across threads and transactions. To get the more high-level get-or-create functionality make use of UniqueFactory."
However, these are for transaction based interactions with the graph. What I would like to do is to ensure uniqueness of nodes and possibly relationships in a batch insertion semantics, that is faster than my current setup.
Any pointers would be much appreciated.
Thank you
You should investigate the MERGE keyword in Cypher. I believe this will permit you to exploit your autoindexes without requiring you to use them yourself. More broadly, you might want to see if you can formulate your bulk load in a way that is conducive to piping large volumes of Cypher queries through the neo4j-shell.
Finally, as general pointers and background, you should check out this information on bulk loading.
When I encountered this problem, I just decided to go tyrant and enforce the index values on my own. Can't you do the same? I mean, ensure uniqueness before you do the insertions?
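One way to ensure uniqueness before the insertions, sketched in Java: keep an in-memory map from the unique key to the node id and only call the inserter on a miss, so the index does not need to be flushed and queried for every node. This is a sketch under assumptions, not BatchInserter API: UniqueNodeCache and getOrCreate are hypothetical names, the createNode hook stands in for whatever wraps inserter.createNode(...) in your code, and it assumes a single thread and that all keys fit in memory.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Single-threaded dedup cache for a batch load; not part of the Neo4j API.
final class UniqueNodeCache {
    private final Map<String, Long> idsByKey = new HashMap<>();
    // Hypothetical hook: creates a node for the key and returns its id.
    private final Function<String, Long> createNode;

    UniqueNodeCache(Function<String, Long> createNode) {
        this.createNode = createNode;
    }

    /** Returns the id already recorded for the key, or creates the node exactly once. */
    long getOrCreate(String key) {
        return idsByKey.computeIfAbsent(key, createNode);
    }

    int size() {
        return idsByKey.size();
    }
}
```

The index can then be populated as nodes are created and flushed once at the end of the load instead of after every addition.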
