I'm totally new to Neo4j and I've been testing it these days.
One issue I have with it is how to correctly implement a relationship which involves 3 different nodes using Spring Data. Suppose, for example, that I have 3 @NodeEntity classes: User, Tag and TaggableObject.
As you can guess, a User can add a Tag to a TaggableObject; I model this operation with a @RelationshipEntity TaggingOperation.
However, I can't find a simple way to glue the 3 entities together inside the relationship. I mean, the obvious choice is to set @StartNode User tagger and @EndNode TaggableObject taggedObject; but how can I also add the Tag to the relationship?
This is called a "hyperedge", I believe, and it's not something that Neo4j supports directly. You can create an additional node to support it, though. So you could have a TagEvent node with a schema like so:
(:User)-[:PERFORMED]->(:TagEvent)
(:Tag)<-[:USED]-(:TagEvent)
(:TagObject)<-[:TAGGED]-(:TagEvent)
Another alternative is to store a foreign key as a property on a relationship or a node. Obviously that's not very graphy, but if you just need it for reference that might not be a bad solution. Just remember not to use the internal Neo4j ID, as it may not be dependable in future versions. You should create your own ID for this purpose.
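To make the intermediate-node idea concrete, here is a minimal plain-Java sketch of the shape of the data. It is not Spring Data Neo4j code: the classes are stand-ins without annotations, and the field names (name, label, title, performedBy, used, tagged) are made up for illustration. In SDN each class would be a node entity and each TagEvent field would map to one of the three relationships above.

```java
// Plain Java stand-ins for the three original node types (hypothetical fields).
final class User {
    final String name;
    User(String name) { this.name = name; }
}

final class Tag {
    final String label;
    Tag(String label) { this.label = label; }
}

final class TaggableObject {
    final String title;
    TaggableObject(String title) { this.title = title; }
}

// The extra node that replaces the hyperedge: one instance per tagging operation.
final class TagEvent {
    final User performedBy;       // (:User)-[:PERFORMED]->(:TagEvent)
    final Tag used;               // (:Tag)<-[:USED]-(:TagEvent)
    final TaggableObject tagged;  // (:TagObject)<-[:TAGGED]-(:TagEvent)

    TagEvent(User performedBy, Tag used, TaggableObject tagged) {
        this.performedBy = performedBy;
        this.used = used;
        this.tagged = tagged;
    }
}
```

Each of the three links is then an ordinary two-node relationship, which is something Neo4j does support directly.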
I know how this would be accomplished in SQL, but I'm having difficulty wrapping my brain around how to do it in Cypher.
Basically, I'm working on a master data setup where a user has a master_id (node), and I need to use an existing relationship property to determine the master_id in order to create a new relationship between the master_id node and a location node.
Currently I have master users created as nodes that contain a master_id property. A relationship is created between the master user and a brand, and the relationship has a brand_user_id property.
I now have another file to import that contains data at the brand_user level, but I need to create the relationship between the master_id and a location node. Because the file does not contain the master_id property, I am attempting to use the new file to look up master_ids based on the existing relationship with the brand, then use that master_id to create the new relationship with the location.
Have this relationship:
(m:Master{master_id:12345})-[:IS_BRAND_USER{brand_user_id:9876}]->(b:Brand{name:"Acme"})
Have this file:
brand_user_id,location_id
9876,6
Need this relationship:
(m:Master{master_id:12345})-[:HAS_LOCATION]->(l:Location{id:6})
My approach:
LOAD CSV WITH HEADERS FROM "file:///brand_user_ids.csv" as buid
MATCH (m:Master)-[r:IS_BRAND_USER{brand_user_id:buid.id}]->(b:Brand)
WITH m, buid.location_id AS location_id
MATCH (l:Location {id: location_id})
CREATE (m)-[:HAS_LOCATION {source: 'abcdef'}]->(l)
It seems to run for an extremely long time, and I'm not seeing any real progress after an hour, so I'm wondering whether this is fundamentally the correct approach, or whether I have inadvertently created some horrific cross-join equivalent.
The problem is that you are trying to enter a graph on a relationship. And that always requires lots of "scanning the graph".
Now, I'm not a specialist in your domain, but you might be missing a node type here: BrandUser. There could be several reasons for that:
Based on your data, a Master can have many BrandUser ids, potentially even more than one per Brand. Do you have other properties that make sense at the BrandUser level?
That Location data is strange. Wouldn't you agree it's actually the BrandUser that has a location, and that a Master may have many locations?
The most important reason, however: if you're going to enter the graph on the brand_user_id all the time (and judging from the location example that may be the case), you've got the reason to turn it into a node right there.
So ... it's a modeling issue really.
Hope this helps.
Regards,
Tom
I need to write a batch-import utility for my Neo4j database, but I don't want to lose the repository feature of SDN. To achieve this, I want to insert nodes in such a way that they can still be queried using the auto-generated repository methods.
I inserted some nodes into my database and looked at their properties and labels to see how they are set, and I noticed that SDN-inserted nodes have two labels. For example, nodes representing class SomeClass have the labels ["_SomeClass", "SomeClass"]. My question is: why set two almost identical labels for each node?
Oh that's actually simple. We somehow have to note if the current node is of type SomeClass, which we do by prepending the "_". As there are labels added for each super-type you need to differentiate what the actual type of the node in Spring Data Neo4j is.
So you could have: _Developer, Developer, Employee, Person for a class hierarchy from Person down to Developer. And then there could be additional labels for interfaces.
When you now call DeveloperRepository.findAll(), you only want those with _Developer back, not ones of types derived from Developer.
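For a batch importer, the labelling scheme described above could be reproduced roughly like this. This is only a sketch of the scheme as explained in this answer, not SDN's actual implementation: LabelSketch and labelsFor are made-up names, the Person/Employee/Developer classes are the hypothetical hierarchy from the example, and interfaces are ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain hierarchy matching the example above.
class Person {}
class Employee extends Person {}
class Developer extends Employee {}

final class LabelSketch {
    private LabelSketch() {}

    // Returns "_<ConcreteType>" first, then the simple name of the type itself
    // and of each superclass up to (but excluding) Object.
    static List<String> labelsFor(Class<?> type) {
        List<String> labels = new ArrayList<>();
        labels.add("_" + type.getSimpleName());
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            labels.add(c.getSimpleName());
        }
        return labels;
    }
}
```

Calling LabelSketch.labelsFor(Developer.class) gives the four labels listed above. Before relying on this, it is worth comparing the result with the labels on a node that SDN itself created.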
I am using an embedded graph database as part of a java application. Suppose that I carry out some type of cypher query, and return an ExecutionResult which contains a collection of nodes.
These nodes may be assumed to form a connected graph.
Each of these nodes has some relationships, which I can access using node.getRelationships(Direction.OUTGOING). My question is: if the target of one of these relationships already occurs in the ExecutionResult (i.e. the relationship is part of the query template), is it guaranteed that relationship.getEndNode() == node X?
I suppose that what I am really asking is: when a transaction in Neo4j returns a node, does it return just the one object, with different queries returning references to that same object, or does it keep producing new objects which happen to refer to the same data point? Since Node doesn't appear to override the equals method, I have been assuming the former, but I was hoping someone could tell me.
Nodes are not reference-equals. You'll only get NodeProxy objects which are created on the fly in different operations.
But the equals()-method does id-equality so you should use that.
n1.equals(n2)
or if you keep the node id around use
n1.getId() == n2.getId()
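The difference between reference equality and id equality can be shown with a small stand-in class. NodeHandle below is hypothetical and only mimics the behaviour described above (a fresh object per lookup, equals() based on the id); it is not Neo4j's NodeProxy.

```java
// Hypothetical stand-in for a node proxy: a thin handle around a node id.
final class NodeHandle {
    private final long id;

    NodeHandle(long id) { this.id = id; }

    long getId() { return id; }

    // Two handles are equal when they point at the same node id.
    @Override
    public boolean equals(Object other) {
        return other instanceof NodeHandle && ((NodeHandle) other).id == id;
    }

    // Must be consistent with equals() so handles work in HashSet/HashMap.
    @Override
    public int hashCode() { return Long.hashCode(id); }
}
```

With two handles created separately for the same id, a == b is false while a.equals(b) is true, which is why == should not be used to compare nodes coming from different operations.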
When you create a node, Neo4j internally assigns it a node id. Every relationship you create holds a reference to its start node id and end node id.
To check this:
First create a node and save its node id by calling node.getId().
Now create a relationship to it from another node, and call relationship.getEndNode().getId().
You will see that the node ids are the same.
It sounds like you're asking: does Neo4j give concurrency control of database entities "out of the box", like NHibernate or Entity Framework do for SQL?
The answer is no! You will have to manage it yourself. If you do develop it, though, it could make you a few bob.
I'm quite new to Neo4j. I'm developing a web app (using express.js and async) that does a POST request, which in turn creates a triangle of nodes and relationships. So there are 6 queries, and I want to use the auto-increment ID (or rowID) of the created nodes (using id(a)) to create the relationships.
As I saw in another post (Node identifiers in neo4j), rowID should not be relied on for reuse. But I have no other way of identifying my nodes (unless I create an index on all the properties, which is a pain).
Hence my question: can I use rowID for this use case? If not, what kind of use case is rowID better suited to?
If you only need an id to create the triangles, then you can use id(n), but probably you can just create the triangle with a single cypher statement.
Perhaps you can share more of your code/domain?
Usually you should have a business-key / -id that you can use.
Create your own id and store it on the node as a property, if you have no unique id that you can use.
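One common way to create such an id, if nothing in your domain is naturally unique, is a random UUID generated by the application. This is a minimal sketch; BusinessKeys and newKey are made-up names, and the property you store the value under (for example uid) is your choice.

```java
import java.util.UUID;

final class BusinessKeys {
    private BusinessKeys() {}

    // Generates a random (version 4) UUID in its 36-character string form.
    static String newKey() {
        return UUID.randomUUID().toString();
    }
}
```

You then only need an index on that single property to look nodes up again, instead of indexing every property.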
I'm using Spring Data Neo4j 2.1.0.BUILD-SNAPSHOT and Neo4j 1.6.1 server.
I have a Friendship relationship between two User nodes, and I want to ensure that only one relationship will be created for every user1, user2 pair (the order doesn't count).
Common suggestion is to check at application level if a relationship already exists before creating another one, but I think that doesn't avoid concurrency problems: the constraint should be managed at the database level.
The best solution I can think of is to use the @Indexed annotation with the unique property introduced in Neo4j 1.6 and create a unique constraint based on the user1 and user2 ids, something like
    @Indexed(unique = true)
    private String uniqueConstraint;
    public String getUniqueConstraint() {
        // Put the larger id first so that (user1, user2) and (user2, user1) give the same key.
        if (user1.id > user2.id) {
            return user1.id + "|" + user2.id;
        }
        return user2.id + "|" + user1.id;
    }
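The same order-independent key can be written as a standalone helper, which makes the idea easy to check in isolation. This sketch assumes the user ids are plain long values; FriendshipKey and of are illustrative names, not part of SDN or Neo4j.

```java
final class FriendshipKey {
    private FriendshipKey() {}

    // Builds "<larger id>|<smaller id>", so the argument order does not matter.
    static String of(long userIdA, long userIdB) {
        long larger = Math.max(userIdA, userIdB);
        long smaller = Math.min(userIdA, userIdB);
        return larger + "|" + smaller;
    }
}
```

For example, FriendshipKey.of(3, 7) and FriendshipKey.of(7, 3) both return "7|3", so a unique index on that value would reject the second friendship for the same pair.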
BTW, I know that the latest release of Spring Data Neo4j supports this check on nodes with Neo4jTemplate.getOrCreateNode(), but I'm not sure it works with relationships. The REST API should support it, though (see "Unique relationship").
So I have two questions:
1. Is there any better alternative?
2. Should I be bothered by this concurrency problem, or is it very unlikely that something bad happens even on a high-traffic site, so that the check at application level is sufficient? I ask because it seems to me a very common problem, but there's little around about this with Neo4j. Maybe the embedded version suffers less from this.
Thanks
The usual approach in SDN of having relationships between two nodes already ensures that there is only one relationship of one type between them (by checking upfront).
It doesn't yet leverage the uniqueness support in Neo4j for that.
And yes, with the REST server this approach might run into concurrency issues and race conditions.
The embedded version supports locking (e.g. on one of the 2 nodes, or both) and then creating the relationship with that lock in place, so that no second thread can do the same thing at the same time.
It might be OK if you do it optimistically, i.e. check after creation and delete the duplicate afterwards. You can also use the REST API directly to get that behaviour. We will probably add support for that by SDN 2.1; could you raise an issue (linking to this post) at http://spring.neo4j.org/issues ?