Is it possible to show only one direction relationship from a bidirectional relationship?
(n)-[:EMAIL_LINK]->(m)
(n)<-[:EMAIL_LINK]-(m)
If the relationship type in question does not have directional semantics, it's best practice to have them only one time in the graph and omit the direction while querying, i.e. (a)-[:EMAIL_LINK]-(b) instead of (a)-[:EMAIL_LINK]->(b).
To get rid of duplicated relationships in different directions, use:
MATCH (a)-[r1:EMAIL_LINK]->(b)<-[r2:EMAIL_LINK]-(a)
WHERE ID(a)<ID(b)
DELETE r2
if your graph is large you need to take care of having reasonable transaction sizes by adding a LIMIT and running the query multiple times until all have been processed.
NB: the WHERE ID(a)<ID(b) is necessary. Otherwise a and b might change roles during in a later iteration. Consequently r1 and r2 would change roles as well and both get deleted.
Related
I'm dealing with the following situation: many trips exist between many cities. Both have various properties. E.g the cities have a name and an amount of trips that passed them, whereas trips have a distance and time.
What is 'best practise' in Neo4j?
a) Add all cities and trips as nodes, and connect the trips to the start and end nodes by means of 'STARTED_AT' and 'ENDS_IN' relations.
or
b) Add only cities as a node, and represent each of the trips as a relation between 2 nodes. This means there are many of the same relations between nodes, where the only difference is that they have other properties.
Information that might be useful: we only need to do all kinds of queries. No insertion needed.
Thanks!
I would argue it's better to store trips as nodes because relationship properties cannot be indexed, and it will be slow to do more complex queries (like find shortest route by time) So if you are searching for trips by ID or something, you will need to store them as nodes.
On the other hand, an argument can be made for using relationships, because then you can take full advantage of APOC's weighted graph search functions.
A good way to decide if something should be a node or relation, is to ask yourself "are there any other relations that would make sense here?" If you are talking about if two cities are connected, a relationship makes more since because they either are or are not. If you are talking about road trips though, the trip can pass through multiple cities, can have participants in the trip (or groups there of) and can have an owner. In that case, for future flexibility, nodes will be much easier to maintain.
I would say it really depends on how you model these trips, lets assume we can generalize this as (city)-[trip]->(city). Notice that neo4j's relations always has a direction so we can go on adding an unlimited number of trips between cities without having to redefine each city for each trip -- this actually answers (a) by the way, we don't need to define where it starts and ends the relation does all that work for you.
'This means there are many of the same relations between nodes' <<- on this note, if you need to differ each trip based on the time the trip was taken you can add the date/timestamp in the relationship property or you can go with a time tree (see Mark Needham's Article on that here and Graphgrid's take)
Hope this helps.
Is it possible to "collapse" relationships in neo4j? I'm trying to graph relationships between people, and they can be related in multiple different ways - a shared course, jointly authored paper, RT or tweet mention. Right now I'm modeling people, courses, papers, and tweets all as nodes. But what I'm really interested in is modeling the person-person relationships that go through these intermediary nodes. Is it possible to graph the implicit relationship (person-course-person) explicit (person-person), while still keeping the course as a node? Something like this http://catalhoyuk.stanford.edu/network/teams/ - slide 2 and 3.
Any other data modeling suggestions welcome as well.
Yes, you can do it. The query
MATCH(a:Person)-->(:Course)<--(b:Person)
CREATE (a)-[:IMPLICIT_RELATIONSHIP]->(b)
will crate a relationship with type :IMPLICIT_RELATIONSHIP between all people that are related to the same course. But probably you don't need it since you can transverse from a to b and from b to a without this extra and not necessary relationship. Also if you want a virtual relationship at query time to use in a projection you can use the APOC procedure apoc.create.vRelationship.
The APOC procedures docs says:
Virtual Nodes and Relationships don’t exist in the graph, they are
only returned to the UI/user for representing a graph projection. They
can be visualized or processed otherwise. Please note that they have
negative id’s.
I know that Neo4j requires a relationship direction at creation time, but allows ignore this direction in query time. By this way I can query my graph ignoring the relationship direction.
I also know that there are some workarounds for cases when the relationships are naturally bidirectional or not directed, like described here.
My question is: Why is it implemented that way? Has a good reason to not allow not directed or bidirectional relationships at creation time? Is it a limitation of the database architecture?
The Cypher statements like below are not allowed:
CREATE ()-[:KNOWS]-()
CREATE ()<-[:KNOWS]->()
I searched the web for an answer, but I did not find much. For example, this github issue.
Is strange to have to define a relationship direction to one that don't have it. It seems to me that i'm hurting the semantic of my graph.
EDIT 1:
To clarify my standpoint about a "semantic problem" (maybe the term is wrong):
Suppose that I run this simple CREATE statement:
CREATE (a:Person {name:'a'})-[:KNOWS]->(b:Person {name:'b'})
As result i have this very simple graph:
The :KNOWS relationship has a direction only because Neo4j requires a relationship direction at creation time. In my domain a knows b and b knows a.
Now, a new team member will query my graph with this Cypher query:
MATCH path = (a:Person {name:'a'})-[:KNOWS]-(b:Person {name:'b'})
return path
This new team member don't know that when I created this graph I considered that :KNOWS relationship is not directed. The result that he will see is the same:
By the result this new team member can think that only Person a consider knows Person b. It seems to me bad. Not for you? This make any sense?
Fundamentally, it boils down to the internals of how the data is stored on disk in Neo4j -- note Chapter 6 of the O'Reilly Neo4j e-book.
In the data structure of a relationship they have a "firstNode" and a "secondNode", where each is either the left or the right hand side of the relationship.
To flag a relationship as uni/bi-directional would require an additional bit per node, where I would argue it is better to retain the direction in the data store and just ignore direction during querying.
In Neo4j relationships are always directed.
But if you don't care about the direction, you can ignore the direction when querying.
MATCH (p1:Person {name:"me"})-[:KNOWS]-(p2)
RETURN p2;
And with MERGE you can also leave off the direction when creating.
MATCH (p1:Person {name:"me"})
MATCH (p2:Person {name:"you"})
MERGE (p1)-[:KNOWS]-(p2);
You only need 2 relationships if they really convey a different meaning, e.g. :FOLLOWS on Twitter.
It seems to me that i'm hurting the semantic of my graph.
I can't see why a < or > symbol used during creation of a relationship hurts the semantics of your graph if you are going to not use that symbol during matching (and thus treating that relationship as undirected/bidirectional).
Suppose that the syntax proposed by you is supported. Now how will you connect with an undirected relationship two nodes a and b? You still have two options:
CREATE (a)-[:KNOWS]-(b)
CREATE (b)-[:KNOWS]-(a)
The pair (a, b) is always ordered by appearance even if not by semantics. So even if we remove the < or > symbol from the relationship declaration, the problem with the order of nodes in it cannot be eliminated. Therefore simply don't treat it is a problem.
For recursive breakdown structures, is it better to model as ...
a. Group HAS Subgroup... or
b. Subgroup PART_OF Group ?? ....
Some neo4j tutorials imply model both (the parent_of and child_of example) while the neo4j subtype tutorials imply that either will work fine (generally going with PART-OF).
Based on experience with neo4j, is there a practical reason for choosing one or the other or use both?
[UPDATED]
Representing the same logical relationship with a pair of relationships (having different types) in opposite directions is a very bad idea and a waste of time and resources. Neo4j can traverse a single relationship just as easily from either of its nodes.
With respect to which direction to pick (since we do not want both), see this answer to a related question.
I can't seem to find any discussion on this. I had been imagining a database that was schemaless and node based and heirarchical, and one day I decided it was too common sense to not exist, so I started searching around and neo4j is about 95% of what I imagined.
What I didn't imagine was the concept of relationships. I don't understand why they are necessary. They seem to add a ton of complexity to all topics centered around graph databases, but I don't quite understand what the benefit is. Relationships seem to be almost exactly like nodes, except more limited.
To explain what I'm thinking, I was imagining starting a company, so I create myself as my first nodes:
create (u:User { u.name:"mindreader"});
create (c:Company { c.name:"mindreader Corp"});
One day I get a customer, so I put his company into my db.
create (c:Company { c.name:"Customer Company"});
create (u:User { u.name:"Customer Employee1" });
create (u:User { u.name:"Customer Employee2"});
I decide to link users to their customers
match (u:User) where u.name =~ "Customer.*"
match (c:Company) where c.name =~ "Customer.*
create (u)-[:Employee]->(c);
match (u:User where name = "mindreader"
match (c:Company) where name =~ "mindreader.*"
create (u)-[:Employee]->(c);
Then I hire some people:
match (c:Company) where c.name =~ "mindreader.*"
create (u:User { name:"Employee1"})-[:Employee]->(c)
create (u:User { name:"Employee2"})-[:Employee]->(c);
One day hr says they need to know when I hired employees. Okay:
match (c:Company)<-[r:Employee]-(u:User)
where name =~ "mindreader.*" and u.name =~ "Employee.*"
set r.hiredate = '2013-01-01';
Then hr comes back and says hey, we need to know which person in the company recruited a new employee so that they can get a cash reward for it.
Well now what I need is for a relationship to point to a user but that isn't allowed (:Hired_By relationship between :Employee relationship and a User). We could have an extra relationship :Hired_By, but if the :Employee relationship is ever deleted, the hired_by will remain unless someone remembers to delete it.
What I could have done in neo4j was just have a
(u:User)-[:hiring_info]->(hire_info:HiringInfo)-[:hired_by]->(u:User)
In which case the relationships only confer minimal information, the name.
What I originally envisioned was that there would be nodes, and then each property of a node could be a datatype or it could be a pointer to another node. In my case, a user record would end up looking like:
User {
name: "Employee1"
hiring_info: {
hire_date: "2013-01-01"
hired_by: u:User # -> would point to a user
}
}
Essentially it is still a graph. Nodes point to each other. The name of the relationship is just a field in the origin node. To query it you would just go
match (u:User) where ... return u.name, u.hiring_info.hiring_date, u.hiring_info.hired_by.name
If you needed a one to many relationship of the same type, you would just have a collection of pointers to nodes. If you referenced a collection in return, you'd get essentially a join. If you delete hiring_info, it would delete the pointer. References to other nodes would not have to be a disorganized list at the toplevel of a node. Furthermore when I query each user I will know all of the info about a user without both querying for the user itself and also all of its relationships. I would know his name and the fact that he hired someone in the same query. From the database backend, I'm not sure much would change.
I see quite a few questions from people asking whether they should use nodes or relationships to model this or that, and occasionally people asking for a relationship between relationships. It feels like the XML problem where you are wondering if a pieces of information should be its own tag or just a property its parent tag.
The query engine goes to great pains to handle relationships, so there must be some huge advantage to having them, but I can't quite see it.
Different databases are for different things. You seem to be looking for a noSQL database.
This is an extremely wide topic area that you've reached into, so I'll give you the short of it. There's a spectrum of database schemas, each of which have different use cases.
NoSQL aka Non-relational Databases:
Every object is a single document. You can have references to other documents, but any additional traversal means you're making another query. Times when you don't have relationships between your data very often, and are usually just going to want to query once and have a large amount of flexibly-stored data as the document that is returnedNote: These are not "nodes". Node have a very specific definition and implies that there are edges.)
SQL aka Relational Databases:
This is table land, this is where foreign keys and one-to-many relationships come into play. Here you have strict schemas and very fast queries. This is honestly what you should use for your user example. Small amounts of data where the relationships between things are shallow (You don't have to follow a relationship more than 1-2 times to get to the relevant entry) are where these excel.
Graph Database:
Use this when relationships are key to what you're trying to do. The most common example of a graph is something like a social graph where you're connecting different users together and need to follow relationships for many steps. (Figure out if two people are connected within a depth for 4 for instance)
Relationships exist in graph databases because that is the entire concept of a graph database. It doesn't really fit your application, but to be fair you could just keep more in the node part of your database. In general the whole idea of a database is something that lets you query a LOT of data very quickly. Depending on the intrinsic structure of your data there are different ways that that makes sense. Hence the different kinds of databases.
In strongly connected graphs, Neo4j is 1000x faster on 1000x the data than a SQL database. NoSQL would probably never be able to perform in a strongly connected graph scenario.
Take a look at what we're building right now: http://vimeo.com/81206025
Update: In reaction to mindreader's comment, we added the related properties to the picture:
RDBM systems are tabular and put more information in the tables than the relationships. Graph databases put more information in relationships. In the end, you can accomplish much the same goals.
However, putting more information in relationships can make queries smaller and faster.
Here's an example:
Graph databases are also good at storing human-readable knowledge representations, being edge (relationship) centric. RDF takes it one step further were all information is stored as edges rather than nodes. This is ideal for working with predicate logic, propositional calculus, and triples.
Maybe the right answer is an object database.
Objectivity/DB, which now supports a full suite of graph database capabilities, allows you to design complex schema with one-to-one, one-to-many, many-to-one, and many-to-many reference attributes. It has the semantics to view objects as graph nodes and edges. An edge can be just the reference attribute from one node to another or an edge can exist as an edge object that sits between two nodes.
An edge object can have any number of attribute and can have references off to other objects, as shown in the diagram below.
Being able to "hang" complex objects off of an edge allows Objectivity/DB to support weighted queries where the edge-weight can be calculated using a user-defined weight calculator operator. The weight calculator operator can build the weight from a static attribute on the edge or build the weight by digging down through the objects connected to the edge. In the picture, above, we could create a edge-weight calculator that computes the sum of the CallDetail lengths connected to the Call edge.