Is there a query for a Neo4J graph that could traverse said graph and find nodes based on mutual relationships? For example, if Node A is related to Node B (bidirectionally), B is related to C, C is related to D, D is related to A, A is related to C, and B is related to D, such that there is a subgraph in which every node is connected to every other node, is there an efficient way to return that subgraph or group of Nodes?
I realize my explanation is poor, so I provide an example graph in the console: http://console.neo4j.org/r/qb2xmp
Here, I have created a graph, and I would like to return groups that are mutually related of 3 or more - so, in this case, I would ideally like to be returning the group of Scott, Josh, Frank, and Ben, as well as the group of Frank, Ben, and Eric. If possible, I would like to be able to identify who composes those individual groups.
This is an instance of the Clique Problem and is NP-Complete.
Here is a related question on SO with a good explanation!
Sorry I hit enter too soon. So there is no "efficient" way to do this. It is not unattemptable though in certain cases, and you will have the best luck looking for an algorithm that solves this general problem and implementing it in Neo4J.
did you find any solutions for this?
I implemented something like this on my project for text network visualization but it launches Gephi Toolkit (on Java) to perform some metric calculations on the graph, detect communities, etc. But that's too heavy...
You might be interested to look into Gephi's algorithms though, especially the Force Atlas layout implemented in Sigma.Js and especially the modularity algorithm used in Gephi itself. This might give you some clues as to how to proceed...
Related
I'm working on a scientific database that contains model statements such as:
"A possible cause of Fibromyalgia is Microglial hyperactivity, as supported by these 10 studies: [...] and contradicted by 1 study [...]."
I need to specify a source for statements in Neo4j and be able to do 2 ways operations, like:
Find all statements supported by a study
Find all studies supporting a statement
The most immediate idea I had is to use the DOI of studies as unique identifiers in the relationship property. The big con of this idea is that I have to scan all the relationships to find the list of all statements supported by a study.
So, since it is impossible to make a link between a study and a relationship, I had the idea to make 2 links, at each extremity of the relationship. The obvious con is that it does not give information about the relationship, like "support" or "contradict".
So, I came to the conclusion that I need a node for the hypothesis:
However, it overloads the graph and we are not anymore in the classical node -relationship-> node design that makes property graphs so easy to understand.
Using RDF, it is possible to add properties to relationships using subgraphs, however there we enter semantic graphs and quad stores, which is a more complex tool.
So I'm wondering if there is a "correct" design pattern for Neo4j to support this type of need that I may not have imagined instead?
Thanks
Based on your requirements, I think put support_study as property of edge will do the work:
Thus we could query the following as:
Find all statements supported by a study
MATCH ()-[e:has_cause{support_study: "doi_foo_bar"}]->()
RETURN e;
Find all studies supporting a statement
Given statement is “foo” is caused by “bar”
MATCH (v:disease{name: "foo"})-[e:has_cause]->(v1:sympton{name: "bar")
RETURN DISTINCT e.support_study;
While, this is mostly based on NebulaGraph, where:
It speaks cypher DQL(together with nGQL)
It supports properties in edge
It used 4-tuple(rather than a Key) to distingush an edge(src,dst,edge_type,rank), where rank is an unique design to enable multiple has_cause edge instance between one pair of disease-> sympton, you could put the hash of doi or other number as rank field(or omit, of cause, it will be 0)
It’s distributed and Open-Source(Apache 2.0)
Note:
In NebulaGraph, index should be created on has_cause(support_study) and disease(name), ref: https://www.siwei.io/en/nebula-index-explained/ and https://docs.nebula-graph.io/3.2.0/3.ngql-guide/14.native-index-statements/
But, I think it applies to neo4j, too :)
Is it possible to "collapse" relationships in neo4j? I'm trying to graph relationships between people, and they can be related in multiple different ways - a shared course, jointly authored paper, RT or tweet mention. Right now I'm modeling people, courses, papers, and tweets all as nodes. But what I'm really interested in is modeling the person-person relationships that go through these intermediary nodes. Is it possible to graph the implicit relationship (person-course-person) explicit (person-person), while still keeping the course as a node? Something like this http://catalhoyuk.stanford.edu/network/teams/ - slide 2 and 3.
Any other data modeling suggestions welcome as well.
Yes, you can do it. The query
MATCH(a:Person)-->(:Course)<--(b:Person)
CREATE (a)-[:IMPLICIT_RELATIONSHIP]->(b)
will crate a relationship with type :IMPLICIT_RELATIONSHIP between all people that are related to the same course. But probably you don't need it since you can transverse from a to b and from b to a without this extra and not necessary relationship. Also if you want a virtual relationship at query time to use in a projection you can use the APOC procedure apoc.create.vRelationship.
The APOC procedures docs says:
Virtual Nodes and Relationships don’t exist in the graph, they are
only returned to the UI/user for representing a graph projection. They
can be visualized or processed otherwise. Please note that they have
negative id’s.
For recursive breakdown structures, is it better to model as ...
a. Group HAS Subgroup... or
b. Subgroup PART_OF Group ?? ....
Some neo4j tutorials imply model both (the parent_of and child_of example) while the neo4j subtype tutorials imply that either will work fine (generally going with PART-OF).
Based on experience with neo4j, is there a practical reason for choosing one or the other or use both?
[UPDATED]
Representing the same logical relationship with a pair of relationships (having different types) in opposite directions is a very bad idea and a waste of time and resources. Neo4j can traverse a single relationship just as easily from either of its nodes.
With respect to which direction to pick (since we do not want both), see this answer to a related question.
I've got my graph database, populated with nodes, relationships, properties etc. I'd like to see an overview of how the whole database is connected, each relationship to each node, properties of a node etc.
I don't mean view each individual node, but rather something like an ERD from a relational database, something like this, with the node labels. Is this possible?
You can use the metadata by running the command call db.schema().
In Neo4j v4 call db.schema() is deprecated, you can now use call db.schema.visualization()
As far as I know, there is no straight-forward way to get a nicely pictured diagram of a neo4j database structure.
There is a pre-defined query in the neo4j browser which finds all node types and their relationships. However, it traverses the complete graph and may fail due to memory errors if you have to much data.
Also, there is neoprofiler. It's a tool which claims to so what you ask. I never tried and it didn't get too many updates lately. Still worth a try: https://github.com/moxious/neoprofiler
Even though this is not a graphical representation, this query will give you an idea on what type of nodes are connected to other nodes with what type of relationship.
MATCH (n)
OPTIONAL MATCH (n)-[r]->(x)
WITH DISTINCT {l1: labels(n), r: type(r), l2: labels(x)}
AS `first degree connection`
RETURN `first degree connection`;
You could use this query to then unwind the labels to write that next cypher query dynamically (via a scripting language and using the REST API) and then paste that query back into the neo4j browser to get an example set of the data.
But this should be good enough to get an overview of your graph. Expand from here.
I'm trying to query Book nodes for recommendation by Cypher.
I want to recommend A:Book and C:Book for A:User.
i'm sorry I need some graph to explain this question, but I could't up graph image because my lepletion lacks for upload function.
I wrote query below.
match (u1:User{uid:'1003'})-->(o1:Order)-->(b1:Book)<--(o2:Order)
<--(u2:User)-->(o3:Order)-->(b2:Book)
return b2
This query return all Books(A,B,C,D) dispite cypher's Uniqueness.
I expect to only return A:Book and C:Book.
Is this behavior Neo4j' specification?
How do I get expected return? Thanks, everyone.
environment:
Neo4j ver.v2.0.0-RC1
Using Neo4j Server with REST API
Without the sample graph its hard to say why you get something back when you expected something else. You can share a sample graph by including a create statement that would generate said graph, or by creating it in Neo4j console and putting the link in your question. Here is an example of the latter: console.neo4j.org/r/fnnz6b
In the meantime, you probably want to declare the type of the relationships in your pattern. If a :User has more than one type of outgoing relationships you will be excluding those other paths based on the labels of the nodes on the other end, which is much less efficient than to only traverse the right relationships to begin with.
To my mind its not clear whether (u:User)-->(o:Order)-->(b:Book) means that a user has one or more orders, and each order consists of one or more books; or if it means only that a user ordered a book. If you can share a sample, hopefully that will be clear too.
Edit:
Great, so looking at the graph: You get B and D back because others who bought B also bought D, and others who bought D also bought B, which is your criterion for recommendation. You can add a filter in the WHERE clause to exclude those books that the user has already bought, something like
WHERE NOT (u1)-[:BUY]->()-[:CONTAINS]->(b2)
This will give you A, C, C back, since there are two matching paths to C. It's probably not important to get two result items for C, so you can either limit the return to give only distinct values
RETURN DISTINCT(b2)
or group the return values by counting the matching paths for each result as a 'recommendation score'
RETURN b2, COUNT(b2) as score
Also, if each order only [CONTAINS] one book, you could try modelling without order, just (:User)-[:BOUGHT]->(:Book).