I'm currently working on a project where we want to store sets of graphs in a graph database. For this purpose I'm using Neo4j, and I can imagine two different solutions.
1. Store the graph's nodes as database nodes and its edges as relationships. Every node and every relationship has a property graph_id that indicates which graph it belongs to (trivial).
2. Create three node labels: Node, Edge and Graph. Every graph node is stored as a Node, every edge as an Edge node, and every node and edge belonging to a graph is connected to that graph's Graph node. (The Neo4j documentation suggests creating labels for the different kinds of elements in the database.)
Which solution would you prefer?
I think you are misreading the docs. They mean that different kinds of nodes should have different labels and different kinds of edges should have different relationship types. If you turn everything, including edges, into "nodes", you will not be able to apply the library's implementations of graph theory algorithms.
Option 1 is the way to go.
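A minimal sketch of what option 1 could look like in practice. It only builds Cypher text and parameter dictionaries, so it runs without a database. The label Vertex, the relationship type LINK, the property local_id and both helper names are illustrative assumptions, not taken from the question.

```python
def create_edge_statement(graph_id, source_id, target_id):
    """Build Cypher and parameters that store one edge of graph `graph_id`.

    Both endpoints and the relationship carry the same graph_id property.
    """
    query = (
        "MERGE (a:Vertex {graph_id: $graph_id, local_id: $source_id}) "
        "MERGE (b:Vertex {graph_id: $graph_id, local_id: $target_id}) "
        "MERGE (a)-[:LINK {graph_id: $graph_id}]->(b)"
    )
    params = {
        "graph_id": graph_id,
        "source_id": source_id,
        "target_id": target_id,
    }
    return query, params


def fetch_graph_statement(graph_id):
    """Build Cypher and parameters that read back the edges of one graph."""
    query = (
        "MATCH (a:Vertex {graph_id: $graph_id})"
        "-[:LINK {graph_id: $graph_id}]->(b:Vertex) "
        "RETURN a.local_id AS source, b.local_id AS target"
    )
    return query, {"graph_id": graph_id}
```

With the official Python driver, each (query, params) pair would typically be handed to session.run(query, params). An index on graph_id would probably be worth testing if many graphs share one database.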
Initial situation
I have several independent, disconnected graphs, each of which has a hierarchy-like structure with a local root element. Each of these graphs consists of approximately 8 million nodes and 40 million relationships. I have written a three-digit number of Cypher queries, which should now be applied to a single graph only rather than to all graphs at once. The graph the queries apply to is specified by its root node.
Challenge to be solved
How can I realize a kind of pseudo multi-client capability for a graph, if all graphs have to remain in a common Neo4j database for reasons of reporting and pattern matching?
Approach to the problem / preliminary result
Match a shortest path to the given root element at the start of every single query, in order to select the graph? Cons:
huge performance losses expected
with high development costs
Expand each graph with a separate, additional label? Cons:
complex queries, high development effort
For these cases, the usual approach is to add a tenant- or client-specific label to every node in the subgraph. Each query then has to make sure, wherever it matches nodes, that the nodes it works with also carry that client's label.
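A rough sketch of how the per-tenant label could be added to existing queries from application code. In most Neo4j versions a label cannot be passed as an ordinary query parameter, so the sketch splices it into the query text and validates it first. The helper name tenant_match and the labels Component and Tenant_42 are hypothetical.

```python
import re

# Only plain identifiers are accepted, because the label becomes part of
# the query text rather than a bound parameter.
_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def tenant_match(variable, base_label, tenant_label):
    """Return a MATCH pattern restricted to nodes carrying the tenant label."""
    for name in (variable, base_label, tenant_label):
        if not _IDENTIFIER_RE.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return f"MATCH ({variable}:{base_label}:{tenant_label})"


# Property values still go through normal parameters such as $name.
query = tenant_match("n", "Component", "Tenant_42") + " WHERE n.name = $name RETURN n"
```

Keeping the label handling in one helper means the existing queries only change at the point where they first match nodes.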
As a note for the future, native multi-tenancy support is one of the key features we're working on for the next year.
I have to store biological interactions in a Neo4j database. For example, consider a scenario where I have two types of nodes, Protein and Experiment, and a relationship type INTERACTS_WITH. The relationship exists as (:Protein)-[:INTERACTS_WITH]-(:Protein). Now, INTERACTS_WITH also relates to Experiment, because this biological interaction was observed in that experiment.
I need to relate the INTERACTS_WITH relationship to the Experiments.
One way to achieve this can be to store the ID of all such Experiments in an array type property of the INTERACTS_WITH relationship. But that will be just like storing the Primary Key of an entity as Foreign Key of another entity in the relational database, which I want to avoid.
Another way can be to create an Interaction node for each pair of interacting genes and then relate it to the two Proteins and the Experiments. But an interaction is possible between two Protein nodes only, so I will have to programmatically put a constraint on the number of Protein nodes that relate to an Interaction node. This approach is also not good because INTERACTS_WITH is actually a relationship and perhaps it will be not a good idea to model it as a node.
Is there a better, graphical way to do this? If not, which of the above two approaches will be better?
> Another way can be to create an Interaction node for each pair of
> interacting genes and then relate it to the two Proteins and the
> Experiments.
I believe that it is a very good approach to solve your problem.
> But an interaction is possible between two Protein nodes only, so I
> will have to programmatically put a constraint on the number of
> Protein nodes that relate to an Interaction node.
There is no way around that, and programmers do it all the time! For example: what guarantees do you have about how many INTERACTS_WITH relationships exist between a pair of Protein nodes? You probably take care of that at creation time.
> This approach is also not good because INTERACTS_WITH is actually a
> relationship and perhaps it will be not a good idea to model it as a
> node.
Think about it: if your INTERACTS_WITH relationship needs to be related to more than two nodes, maybe you are modeling a node as a relationship, right?
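A minimal sketch of the Interaction-node approach, with the "exactly two proteins" rule enforced in application code at creation time. It only builds Cypher text and parameters and does not touch a database. The relationship types PARTICIPATES_IN and OBSERVED_IN, the id and key properties, and the helper name are all assumptions made for illustration.

```python
def create_interaction_statement(protein_ids, experiment_ids):
    """Build Cypher and parameters for one Interaction node.

    The node is linked to exactly two proteins and to every experiment
    in which the interaction was observed.
    """
    if len(protein_ids) != 2:
        raise ValueError("an interaction needs exactly two proteins")
    first, second = sorted(protein_ids)  # order-independent key
    query = (
        "MATCH (p1:Protein {id: $p1}), (p2:Protein {id: $p2}) "
        "MERGE (i:Interaction {key: $key}) "
        "MERGE (p1)-[:PARTICIPATES_IN]->(i) "
        "MERGE (p2)-[:PARTICIPATES_IN]->(i) "
        "WITH i "
        "UNWIND $experiments AS exp_id "
        "MATCH (e:Experiment {id: exp_id}) "
        "MERGE (i)-[:OBSERVED_IN]->(e)"
    )
    params = {
        "p1": first,
        "p2": second,
        "key": f"{first}|{second}",
        "experiments": list(experiment_ids),
    }
    return query, params
```

Because the key is derived from the sorted protein ids, calling the helper with the same pair in either order targets the same Interaction node.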
Tip: take a look at the section "Graph modeling – best practices and pitfalls" of the book Learning Neo4j (by Rik Van Bruggen) and at the section "Common Modeling Pitfalls" of the book Graph Databases (by Ian Robinson, Jim Webber & Emil Eifrem). This can be enlightening. Both books can be downloaded from the Neo4j site.
I'm working on a visualization of organizational structure in Gephi. I have a graph of individuals, connected by whether or not they have worked together in the past. Graphing individuals looks good, but I would like to combine nodes (individuals) based on a categorical attribute (department; string). The new graph -- or at least a visualization -- would have a node for every department, preferably with a numerical weight proportional to how many individuals comprise it.
I could do this in the scripts that generate the graph files before importing. But I did exactly this about a year ago entirely in Gephi. Either the functionality was removed (like the pie charts!) or I've just forgotten (more likely).
I am using Gephi 0.9.1. Any help is much appreciated.
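For the pre-import route mentioned above, here is a small sketch of collapsing individuals into department nodes in a generating script. It is plain Python and does not use any Gephi API. The function name and the input shapes (a person-to-department mapping and a list of person pairs) are assumptions, and writing the result out as a graph file is left aside.

```python
from collections import Counter


def collapse_by_attribute(node_group, edges):
    """Collapse nodes into one node per group.

    node_group: mapping of node -> group (e.g. person -> department)
    edges:      iterable of undirected (node, node) pairs
    Returns (group sizes, weights of edges between distinct groups).
    """
    sizes = Counter(node_group.values())
    weights = Counter()
    for u, v in edges:
        gu, gv = node_group[u], node_group[v]
        if gu != gv:  # links inside one department are dropped
            weights[tuple(sorted((gu, gv)))] += 1
    return dict(sizes), dict(weights)


people = {"ann": "Sales", "bob": "Sales", "cy": "IT", "di": "HR"}
worked_together = [("ann", "bob"), ("ann", "cy"), ("bob", "cy"), ("cy", "di")]
sizes, weights = collapse_by_attribute(people, worked_together)
```

The sizes can serve as node weights and the counts as edge weights in the collapsed graph.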
I have a directed graph in which some pairs of nodes are connected by multiple edges.
However, I would like each such pair to be visualised with only one edge, with a property specifying the number of edges between the two nodes and possibly a relative edge thickness.
What query do I have to use to achieve this?
If you are referring to the Neo4j Browser web UI, there is no command to customize the visualization in that way.
In particular, the Cypher query language is only for performing DB operations, and does not have a way to directly affect visualizations.
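If the drawing is done by your own tooling rather than the Browser, the merging can happen client-side on the query result. Below is a rough sketch in plain Python, assuming the edges arrive as (source, target) tuples. The function name and the max_width scaling are illustrative choices. The counting itself could also be done in Cypher with an aggregation such as `MATCH (a)-[r]->(b) RETURN a, b, count(r)`, but rendering the result is still up to the client.

```python
from collections import Counter


def merge_parallel_edges(edges, max_width=10.0):
    """Collapse parallel directed edges into single weighted edges.

    Each merged edge records how many original edges it stands for and
    a width scaled relative to the most frequent pair.
    """
    counts = Counter(edges)
    largest = max(counts.values(), default=1)
    return [
        {"source": s, "target": t, "count": n, "width": max_width * n / largest}
        for (s, t), n in sorted(counts.items())
    ]


merged = merge_parallel_edges(
    [("a", "b"), ("a", "b"), ("b", "a"), ("a", "c"), ("a", "b")]
)
```

Direction is preserved, so a-to-b and b-to-a stay separate edges.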
I'm new to Neo4j and would like to know if it's possible to directly link a node from one graph to one or more nodes in another graph.
I have one core graph with thousands of other graphs. Each core node may link to other graphs, and nodes on that graph may link to other graphs or nodes on other graphs, including nodes on the core graph.
I know I can put all the nodes into one graph, but I would prefer to do it as described above.
Thanks!
Rein
You have only one graph in a single Neo4j instance. You can store your "core graph" and all the other graphs together in it as one large network made up of disconnected parts.