need some clarification about DAG (Directed Acyclic Graph) - graph-algorithm

I know the definition of a DAG, which is a directed graph without any cycle. My question is: Can I consider 2 separate DAGs as one DAG? If not, what is the technical name for a set of DAGs?

A DAG can have disconnected parts, since the only requirements are that the graph be directed and acyclic. So two separate DAGs, taken together, still form one DAG.
If you want to specify that it is connected, you could say "connected DAG".
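As a rough illustration of the point above, the following sketch checks that the union of two separate DAGs is still acyclic, and that it is not connected once edge directions are ignored (often called "weakly connected"). The node names and helper functions are made up for this example; the acyclicity check uses Kahn's topological-sort algorithm.

```python
from collections import deque

def is_dag(nodes, edges):
    """Return True if the directed graph has no cycle (Kahn's algorithm)."""
    indegree = {n: 0 for n in nodes}
    successors = {n: [] for n in nodes}
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                queue.append(nxt)
    # Every node is removed exactly when the graph contains no cycle.
    return visited == len(nodes)

def is_weakly_connected(nodes, edges):
    """Return True if the graph is connected when edge direction is ignored."""
    nodes = list(nodes)
    if not nodes:
        return True
    neighbours = {n: set() for n in nodes}
    for src, dst in edges:
        neighbours[src].add(dst)
        neighbours[dst].add(src)
    seen = {nodes[0]}
    stack = [nodes[0]]
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)

# Two separate DAGs with no shared nodes (hypothetical example data).
dag_a_nodes, dag_a_edges = ["a1", "a2", "a3"], [("a1", "a2"), ("a2", "a3")]
dag_b_nodes, dag_b_edges = ["b1", "b2"], [("b1", "b2")]

union_nodes = dag_a_nodes + dag_b_nodes
union_edges = dag_a_edges + dag_b_edges

union_is_dag = is_dag(union_nodes, union_edges)
union_is_connected = is_weakly_connected(union_nodes, union_edges)
```

Here `union_is_dag` is true while `union_is_connected` is false, which matches the answer: the combined graph is a DAG, just not a connected one.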

Related

Neo4j persistent named graph

I'm coming from the RDF world where named graphs are persistent and can be used like a collection of triples. Moreover you can query against one single named graph or over the whole triplestore. I'm looking for the same features (or a workaround to achieve them) in Neo4j.
Neo4j's Graph Catalog is well documented. As I understand it, named graphs in Neo4j are stored entirely in memory (and so are lost after a restart) and contain a subset of nodes that you define for analytic purposes.
Is there a way to create persistent named graphs in Neo4j?
That is, a graph that is stored on disk with the data and that allows fast access to a subset of nodes (nodes can be added to or removed from the named graph).
You could give every node in the same "named graph" the same label. Since a node can have multiple labels, this does not prevent you from using other labels for other purposes as well.
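A minimal sketch of what the label approach could look like, written as Python helpers that build Cypher query strings. The helper names, the `uid` property, and the `$uids` parameter are hypothetical. The label is validated and interpolated into the query text here on the assumption that it is not passed as a query parameter; the validation pattern is deliberately conservative and is not the full set of names Neo4j accepts.

```python
import re

# Conservative whitelist for label names (an assumption for this sketch).
_LABEL_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def _checked(label):
    """Reject anything that is not a plain identifier before interpolating it."""
    if not _LABEL_PATTERN.fullmatch(label):
        raise ValueError(f"unsafe label: {label!r}")
    return label

def select_graph(graph_label):
    """Cypher that returns every node belonging to the 'named graph'."""
    return f"MATCH (n:{_checked(graph_label)}) RETURN n"

def add_to_graph(graph_label):
    """Cypher that adds nodes (matched by a hypothetical uid property) to the graph."""
    return f"MATCH (n) WHERE n.uid IN $uids SET n:{_checked(graph_label)}"

def remove_from_graph(graph_label):
    """Cypher that removes nodes from the graph without deleting the nodes."""
    label = _checked(graph_label)
    return f"MATCH (n:{label}) WHERE n.uid IN $uids REMOVE n:{label}"
```

Because labels are stored with the nodes, membership in such a "named graph" survives a restart, and querying the whole store just means omitting the label from the pattern.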

Neo4j multiple graphs on one server/cluster

Note: This is similar to the question asked here: Storing multiple graphs in Neo4J - but this question was asked in 2011 and I haven't found a direct answer for Neo4j 3.
Is there a way to configure the properties for Neo4j so that there are multiple graphs available to my application, using a single Neo4j instance?
My use case is that I would like to operate on multiple similar but discrete graphs. The graph I choose to operate on would be decided on the fly and dynamically during the execution of my application. If I try to choose a graph that doesn't exist yet, it should be created. At points in the future, I may decide to delete one of the graphs, and the deletion would happen independently of the other graphs.
A follow-up question: in the linked post from 2011, an answer mentioned using subgraphs. Is a subgraph set up through Neo4j properties, and does each subgraph have its own set of indexes and property elements?
Assuming your graphs do not share nodes, you could use a specific label for each graph, such as g1, g2, and so on, and add that label to every node in that graph.

How to use Neo4J for temporary graph calculations?

I'm completely new to Neo4J and I am struggling with a design/architecture question.
Setup
I have a given graph with different nodes. That could be a company graph with customers, products, projects, sales and so on (like in the movie example https://neo4j.com/developer/get-started/). This graph can change from time to time.
In my use case I would like to take this graph, adapt it and test some scenarios. E.g. I would add a new product, define a new sales person with responsibilities or increase the price of a product. I will then "ask questions" of the extended graph; in other words, I would use graph algorithms to extract information. The changes I make shouldn't affect the original graph.
Requirements
I do not wanna write my changes to the original graph, because the original graph should be the base for every analysis, and because different users may be changing and analysing the graph concurrently.
I still wanna use the power of Cypher to make the analysis, so having the graph only in the memory wouldn't do it.
Problem
On the one hand I do not wanna change the original graph, on the other I wanna add and change information temporarily for a specific user. Using a relational DB I would just point with an ID to the "static" part of the data or I would do the calculation in Code instead of SQL.
Questions
Any best practices for that?
Can I use Cypher directly in code (non-persistent, directly on the data in memory)?
Should I make a copy of the graph whenever I use it (not really, right?)?
Is there a concept to link user specific data to a static graph?
I am happy about all ideas, concepts and tricks! It's more about graph databases in general....Neo4J was so far my first choice.
Cheers
Chris
What about using feature flags in your graph, implemented as different relationship types?
For example, let's say you have a User that likes 10 movies in your original graph.
(user)-[:LIKES]->(movies)
Then for your experiments, you can have
(user)-[:LIKES_EXPERIMENT]->(othermovies)
This lets you traverse the graph in the original way without losing performance, just by restricting the traversal to the original relationship types. It also lets you use only the experiments, or combine original data with experiments, by specifying both relationship types in your traversals.
The same goes for properties: you could prefix experimental properties with experiment_, for example. And finally you could also play with different labels. There are tons of possibilities before having to use different graph data stores.
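To make the relationship-type idea concrete, here is a small in-memory sketch in Python (not Neo4j itself). The movie names and the `targets` helper are invented for illustration. In Cypher, the analogous step would be listing the allowed types in the relationship pattern, for example `[:LIKES|LIKES_EXPERIMENT]`.

```python
# (source, relationship type, target) triples; all names are hypothetical.
EDGES = [
    ("user", "LIKES", "movie_1"),
    ("user", "LIKES", "movie_2"),
    ("user", "LIKES_EXPERIMENT", "movie_3"),
]

def targets(edges, source, rel_types):
    """Return the targets reachable from source over the given relationship types."""
    allowed = set(rel_types)
    return [dst for src, rel, dst in edges if src == source and rel in allowed]

# Original graph only: experimental relationships are simply never followed.
original = targets(EDGES, "user", ["LIKES"])

# Experiment only, or original data combined with the experiment.
experiment_only = targets(EDGES, "user", ["LIKES_EXPERIMENT"])
combined = targets(EDGES, "user", ["LIKES", "LIKES_EXPERIMENT"])
```

The original data is never modified; each query decides which "flags" it honours by choosing the relationship types it follows.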
Another possibility is to use some kind of versioning, as described here:
http://iansrobinson.com/2014/05/13/time-based-versioned-graphs/
But without the time factor.
There is also a nice plugin for it https://github.com/h-omer/neo4j-versioner-core
My suggestion is:
Copy the data folder of the original database to a new location (the -r flag is needed to copy a directory): sudo cp -r /path/to/original/data/folder ~/neo4j
Run a Docker container mapping the copy of data folder as the container data folder.
Something like this:
docker run \
--publish=7474:7474 --publish=7687:7687 \
--volume=$HOME/neo4j/data:/data \
neo4j
You can specify other ports if :7474 and :7687 are already in use.
Work on this copy.
You can put these instructions in a .sh file to automate the process.

Time Based Graph Data Modeling

I have a data modeling question. The data that I have is basically nodes with relations to other nodes. Nodes have properties. Edges are directional and have properties. I am exploring if a Graph DB like Neo4j will be appropriate or not.
My doubt is this: the data that I have is time-based. It changes over time, and I need to keep track of the historical data as well. For example, I should be able to query:
What was the graph like on a particular date?
Which nodes did a given node depend on at a particular time?
What were the properties of the edge between two given nodes at a particular time?
I searched but couldn't find a satisfactory resource where I could understand how time can be factored into a Graph DB. Do you think my requirement can be inherently met using a Graph DB? Is there an example/resource/article which describes this for Neo4j or any other graph db?
I want to make sure that the database is scalable to about 100K nodes, and millions of edges. I am optimizing for time over space.
Is there an example/resource/article which describes this for Neo4j or any other graph db?
Here is an excellent article from Ian Robinson's blog about time-based versioned graphs.
Basically, the article describes a way to represent a time-based versioned graph by adding some extra nodes and timestamped relationships that represent the state of the graph at a given timestamp.
The example in the referenced article shows two kinds of change:
The price of product_id : 1 has changed from 1.00 to 2.00. This is a state change.
The product_id : 1 is now sold by shop_id : 2 (and not by shop_id : 1). This is a structural change.
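The two kinds of change can be sketched with a small in-memory model in Python. This is only loosely based on the idea of timestamped relationships and is not the article's exact data model: the `Rel` class, the `SELLS` and `STATE` relationship names, the state-node names, and the numeric timestamps are all assumptions made for illustration.

```python
from dataclasses import dataclass

END_OF_TIME = float("inf")  # marks a relationship that is still current

@dataclass(frozen=True)
class Rel:
    source: str
    kind: str
    target: str
    valid_from: float
    valid_to: float = END_OF_TIME

    def active_at(self, t):
        """A relationship is active in the half-open interval [valid_from, valid_to)."""
        return self.valid_from <= t < self.valid_to

# Properties live on separate state nodes, so a price change adds a new state.
STATE_PROPERTIES = {
    "product_1_state_1": {"price": 1.00},
    "product_1_state_2": {"price": 2.00},
}

RELS = [
    # Structural change: product_1 moves from shop_1 to shop_2 at t=20.
    Rel("shop_1", "SELLS", "product_1", 0, 20),
    Rel("shop_2", "SELLS", "product_1", 20),
    # State change: the price of product_1 changes at t=10.
    Rel("product_1", "STATE", "product_1_state_1", 0, 10),
    Rel("product_1", "STATE", "product_1_state_2", 10),
]

def snapshot(rels, t):
    """Return the relationships that were active at time t."""
    return [r for r in rels if r.active_at(t)]

def price_at(product, t):
    """Return the product's price at time t, or None if it had no state then."""
    for r in snapshot(RELS, t):
        if r.source == product and r.kind == "STATE":
            return STATE_PROPERTIES[r.target]["price"]
    return None

def sellers_at(product, t):
    """Return the shops that sold the product at time t."""
    return [r.source for r in snapshot(RELS, t)
            if r.kind == "SELLS" and r.target == product]
```

Note how every lookup has to pass a timestamp and filter on the validity interval; this is the extra query complexity that versioning introduces.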
Do you think my requirement can be inherently met using a Graph DB?
Yes, but not in an easy or "natural" way. Versioning a time-based model with a database that doesn't offer this functionality natively can be hard and expensive. From the article:
Neo4j doesn’t provide intrinsic support either at the level of its
labelled property graph model or in its Cypher query language for
versioning. Therefore, to version a graph we need to make our
application graph data model and queries version aware.
and
versioning necessarily creates a lot more data – both more nodes and
more relationships. In addition, queries will tend to be more complex,
and slower, because every MATCH must take account of one or more
versioned elements. Given these overheads, apply versioning with care.
Perhaps not all of your graph needs to be versioned. If that’s the
case, version only those portions of the graph that require it.
EDIT:
A few words from the book Graph Databases (by Ian Robinson, Jim Webber and Emil Eifrem) about versioning in graph databases. This book is available for download from the Neo4j website:
Versioning:
A versioned graph enables us to recover the state of the
graph at a particular point in time. Most graph databases don’t
support versioning as a first-class concept. It is possible, however,
to create a versioning scheme inside the graph model. With this scheme
nodes and relationships are timestamped and archived whenever they are
modified. The downside of such versioning schemes is that they leak
into any queries written against the graph, adding a layer of
complexity to even the simplest query.
This paragraph links to the article mentioned at the beginning of this answer.

Can this be accomplished by a Graph Database?

I have a request to develop an application that keeps track of the movements of a certain item (or items). To better demonstrate what the application must do, I drew a diagram (simplified abstraction).
As I never worked with any databases other than the relational ones, I really don't know if I can solve this problem with a graph database.
These questions must be answered by the system:
What path did a certain pen drive take?
I passed on some pen drives. Where are they now?
Which pen drives did I receive, where did they come from, and where did they go?
Where are the pens I burned and passed? And with whom?
Any help and suggestions are much appreciated.
Thanks
In Neo4j everything is either a node or a relationship. So it's useful to think: what would be my nodes and relationships?
Here it might be, for example, that every "pen drive", "person" and "location" is a node. Verbs like "walk" or "give" would be your relationships.
In this model, you'd be able to use Cypher to query for things like "give me all location nodes connected to pen nodes by the relationship walk." Or, say, "start at all person nodes and return nodes who have a give relationship to a pen drive node that doesn't have a give relationship that connects back to the starting person node."
This rich graph query language gives you nice algorithms like shortest path for free, so beyond keeping a transactional record you could determine whether, for example, a pen drive made it from A to B using the optimal path. But as you can see above, "relational joins" do not beget simple queries or descriptions thereof.
When it comes to database design, when the model becomes cumbersome to map mentally, it's going to be a pain to develop too. Design your database based on how you plan to query it. If you're unable to easily explain those queries in terms of Neo4j, it's possible that Neo4j isn't going to be the best fit.
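One way to test whether the node-and-relationship model described above fits the questions is to prototype it in memory first. The sketch below does this in plain Python rather than in Neo4j; the people, pen drive names, and the shape of the "give" records are all hypothetical.

```python
# Each record is one "give" relationship: (pen drive, giver, receiver, step).
# The step number orders the hand-overs; all data here is made up.
GIVES = [
    ("pen_1", "alice", "bob", 1),
    ("pen_1", "bob", "carol", 2),
    ("pen_2", "alice", "dave", 1),
]

def path_of(pen):
    """Return the ordered list of people who held the pen drive."""
    moves = sorted((m for m in GIVES if m[0] == pen), key=lambda m: m[3])
    if not moves:
        return []
    # The first giver starts the path; every receiver extends it.
    return [moves[0][1]] + [m[2] for m in moves]

def current_holder(pen):
    """Return whoever received the pen drive last, or None if it never moved."""
    path = path_of(pen)
    return path[-1] if path else None

def pens_passed_by(person):
    """Map each pen drive the person passed on to its current holder."""
    pens = sorted({m[0] for m in GIVES if m[1] == person})
    return {pen: current_holder(pen) for pen in pens}
```

For example, `path_of("pen_1")` follows the hand-overs from alice to bob to carol, and `pens_passed_by("alice")` answers "I passed on some pen drives; where are they now?". If these questions are easy to phrase as walks over such records, they are likely to translate naturally into graph traversals.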

Resources