Neo4j persistent named graph - neo4j

I'm coming from the RDF world, where named graphs are persistent and can be used like collections of triples. Moreover, you can query against a single named graph or over the whole triplestore. I'm looking for the same features (or a workaround to achieve them) in Neo4j.
Neo4j's Graph Catalog is well documented. As I understand it, named graphs in Neo4j are stored entirely in memory (so they are lost after a restart) and hold a subset of nodes you define for analytics purposes.
Is there a way to create persistent named graphs in Neo4j?
That is, a graph that is stored on disk together with the data and that allows fast access to a subset of nodes (where nodes can be added to or removed from the named graph).

You could give every node in the same "named graph" the same label. Since a node can have multiple labels, this does not prevent you from using other labels for other purposes as well.
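For instance, a minimal Cypher sketch of this approach (the MyNamedGraph label, the Person label and the team property are hypothetical placeholders for your own model):

    // Add existing nodes to the "named graph" by attaching an extra label
    MATCH (n:Person)
    WHERE n.team = 'research'   // hypothetical membership condition
    SET n:MyNamedGraph;

    // Query only within that named graph
    MATCH (n:MyNamedGraph)
    RETURN n.name;

    // Remove a node from the named graph without deleting it
    MATCH (n:MyNamedGraph {name: 'Alice'})
    REMOVE n:MyNamedGraph;

Since labels are part of the stored data, membership in the "named graph" survives restarts, and matching by label is efficient, which covers the fast-access requirement.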

Related

Graph database two nodes have a relationship with another node ONLY if both are true

like this: https://i.imgur.com/MrA6zQP.png
A and B are related to C but ONLY if both A and B are true.
I'm currently using Neo4J as my graph database, but I'm not sure it has this capability. I'd be open to switching to a different graph database if it meant that the free version had this capability.
In Neo4j (and any other graph database I guess) a relation exists or does not exist. As long as we're not using quantum computing, it's binary.
But, you can definitely retrieve paths, or create/project virtual graphs based on conditions, which could include the one you mention.
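For example, a hedged Cypher sketch of such a conditional retrieval (the :Node label, the value property and the RELATED_TO relationship type are assumptions, not part of your model):

    // Return C only when both A and B point to it AND both are "true"
    MATCH (a:Node {name: 'A'})-[:RELATED_TO]->(c:Node)<-[:RELATED_TO]-(b:Node {name: 'B'})
    WHERE a.value = true AND b.value = true
    RETURN c;

The relationships are always stored, but the query only surfaces C when the condition on both A and B holds, which gives you the "only if both are true" behaviour at read time.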

Queries on 200 GB graph

I need a scalable solution to create a Geohash-connected graph.
I found Cypher for Apache Spark, a project that lets you use Cypher on Spark DataFrames to create a graph. However, it can only create immutable graphs by mapping the different DataFrames, so I couldn't get the graph that I need.
I can get the graph I need if I run some other Cypher queries in the Neo4j Browser; however, my stored graph is about 200 GB.
So I'm asking whether it is sensible and fast to run queries on 200 GB of graph data using the Neo4j Browser and APOC functions.
If you're asking if Neo4j can handle databases of this size, then the answer is yes. But you'll see different results depending on how your data is modeled and the kind of queries you want to run.
Performance correlates not with the size of the graph but with the portion of the graph touched and traversed by your queries. Graph-wide analytical queries must touch the entire graph, while tightly defined queries that touch a smaller local part of the graph will be quite quick.
Anything you can do in your queries to constrain the portion of the graph you have to traverse or filter will help out your query speed, so good modeling and usage of indexes and constraints is key.
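As an illustration, a minimal sketch in Cypher, assuming a hypothetical :Location label with an indexed geohash property and a NEIGHBOUR_OF relationship:

    // Index the property you anchor on so the query starts from a small set
    CREATE INDEX location_geohash IF NOT EXISTS
    FOR (l:Location) ON (l.geohash);

    // A tightly scoped query: the index finds the anchor node quickly,
    // and the traversal only touches its local neighbourhood
    MATCH (l:Location {geohash: '9q8yy'})-[:NEIGHBOUR_OF*1..2]-(nearby:Location)
    RETURN DISTINCT nearby.geohash;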

database solution for multiple isolated graphs

I have an interesting problem that I don't know how to solve.
I have collected a large dataset of 80 million graphs (they are CFGs, as in Control Flow Graphs, produced from programs I have analysed on GitHub) which I need to be able to search efficiently.
I looked into existing solutions like Neo4j, but they are all designed to store a single global graph.
In my case it is the opposite: all graphs are independent, like rows in a table, but I need to search through all of them efficiently.
For example, I want to find all CFGs that have a particular IF condition or a WHILE loop with a particular condition.
What's the best database for this use case?
I don't think that there's a reason not to simply store all those graphs in a single graph, whether it's Neo4j or a different graph database. It's not a problem to have many disparate graphs in a single graph where the disparate graphs are disconnected from one another.
As for searching them efficiently, you would either (1) identify properties in your CFGs that you want to search on and convert them to some indexed value of the graph or (2) introduce some graph structure (additional vertices/edges) between the CFGs that will allow you to do the searches you want via graph traversal.
Depending on what you need to search on, approach 1 may not be flexible enough for you, especially if what you intend to search on is not completely known at the time of loading the data. Also, it is important to note that with approach 2 you do not really lose the fact that you have 80 million distinct graphs just because you provided some connection between them. Those physical connections don't change that basic logical fact. You just need to consider those additional connections when you write traversals that you expect to occur only within a single CFG.
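In Neo4j terms, a hedged Cypher sketch of the two approaches could look like this (the :CFG and :Node labels, the PART_OF relationship type and the cfg_id and kind properties are all hypothetical):

    // Approach 1: index a property that identifies the owning CFG
    CREATE INDEX cfg_node_id IF NOT EXISTS
    FOR (n:Node) ON (n.cfg_id);

    // Approach 2: add one root vertex per CFG and connect its nodes to it
    MERGE (g:CFG {cfg_id: 42})
    WITH g
    MATCH (n:Node {cfg_id: 42})
    MERGE (n)-[:PART_OF]->(g);

    // Search across all CFGs, then keep further traversal inside one CFG
    MATCH (n:Node {kind: 'IF'})-[:PART_OF]->(g:CFG)
    RETURN DISTINCT g.cfg_id;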
I'm not sure what Neo4j supports in this area, but with Apache TinkerPop (an open source graph processing framework that lets you write vendor agnostic code over different graph databases, including Neo4j), you might consider doing some form of graph partitioning to help with approach 2. Or you might subgraph() the larger graph to only contain the CFG and then operate with that purely in memory when querying. Both of these approaches will help you to blind your query to just the individual CFG you want to traverse.
Ultimately, however, I see this issue as a modelling problem. You will just need to make some choices on how to best establish the schema for your use case and virtually any graph database should be able to support that.

Time Based Graph Data Modeling

I have a data modeling question. The data that I have is basically nodes with relations to other nodes. Nodes have properties. Edges are directional and have properties. I am exploring if a Graph DB like Neo4j will be appropriate or not.
My doubt is this: the data I have is time-based. It changes over time, and I need to keep track of the historical data as well. For example, I should be able to query:
What was the graph like on a particular date?
Who all did a given node depend on at a particular time?
What were the properties of the edge between two given nodes at a particular time?
I searched but couldn't find a satisfactory resource where I could understand how time can be factored into a Graph DB. Do you think my requirement can be inherently met using a Graph DB? Is there an example/resource/article which describes this for Neo4j or any other graph db?
I want to make sure that the database is scalable to about 100K nodes, and millions of edges. I am optimizing for time over space.
Is there an example/resource/article which describes this for Neo4j or any other graph db?
Here is an excellent article from Ian Robinson's blog about time-based versioned graphs.
Basically, the article describes a way to represent time-based versioned graphs by adding some extra nodes and timestamped relationships that capture the state of the graph at a given timestamp.
The following image from the referenced article shows:
The price of product_id : 1 has changed from 1.00 to 2.00. This is a state change.
The product_id : 1 is now sold by shop_id : 2 (and not by shop_id : 1). This is a structural change.
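A minimal Cypher sketch of this kind of separation between identity and state, assuming illustrative labels and property names (:Product, :ProductState, from, to) rather than the article's exact model:

    // The identity node stays stable; each state node carries a validity interval
    CREATE (p:Product {product_id: 1})
    CREATE (s1:ProductState {price: 1.00, from: datetime('2020-01-01'), to: datetime('2020-06-01')})
    CREATE (s2:ProductState {price: 2.00, from: datetime('2020-06-01'), to: datetime('9999-12-31')})
    CREATE (p)-[:HAS_STATE]->(s1)
    CREATE (p)-[:HAS_STATE]->(s2);

    // "What was the price of product 1 on a particular date?"
    MATCH (p:Product {product_id: 1})-[:HAS_STATE]->(s:ProductState)
    WHERE s.from <= datetime('2020-03-15') AND datetime('2020-03-15') < s.to
    RETURN s.price;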
Do you think my requirement can be inherently met using a Graph DB?
Yes, but not in an easy or "natural" way. Versioning a time-based model with a database that doesn't offer this functionality natively can be hard and expensive. From the article:
Neo4j doesn’t provide intrinsic support either at the level of its labelled property graph model or in its Cypher query language for versioning. Therefore, to version a graph we need to make our application graph data model and queries version aware.
and
versioning necessarily creates a lot more data – both more nodes and more relationships. In addition, queries will tend to be more complex, and slower, because every MATCH must take account of one or more versioned elements. Given these overheads, apply versioning with care. Perhaps not all of your graph needs to be versioned. If that’s the case, version only those portions of the graph that require it.
EDIT:
A few words from the book Graph Databases (by Ian Robinson, Jim Webber and Emil Eifrem) about versioning in graph databases. This book is available for download on the Neo4j website:
Versioning:
A versioned graph enables us to recover the state of the graph at a particular point in time. Most graph databases don’t support versioning as a first-class concept. It is possible, however, to create a versioning scheme inside the graph model. With this scheme nodes and relationships are timestamped and archived whenever they are modified. The downside of such versioning schemes is that they leak into any queries written against the graph, adding a layer of complexity to even the simplest query.
This paragraph links to the article referenced at the beginning of this answer.

Preserving nodes and relationships history in a graph database

I am trying to implement a solution using a graph DB with nodes and relationships. There is a requirement that a user may want to run reports (queries) on the historical data for a node, or check out its historical relationships.
Do graph DBs support this functionality out of the box? Or can some alternate mechanism be implemented to persist historical audit logging for node/relationship changes in the graph DB?
Any ideas we could contemplate?
You can use transaction event listeners to create historic copies of nodes and relationships as they are updated.
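If you would rather do the archiving in the update queries themselves instead of in a listener, a hedged Cypher sketch of the same idea (the :Employee and :EmployeeVersion labels and the properties are hypothetical):

    // Archive the current state as a versioned copy before applying the change
    MATCH (e:Employee {id: 7})
    CREATE (v:EmployeeVersion)
    SET v = properties(e)              // snapshot all current properties
    SET v.archived_at = datetime()
    CREATE (e)-[:HAS_VERSION]->(v)
    SET e.salary = 55000;              // then apply the actual update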
If you only have tree structures in your graph, I recommend that you look at Persistent Data Structures with sparse copying and structural sharing.
For Neo4j, there is a GitHub example project with versioning.
