I am trying to implement a solution using a graph DB with nodes and relationships. There is a requirement where a user may want to run reports (queries) on the historical data for a node, or inspect the historical relationships.
Do graph DBs support this functionality out of the box? Or can some alternate mechanism be implemented to persist a historical audit log of node/relationship changes in the graph DB?
Any ideas we could contemplate?
You can use transaction event listeners to create historic copies of nodes and relationships as they are updated.
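For illustration, here is a minimal Cypher sketch of the kind of archival write such a listener could perform whenever a node changes; the :Person and :PersonVersion labels, the property names, and the :PREVIOUS_VERSION relationship are all assumptions, not a fixed scheme:

    // Archive the current state of a node before applying an update.
    // Labels, properties and relationship type here are assumed names.
    MATCH (p:Person {id: $id})
    CREATE (v:PersonVersion)
    SET v = properties(p),          // copy all current properties onto the archive node
        v.archivedAt = timestamp()  // record when this version was superseded
    CREATE (p)-[:PREVIOUS_VERSION]->(v)
    SET p.name = $newName           // then apply the actual update to the live node

Walking the :PREVIOUS_VERSION chain from the live node then reconstructs the node's history.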
If you only have tree structures in your graph, I recommend that you look at persistent data structures with sparse copying and structural sharing.
For Neo4j there is a GitHub example project with versioning.
Related like this: https://i.imgur.com/MrA6zQP.png
A and B are related to C but ONLY if both A and B are true.
I'm currently using Neo4J as my graph database, but I'm not sure it has this capability. I'd be open to switching to a different graph database if it meant that the free version had this capability.
In Neo4j (and any other graph database I guess) a relation exists or does not exist. As long as we're not using quantum computing, it's binary.
But you can definitely retrieve paths, or create/project virtual graphs based on conditions, which could include the one you mention.
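As a rough Cypher sketch using APOC (the node labels and the boolean value property are assumptions about your model), you could project virtual relationships only when the condition holds:

    // Project virtual edges from A and B to C only if both A and B are true.
    MATCH (a:Node {name: 'A'}), (b:Node {name: 'B'}), (c:Node {name: 'C'})
    WHERE a.value = true AND b.value = true
    RETURN a, b, c,
           apoc.create.vRelationship(a, 'RELATED_TO', {}, c) AS relA,
           apoc.create.vRelationship(b, 'RELATED_TO', {}, c) AS relB

The virtual relationships exist only in the query result, so the stored graph stays binary while the projected view reflects the condition.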
I have some normalized Master Data in PostgreSQL.
I want a graph visualization layer in Neo4j without migrating any data to Neo4j. Kind of like a view, with lazy fetching of data at runtime.
Neo4j would not commit any changes; it is only meant for viewing.
Can Neo4j use something like a PostgreSQL JDBC connector and provide a visualization?
Thanks.
You could, with apoc.load.jdbc and virtual nodes/relationships created from the data.
But it would be a bit involved, as you need to load all tables and then connect them.
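A rough sketch of that approach; the connection URL, the person table, and the friend_of join table are assumptions about your PostgreSQL schema:

    // Pull rows over JDBC and project them as virtual nodes/relationships,
    // so nothing is ever committed to Neo4j.
    WITH 'jdbc:postgresql://localhost:5432/masterdata' AS url
    CALL apoc.load.jdbc(url,
      'SELECT p.id AS pid, p.name AS pname, q.id AS qid, q.name AS qname
         FROM friend_of f
         JOIN person p ON p.id = f.person_id
         JOIN person q ON q.id = f.friend_id') YIELD row
    WITH apoc.create.vNode(['Person'], {id: row.pid, name: row.pname}) AS p,
         apoc.create.vNode(['Person'], {id: row.qid, name: row.qname}) AS q
    RETURN p, q, apoc.create.vRelationship(p, 'FRIEND_OF', {}, q) AS rel

Note that apoc.load.jdbc requires the PostgreSQL JDBC driver to be available to Neo4j.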
With the Neo4j-ETL tool you can do a quick (few min) one-time import to visualize.
https://neo4j.com/blog/neo4j-etl-1-2-0-release-whats-new-and-demo/
Especially if you don't just visualize but also query, you need to transfer the data anyway.
You can use the ETL tool from Neo4j.
You need to ask for an activation key via email at devrel#neo4j.com.
I have a data modeling question. The data that I have is basically nodes with relations to other nodes. Nodes have properties. Edges are directional and have properties. I am exploring whether a graph DB like Neo4j would be appropriate or not.
My doubt is because the data that I have is time-based: it changes over time, and I need to keep track of the historical data as well. For example, I should be able to query:
What was the graph like on a particular date?
Which nodes did a given node depend on at a particular time?
What were the properties of the edge between two given nodes at a particular time?
I searched but couldn't find a satisfactory resource where I could understand how time can be factored into a Graph DB. Do you think my requirement can be inherently met using a Graph DB? Is there an example/resource/article which describes this for Neo4j or any other graph db?
I want to make sure that the database is scalable to about 100K nodes, and millions of edges. I am optimizing for time over space.
Is there an example/resource/article which describes this for Neo4j or any other graph db?
Here is an excellent article from Ian Robinson's blog about time-based versioned graphs.
Basically, the article describes a way to represent a time-based versioned graph by adding some extra nodes and timestamped relationships to represent the state of the graph at a given timestamp.
The image in the referenced article shows:
The price of product_id : 1 has changed from 1.00 to 2.00. This is a state change.
The product_id : 1 is now sold by shop_id : 2 (and not by shop_id : 1). This is a structural change.
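A minimal Cypher sketch of that pattern (the names and the from/to convention are assumptions drawn from the article's idea, not exact code from it):

    // The identity node stays stable; each state lives in its own node,
    // attached by a relationship carrying its validity interval.
    CREATE (p:Product {product_id: 1})
    CREATE (s1:ProductState {price: 1.00})
    CREATE (s2:ProductState {price: 2.00})
    CREATE (p)-[:STATE {from: 20140101, to: 20140201}]->(s1)
    CREATE (p)-[:STATE {from: 20140201, to: 99999999}]->(s2)  // open-ended current state

    // Run separately: "What was the price at date $date?"
    MATCH (p:Product {product_id: 1})-[r:STATE]->(s:ProductState)
    WHERE r.from <= $date AND $date < r.to
    RETURN s.price

Structural changes (such as which shop sells the product) are handled the same way, with validity intervals on the structural relationships.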
Do you think my requirement can be inherently met using a Graph DB?
Yes, but not in an easy or "natural" way. Versioning a time-based model with a database that doesn't offer this functionality natively can be hard and expensive. From the article:
Neo4j doesn’t provide intrinsic support either at the level of its labelled property graph model or in its Cypher query language for versioning. Therefore, to version a graph we need to make our application graph data model and queries version aware.
and
versioning necessarily creates a lot more data – both more nodes and more relationships. In addition, queries will tend to be more complex, and slower, because every MATCH must take account of one or more versioned elements. Given these overheads, apply versioning with care. Perhaps not all of your graph needs to be versioned. If that’s the case, version only those portions of the graph that require it.
EDIT:
A few words from the book Graph Databases (by Ian Robinson, Jim Webber, and Emil Eifrem) about versioning in graph databases. This book is available for download from the Neo4j page:
Versioning:
A versioned graph enables us to recover the state of the graph at a particular point in time. Most graph databases don’t support versioning as a first-class concept. It is possible, however, to create a versioning scheme inside the graph model. With this scheme nodes and relationships are timestamped and archived whenever they are modified. The downside of such versioning schemes is that they leak into any queries written against the graph, adding a layer of complexity to even the simplest query.
This paragraph links to the article indicated at the beginning of this answer.
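To see how such a scheme leaks into queries, compare an unversioned lookup with its version-aware equivalent (a sketch; the labels and the from/to properties are assumptions):

    // Unversioned: what does this node depend on right now?
    MATCH (n:Service {name: 'billing'})-[:DEPENDS_ON]->(m)
    RETURN m

    // Version-aware: the same question at a point in time $ts.
    // Every MATCH now has to filter on the validity interval.
    MATCH (n:Service {name: 'billing'})-[r:DEPENDS_ON]->(m)
    WHERE r.from <= $ts AND $ts < r.to
    RETURN m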
I have a scenario where I would like to model my IoT asset in the graph database of DataStax Enterprise. This is a perfect fit for my hierarchical data structure. However, when it comes to my time series data, I already have that stored in a separate Cassandra table. Is there a way to bridge the gap between data in the graph database and data in a standard Cassandra table?
Thanks
At this current moment, all data needs to reside in DSE Graph tables to be available via Gremlin traversals for OLTP or OLAP use cases. We have features coming out soon though that will help provide an OLAP scenario. We'd love to learn more about your use case though to enhance the product for this type of scenario. If you'd like, please join the DataStax Academy Graph channel and we can discuss this requirement further - https://academy.datastax.com/slack
I've been looking for a triple store for my project. In this project I want to store my data according to certain ontologies (OWL).
From my research I ended up with two technologies, Neo4J and BigData, that seem to fit well in this case.
I want to know if either of these two is more appropriate to use with RDF, RDFS, OWL and SPARQL queries.
Neo4j can be used to store data in entity-relationship-entity form. In the case of big data, you should not upload your whole dataset into Neo4j, because it will become very heavy and processing will be very slow. You should use a complementary DB for storing the actual data, and store IDs and some parameters in Neo4j for graph traversal, to perform a sort of graph analytics. Neo4j is mainly built for graph analytics (that is its power), or you have to use a graph engine, e.g. GraphX (Spark).
Thanks,
You might want to try out the SPARQL plugin for Neo4j; see here for an HTTP-based test, and this Berlin Dataset Test for embedded usage.
Neo4J is a specific technology, while big data is more of a generic term. I think what you're asking about is OLAP and OLTP. As data gets bigger, there are differences between use cases for RDF-style graph databases, which are often used for OLAP (On-line Analytical Processing) style analytics. In short, OLAP is designed for analytics that look across a big data set, while OLTP is more aimed at INSERT/DELETEs (on potentially big data).
OLAP-based traversals tend to process the entire graph, while OLTP-based traversals tend to process smaller data sets by starting with one or a handful of vertices and traversing from there.
For example, let’s say you wanted to calculate the average age of friends of one particular user. Great use case for OLTP, since the query data set is small. However, if you wanted to calculate the average age of everyone on the database, OLAP is the preferred technology.
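In Cypher terms, the contrast might look like this (the User model here is assumed for illustration):

    // OLTP-style: anchored at one user, touches only a small neighborhood.
    MATCH (u:User {id: $userId})-[:FRIEND_OF]->(f:User)
    RETURN avg(f.age) AS avgFriendAge

    // OLAP-style: the same aggregate over everyone forces a full scan.
    MATCH (u:User)
    RETURN avg(u.age) AS avgAge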
OLAP is optimal for deep analysis of a lot of data, while OLTP is better suited for fast-running queries and a lot of INSERTs. If you’re trying to achieve an SLA where the analytics must complete within a certain timeframe, consider the type of analytics and which one is better suited. Or maybe you need both.