Neo4j - Query graph in temporary state

I have a graph in which I would like to examine the effects of removing/adding nodes without persisting these changes, and I am wondering if this is possible.
For example: adding a node to the graph, then running some aggregate functions to see how this would affect the results, without changing the underlying data.
Is it possible to change the graph in a transaction, query the changed graph, and then roll back to the original state? Or would I potentially have to either copy the graph, or keep a log of changes and revert them manually?

Neo4j has transaction handling that allows you to make modifications and then either commit the results or roll them back. But all of your operations need to happen in the same transaction context.
This is easiest to do either in a Java stored procedure or from any client driver, such as the C# one.
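The modify/query/rollback pattern described above can be sketched independently of Neo4j with an in-memory stand-in. The class and method names below are illustrative only, not a Neo4j API; the point is that queries inside the transaction see the uncommitted changes, and rollback discards them:

```python
class FakeGraphTx:
    """Minimal stand-in for a transaction: buffers writes until commit."""
    def __init__(self, graph):
        self.graph = graph            # committed state: a set of node ids
        self.added = set()
        self.removed = set()

    def create_node(self, node_id):
        self.added.add(node_id)

    def delete_node(self, node_id):
        self.removed.add(node_id)

    def count_nodes(self):
        # Queries inside the transaction see the uncommitted changes.
        return len((self.graph | self.added) - self.removed)

    def rollback(self):
        # Discard buffered changes; committed state is untouched.
        self.added.clear()
        self.removed.clear()

    def commit(self):
        self.graph |= self.added
        self.graph -= self.removed


graph = {"a", "b", "c"}
tx = FakeGraphTx(graph)
tx.create_node("d")                   # the what-if change
in_tx = tx.count_nodes()              # aggregate over the modified view: 4
tx.rollback()                         # discard it
print(in_tx, len(graph))              # 4 3
```

With a real driver the shape is the same: begin a transaction, run the writes and the aggregate queries inside it, then roll back instead of committing.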
How do you query the graph?

Related

reliably querying “delta since last read” in neo4j

In neo4j I have an application where an API endpoint does CRUD operations on the graph, then I materialize reachable parts of the graph starting at known nodes, and finally I send out the materialized subgraphs to a bunch of other machines that don’t know how to query neo4j directly. However, the materialized views are moderately large, and within a given minute only small parts of each one will change, so I’d like to be able to query “what has changed since last time I checked” so that I only have to send the deltas. What’s the best way to do that? I’m not sure if it helps, but my data doesn’t contain arbitrary-length paths — if needed I can explicitly write each node and edge type into my query.
One possibility I imagined was adding a “last updated” timestamp as a property on every node and edge, and instead of deleting things directly, just add a “deleted” boolean property and update the timestamp, and then use some background process to actually delete a few minutes later (after the deltas have been sent out). Then in my query, select all reachable nodes and edges and filter them based on the timestamp property. However:
If there’s clock drift between two different Neo4j write servers and the Raft leader changes from one to the other, can the timestamps go back in time? Or even worse, will two concurrent writes always give me a transaction time that is in commit order, or can they be reordered within a single box? I would rather use a graph-wide monotonically increasing integer like the write commit ID, but I can’t find a function that gives me that. Or theoretically I could use the cookie used for causal consistency, but since you only get that after the transaction is complete, it’d be messy to have to do every write as two separate transactions.
Also, it just sucks to use deletion markers because then you have to explicitly remove deleted edges / nodes in every other query you do.
Are there other better patterns here?
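The soft-delete pattern described in the question, using an application-side monotonic version counter instead of wall-clock timestamps, can be sketched as plain logic (this is not a Neo4j feature; all names here are illustrative):

```python
import itertools

class DeltaStore:
    """Tracks entities with a monotonically increasing version number
    instead of wall-clock timestamps, so deltas are immune to clock drift."""
    def __init__(self):
        self._version = itertools.count(1)
        self.entities = {}            # id -> {"version": int, "deleted": bool, ...}

    def upsert(self, entity_id, **props):
        v = next(self._version)
        self.entities[entity_id] = {"version": v, "deleted": False, **props}
        return v

    def soft_delete(self, entity_id):
        # Tombstone instead of a hard delete, so the delta can report it.
        v = next(self._version)
        self.entities[entity_id].update(version=v, deleted=True)
        return v

    def delta_since(self, last_seen_version):
        """Everything (including tombstones) changed after last_seen_version."""
        return {eid: e for eid, e in self.entities.items()
                if e["version"] > last_seen_version}

    def purge(self, older_than_version):
        """Background cleanup: drop tombstones whose delta was already sent."""
        self.entities = {eid: e for eid, e in self.entities.items()
                         if not (e["deleted"] and e["version"] <= older_than_version)}

store = DeltaStore()
store.upsert("n1", name="alice")
cursor = store.upsert("n2", name="bob")
store.soft_delete("n1")
print(sorted(store.delta_since(cursor)))   # ['n1'] - only the tombstone
```

The cursor each consumer keeps is just the highest version it has seen, which sidesteps the timestamp-ordering questions entirely.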

Run a graph algorithm in an open transaction

I have been testing Neo4j for graph projects for a month or two now and it has been really efficient, but I'm having a hard time solving one of my problems and am seeking advice.
I'm using Neo4j to store graph databases and to check that they follow some structural requirements. For example, I have a DB modeling dependencies between items: the nodes are the items and the relationships are labeled "need" or "incompatible" to model the dependency, and I want Neo4j to check the coherence of the data.
I coded the checker as a server plugin and it works very well. But now I would like to allow users to connect to the database, modify the data (without saving the modifications yet), check that the modifications do not break the coherence, and then save them.
I found the HTTP endpoint that can keep a transaction open, which completely fits the "modifying the DB without saving" need, but I can't find a way to run my checker on the modified data: is there a way to run something other than a Cypher query through the HTTP endpoint, or do I have to consider another way to solve this?
I know it would be possible to run my checker from a TransactionEventHandler's beforeCommit, but that means the user couldn't know whether their data are okay without starting a commit, and the fact that the data are split between the unmodified DB and the TransactionData holding the modifications makes the checker tricky to apply.
So, if someone knows how I could solve this, it would be great.
Thank you.
Your option is to use an unmanaged extension with the Transaction Event API.
You can intercept an incoming transaction and read all the data in it. If the transaction breaks your rules, you can discard it.
I recommend using the GraphAware framework for that.
Here is a great article about it: http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html
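The beforeCommit idea can be sketched as plain logic: given the committed edges plus a transaction's pending additions and removals, veto the commit if any pair of items would end up related by both "need" and "incompatible". This is illustrative only; the real Transaction Event API hands you a TransactionData object rather than plain sets:

```python
def check_coherence(committed_edges, added_edges, removed_edges):
    """Each edge is (source, label, target) with label 'need' or 'incompatible'.
    Returns the list of violating pairs: items related by both labels at once."""
    # The state the commit would produce: committed, plus adds, minus removals.
    edges = (set(committed_edges) | set(added_edges)) - set(removed_edges)
    pairs = {}
    for src, label, dst in edges:
        pairs.setdefault(frozenset((src, dst)), set()).add(label)
    return [tuple(sorted(p)) for p, labels in pairs.items()
            if {"need", "incompatible"} <= labels]

committed = {("app", "need", "libssl")}
pending_add = {("app", "incompatible", "libssl")}   # would contradict the need
violations = check_coherence(committed, pending_add, set())
print(violations)        # [('app', 'libssl')] -> veto the commit
```

In a beforeCommit handler, a non-empty result would make you throw an exception, which rolls the transaction back.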

Keeping a 'revisionable' copy of Neo4j data in the file system; how?

The idea is to have git or a git-like system (users, revision tracking, branches, forks, etc) store the 'master copy' of objects and relationships.
Since the master copy is on the filesystem, any changes can be checked in, tracked, and backed up. Neo4j could then import the files and serve queries. This also gives freedom since node and connection files can be imported to any other database.
Changes in Neo4j can be written to these files as part of the query
Nodes and connections can be added by other means (like copying from a seed dataset)
Nodes and connections are rarely created/updated/deleted by users
Most of the usage is where Neo4j shines: querying
Due to the last two points, the performance penalty on importing can safely be ignored
What's the best way to set this up?
If this isn't wise; how come?
It's possible to do that, but it would be a lot of work without much real value, IMHO.
With an unmanaged extension using the Transaction Event API, you can store information about each transaction on disk in your common file format.
Here is some information about the Transaction Event API: http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html
Could you please tell us more about the use case and how you would design that system?
In general nothing keeps you from just keeping neo4j database files around (zipped).
Otherwise I would probably use a format that can be quickly exported/imported and diffed too.
So very probably CSV files, with a node file per label, ordered by a sensible key,
and then relationship files between pairs of nodes; with neo4j-import you can quickly turn that data back into a graph.
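The layout suggested above might look like the following sketch. The header conventions follow the neo4j-import CSV style (`:ID`, `:START_ID`, `:END_ID`, `:TYPE`); the function names and sample data are my own, and sorting by a stable key is what keeps the files diff-friendly:

```python
import csv
import io

def export_nodes(label, rows, key="id"):
    """One CSV per label, sorted by a stable key so the files diff cleanly."""
    buf = io.StringIO()
    fields = [key] + sorted(k for k in rows[0] if k != key)
    w = csv.writer(buf, lineterminator="\n")
    w.writerow([f"{key}:ID"] + fields[1:])      # neo4j-import style header
    for row in sorted(rows, key=lambda r: r[key]):
        w.writerow([row[f] for f in fields])
    return buf.getvalue()

def export_rels(rels):
    """One CSV of (start, end, type) triples, sorted for stable diffs."""
    buf = io.StringIO()
    w = csv.writer(buf, lineterminator="\n")
    w.writerow([":START_ID", ":END_ID", ":TYPE"])
    for start, end, rtype in sorted(rels):
        w.writerow([start, end, rtype])
    return buf.getvalue()

people = [{"id": "p2", "name": "bob"}, {"id": "p1", "name": "ann"}]
print(export_nodes("Person", people))
# id:ID,name
# p1,ann
# p2,bob
```

Each export would be one commit in the git repository, giving revision tracking over the "master copy" for free.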
If you want to write changes to the files, you have to make sure they are replayable (appends + updates + deletes), i.e. you have to choose a format that is more or less a transaction log (which Neo4j already has).
If you want to do it yourself, the TransactionEventHandler is what you want to look at. Alternatively, you could dump the full database to a snapshot whenever you request it.
There are plans to add point-in-time recovery on the existing tx-logs, which I think would also address your question.
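A replayable log of the kind described, appends, updates, and deletes that can be re-applied in order, can be sketched as JSON lines. This is illustrative only, not Neo4j's own transaction-log format:

```python
import json

def append(log, op, entity_id, props=None):
    """Append one change record; the log is an ordered list of JSON lines."""
    log.append(json.dumps({"op": op, "id": entity_id, "props": props or {}}))

def replay(log):
    """Rebuild the current state by replaying the log from the beginning."""
    state = {}
    for line in log:
        rec = json.loads(line)
        if rec["op"] == "create":
            state[rec["id"]] = dict(rec["props"])
        elif rec["op"] == "update":
            state[rec["id"]].update(rec["props"])
        elif rec["op"] == "delete":
            state.pop(rec["id"], None)
    return state

log = []
append(log, "create", "n1", {"name": "ann"})
append(log, "update", "n1", {"name": "anna"})
append(log, "create", "n2", {"name": "bob"})
append(log, "delete", "n2")
print(replay(log))    # {'n1': {'name': 'anna'}}
```

Because each line is independent and append-only, the file versions cleanly in git, and point-in-time recovery is just replaying a prefix of the log.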

Monitoring real-time changes to results of a neo4j query?

I have an incoming event stream of player interactions from an MMO. I want to construct a graph of the player's moment-to-moment interactions, continuously run queries on the activities of the past ~30-240 seconds, and update a graphical view, all in real-time.
Some more details about my particular case:
I have ~100-500 incoming events every second. These look like:
(PlayerA)->[:TAKES_ACTION]->(Event)->[:RECIPIENT]->(PlayerB)
where every event is time-stamped. Timestamps are accurate to the second. I plan on attaching every event to a node representing a timestamp, so I can restrict queries to the events attached to a set of X most recent timestamps.
I expect at any given time-frame for there to be ~1000-2000 active players.
My queries will be to group players together based on mutual interactions, to figure out which groups are currently engaged in combat with which other groups.
My main questions are:
Does Neo4j have any sort of "incremental update" functionality to efficiently update query results without re-running the entire query for every set of changes?
Does Neo4j have any sort of ability to "push" any changes to the results of a query to a client? Or would a client have to continuously poll the database?
Are there any optimisations or tricks to making a continuously repeated query as efficient as possible?
Answers
1) No. You can only execute a query and get its results.
2) No. Currently you can only make client-to-server requests.
3) Yes.
Details
Let's get to the bottom of this one. Neo4j by default can offer you:
REST API
Transactional Cypher endpoint
Traversal endpoint
Custom plugins
Custom unmanaged extensions
In your case you should implement an unmanaged extension. This is the best option for getting the desired functionality: develop it yourself.
More information on extensions:
How to create unmanaged Neo4j extension?
Unmanaged extension template
Graphaware framework for extension development
In extension you can do everything you want:
Use Core API to make efficient queries for latest data
Create WebSocket endpoint for full-duplex communication channel between client and server
Implement any additional logic to format/represent your data correctly
Queries and performance
Cypher queries are compiled and cached on first execution; after that, the cached query plan is used. Query execution by itself is quite fast.
Recommendations:
Always use query parameters where possible. This allows Neo4j to reuse compiled queries efficiently.
Be smart when writing queries. Try to lower cardinality where possible.
Think about the data model. You can probably model your data in such a way that a query always fetches only the latest data. In your case, relationships like :LAST_EVENT and :PREVIOUS_EVENT can help.
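The "only fetch the latest data" idea, combined with the questioner's plan to restrict queries to the last N seconds of events, can be sketched as a sliding window that evicts old events and re-groups players by mutual interaction. This is application-side logic an unmanaged extension could run, not a built-in Neo4j feature:

```python
from collections import deque

class SlidingWindow:
    """Keeps only the events from the last `window` seconds, so each
    re-query touches a bounded amount of data."""
    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()           # (timestamp, actor, recipient)

    def add(self, ts, actor, recipient):
        self.events.append((ts, actor, recipient))

    def groups(self, now):
        # Evict everything outside the window.
        while self.events and self.events[0][0] <= now - self.window:
            self.events.popleft()
        # Union players connected by any recent interaction (union-find).
        parent = {}
        def find(x):
            parent.setdefault(x, x)
            while parent[x] != x:
                parent[x] = parent[parent[x]]   # path halving
                x = parent[x]
            return x
        for _, a, b in self.events:
            parent[find(a)] = find(b)
        out = {}
        for p in parent:
            out.setdefault(find(p), set()).add(p)
        return sorted(sorted(g) for g in out.values())

w = SlidingWindow(30)
w.add(0, "A", "B")       # will have aged out by t=40
w.add(20, "C", "D")
w.add(35, "B", "C")
print(w.groups(40))      # [['B', 'C', 'D']] - A's event aged out
```

Re-running this on each poll only regroups the bounded window, which approximates the incremental behavior the question asks for even though the query itself is re-executed.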

Is it possible to find out the changes done to neo4j db over a time interval?

Is it possible to find out the updates/modifications/changes made to a Neo4j DB over a time interval?
The Neo4j DB will be polled at periodic intervals to find the changes that happened to it over that time period.
These changes then have to be synced with other DBs; this is the real task.
Here, changes include the addition, update, and deletion of nodes, relationships, and properties.
How do we track the changes that have been made in a particular timeframe? Not all nodes and relationships have timestamps set on them.
Add a timestamp field to each of your nodes and relationships that stores timestamp() when they are created. Then write a Cypher query to bring back all nodes and relationships within the given time range.
EDIT
There are two ways of implementing this synchronization.
Option 1
If you can use Spring Data Neo4j then you can use the lifecycle events as explained here to intercept the CUD operations and do the necessary synchronization either synchronously or asynchronously.
Option 2
If you can't use Spring, then you need to implement the interception code yourself. The best way I can think of is to publish all the CUD operations to a topic and then write subscribers that each synchronize to one of the stores. In your case you would have Neo4jSubscriber, DbOneSubscriber, Db2Subscriber, etc.
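Option 2 can be sketched as a minimal in-process topic; the handlers below play the role of the Neo4jSubscriber/DbOneSubscriber classes mentioned above, and a real system would put a message broker in the middle:

```python
class Topic:
    """Minimal in-process topic: every CUD operation is published once,
    and each subscriber applies it to its own store."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, event):
        for handler in self.subscribers:
            handler(event)

def make_store_subscriber(store):
    """Build a handler that keeps one dict-backed store in sync."""
    def handle(event):
        if event["op"] in ("create", "update"):
            store[event["id"]] = event["props"]
        elif event["op"] == "delete":
            store.pop(event["id"], None)
    return handle

topic = Topic()
neo4j_store, other_db = {}, {}
topic.subscribe(make_store_subscriber(neo4j_store))
topic.subscribe(make_store_subscriber(other_db))
topic.publish({"op": "create", "id": "n1", "props": {"name": "ann"}})
topic.publish({"op": "delete", "id": "n1"})
print(neo4j_store == other_db == {})   # True - both stores stay in sync
```

Because every store consumes the same ordered stream, the databases converge without any of them needing timestamps on the data itself.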
There is also something called a time tree, where you use year, month, and day nodes to track changes; you can use this as well to get the history.
You also need to make sure you set the changing attributes/properties on the related object nodes when relating them to the day/month/year nodes.
I hope this helps someone.
