Run a graph algorithm in an open transaction - neo4j

I have been testing neo4j for graph projects for 1 or 2 month now and it has been really efficient, but I'm having a hard time finding how to solve one of my problem and I'm seeking for advice.
I'm using neo4j to store graph databases and check that they follow some structural requirements, for example, I have a db modeling dependency between items : the nodes are the items and the links are labeled "need" or "incompatible" to model the dependency and I want neo4j to check the coherence of the data.
I coded the checker in a server plugin and it works very well. But now I would like to allow users to connect to the database, modify the data (without saving the modification yet), check that the modifications are not breaking the coherence and then save the modifications.
I found the http endpoint which can keep a transaction open and it completely fits the "modifying the db without saving" need, but I can't find how to run my checker on the modified data : is there a way to run something else than Cypher query with the http endpoint or do I have to consider an other way to solve this ?
I now it would be possible to run my checker using the TransactionEventHandler beforeCommit, but it means the user couldn't know if his data are okay without starting a commit, and the fact that the data are split between the db without modification and the TransactionData which store the modification make the checker tricky to apply.
So, if someone knows how I could solve this, it would be great.
Thank you.

Your options is to use Unmanaged Extension and Transaction Event API.
You are able to handle incoming transaction and read all data which are in it. If transaction break your rules, then you can discard the transaction.
I recommend you to use GraphAware framework for that.
Here is the great article about that http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html

Related

Syncing of memory and database objects upon changes in objects in memory

I am currently implementing a web application in .net core(C#) using entity framework. While working on the project, I actually encountered quite a few challenges but I will start with the one which I think are most important. My questions are as follows:
Instead of frequent loading data from the database, I am having a set of static objects which is a mirror of the data in the database. However, it is tedious and error prone when I want to ensure any changes, i.e., adding/deleting/modifying of objects are being saved to the database at real time. Is there any good example or advice that I can refer to improve my approach to do this?
Another thing is that value of some objects' properties will be changed on the fly according to the value of some other objects' properties. Something like a spreadsheet where a cell's value will be changed automatically if the value in the cell that the formula is referring to changes. I do not have a solution to do this yet. Appreciate if anyone has any example that I can refer to. But this will add another layer of complexity to sync the changes of the objects in memory to database.
At the moment, I am unsure if there is any better approach. Appreciate if anyone can help. Thanks!
Basically, you're facing a problem that's called eventual consistency. Something changes and two or more systems need to be aware at the same time. The problem here is that both changes need to be applied in order to consider the operation successful. If either one fails, you need to know.
In your case, I would use the Azure Service Bus. You can create queues and put messages on a queue. An Azure Function would handle these queue messages. You would create two queues, one for database updates, and one for the in-memory update (I think changing this to a cache service may be something to think off). Now the advantage of these queues is that you can easily drop messages on these queues from anywhere. Because you mentioned the object is going to evolve, you may need to update these objects either in the database or in memory (cache).
Once you've done that, I'd create a topic, with two subscriptions. One forwarding messages to Queue 1, and the other to Queue 2. This will solve your primary problem. In case an object changes, just send it to the topic. Both changes (database and memory) will be executed automagically.
The only problem you have now, it that you mentioned you wanted to update the database in real-time. With this scenario, you're going to have to leave that.
Also, you need to make sure you have proper alerts in place for the queues so in case you did miss a message, or your functions didn't handle it well enough, you'll receive an alert to check & correct errors.
I'm totally agree with #nineedm's and answer, but there are also other solutions.
If you introduce cache, you will always face cache revalidation problem - you have to mark cache as invalid when data were changed. Sometimes it is easy, depending on nature of cached data and how often data are changed.
If you have just single application, MemoryCache can be enough with proper specified expiration options.
If there is a cluster - you have to look at Distributed Cache solutions, for example Redis. There is MS article about that Distributed caching in ASP.NET Core

How to connect Jboss BRMS (6.4.0.GA) to any database

I have a SQL Server database with a Person table and I want to load a list of these people from the database to an Arraylist or List in the BRMS to apply the rules. how can I do this?
The best practice is to delegate the data retrieval logic to the caller.
The pattern should be:
Retrive the data from a DB or whatever
Fill in the data in the Working Memory
Fire the rules
Collect the results
Depending on the application you can use the results to update a DB
The BRMS has the ability to retrieve data in the rule logic but it should be considered a bad practice, or something to do when no other options are available (really rare case, in rare situation). Otherwise, the BRMS performances will be terrible and the overall code really hard to maintain.

How to handle SAP Kapsel Offline app OData conflicts properly?

I build an app that is able to store OData offline by using SAP Kapsel Plugins.
More or less it's the same as generated by WEB ID or similer to the apps in this example: https://blogs.sap.com/2017/01/24/getting-started-with-kapsel-part-10-offline-odatasp13/
Now I am at the point to check the error resolution potential. I created a sync conflict (chaning data on the server after the offline database was stored and changed something on the app and started a flush).
As mentioned in the documentation I can see the error in ErrorArchive and could also see some details. But what I am missing is the information of the "current" data on the database.
In the error details I can just see the data on the device but not the data changed on the server.
For example:
Device is loading some names into offline store
Device is offline
User A is changing some names
User B is changing one of this names directly online
User A is online again and starts a sync
User A is now informend about the entity that was changed BUT:
not the content user B entered
I just see the "offline" data.
Is there a solution to see the "current" and the "offline" one in a kind of compare view?
Please also note that the server communication is done by the Kapsel Plugin and not with normal AJAX calls. This could be an alternative but I am wondering if there is no smarter way supported by the API?
Meanwhile I figured out how to load the online data (manually).
This could be done by switching http handler back to normal one.
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
Anyhow this does not look like a proper solution and I also have the issue with the conflict log itself. It must be deleted before any refresh could be applied.
I could not find any proper documentation for that. Also ETag handling is hardly described in SAPUI5 and SAP Kapsel documentation.
This question is a really tricky one, due to its implications. I understand that you are simulating a synchronization error due to concurrent modification, and want to know if there is a way for the client to obtain the "current" server state in order to give the user a means to compare the local and server state.
First, let me give you the short answer: No, there is no way for the client to see the current server state "for reference" via the Offline APIs when there are synchronization errors. Doing an online query as outlined above might work, but it certainly is a bad idea.
Now for the longer answer, which explains why this is not necessarily a defect and why I said there are quite some implications to the answer.
Types of Synchronization Errors
We distinguish a number of synchronization errors, and in this context, we are clearly dealing with business-related issues. There are two subtypes here: Those that the user can correct, e.g. validation errors, and those that are issues in the business process itself.
If the user violates the input range, e.g. by putting a negative price for a product, the server would reply with the corresponding message: "-1 is not a valid input value for 'Price'". You, as a developer, can display such messages to the user from the error archive, and the ensuing fix is indeed a very easy one.
Now when we talk about concurrent modification, things get really, really nasty. In fact, I like to say that in this case there is an issue with the business process, because on one hand, we allow data to get out of sync. On the other hand, the process allows multiple users to manipulate the same piece of information. How all relevant users should now be notified and synchronize, is no longer just a technical detail, but in fact a new business process. There just is no way to generically device how to handle this case. In most cases, it would involve back-office experts who need to decide how the changes should be merged.
A Better Solution
Angstrom pointed out that there is no way to manipulate ETags on the client side, and you should in fact not even think about it. ETags work like version numbers in optimistic locking scenarios, and changing the ETag basically means "Just overwrite what's on the server". This is a no-go in serious scenarios.
An acceptable workaround would be the following:
Make sure the server returns verbose error messages so that the user can see what happened and what caused the conflict.
If that does not help, refresh the data. This will get you an updated ETag, and merge the local changes into the "current" server state, but only locally. "Merging" really means that local changes always overwrite remote changes.
The user now has another opportunity to review the data and can submit it again.
A Good Solution
Better is not necessarily good, so here is what you should really do: Never let concurrent modification happen because it is really expensive to handle. This implies that not the developer should address this issue, but the business needs to change the process.
The right question to ask is, "When you replicate data in a distributed system, why do you allow it to be modified concurrently at all?" Typically stakeholders will not like this kind of question, and the appropriate reaction is to work out a conflict resolution process together with them. Only then they will realize how expensive fixing that kind of desynchronization is, and more often than not they will see that adjusting the process is way cheaper than insisting in yet another back-office process to fix the issues it causes. Even if they insist that there is a need for this concurrent modification, they will now understand that it is not your task to sort this out and that they need to invest in a conflict resolution process.
TL;DR
There is no way to compare the server and client state to the server state on the client, but you can do a refresh to retain the local changes and get an updated ETag. The real solution, however, is to rework the business process, because this no longer is a purely technical issue.
The default solution is that SMP or HCPms is detecting errors by ETags. At client side there is no API to manipulate ETags in case of conflicts. A potential solution to implement a kind of diff view on the device would work like this:
Show errors
Cache errors (maybe only in memory?)
delete the errors
do a refresh of the database
build a diff view with current data and cached errors
The idea with
sap.OData.removeHttpClient();
sap.OData.applyHttpClient();
could also work but could be very tricky and may introduce side effects.
Maybe some requests are triggered against the "wrong" backend.

Keeping a 'revisionable' copy of Neo4j data in the file system; how?

The idea is to have git or a git-like system (users, revision tracking, branches, forks, etc) store the 'master copy' of objects and relationships.
Since the master copy is on the filesystem, any changes can be checked in, tracked, and backed up. Neo4j could then import the files and serve queries. This also gives freedom since node and connection files can be imported to any other database.
Changes in Neo4j can be written to these files as part of the query
Nodes and connections can be added by other means (like copying from a seed dataset)
Nodes and connections are rarely created/updated/deleted by users
Most of the usage is where Neo4j shines: querying
Due to these two, the performance penalty on importing can be safely ignored
What's the best way to set this up?
If this isn't wise; how come?
It's possible to do that, but it will be lot of work which would not have a real value. IMHO.
With unmanaged extension for Transaction Event API you are able to store information about each transaction onto disk in your common file format.
Here is the some information about Transaction Event API - http://graphaware.com/neo4j/transactions/2014/07/11/neo4j-transaction-event-api.html
Could you please tell us more about the use case and how would design that system?
In general nothing keeps you from just keeping neo4j database files around (zipped).
Otherwise I would probably use a format which can be quickly exported / imported and diffed too.
So very probably csv files with node-file per label ordered by a sensible key
And then relationship-files between pairs of nodes, with neo4j-import you can recover that data quickly into a graph again.
If you want to write changes to the files you have to make sure they are replayable (appends + updates + deletes) , i.e. you have to chose a format which is more or less a transaction-log (which Neo4j already has).
If you want to do it yourself the TransactionHandler is what you want to look at. Alternatively you could dump the full database to a snapshot at times you request.
There are plans to add point-in-time recovery on the existing tx-logs, which I think would also address your question.

DynamoDB auto incremented ID & server time (iOS SDK)

Is there an option in DynammoDB to store auto incremented ID as primary key in tables? I also need to store the server time in tables as the "created at" fields (eg., user create at). But I don't find any way to get server time from DynamoDB or any other AWS services.
Can you guys help me with,
Working with auto incremented IDs in DyanmoDB tables
Storing server time in tables for "created at" like fields.
Thanks.
Actually, there are very few features in DynamoDB and this is precisely its main strength. Simplicity.
There are no way automatically generate IDs nor UUIDs.
There are no way to auto-generate a date
For the "date" problem, it should be easy to generate it on the client side. May I suggest you to use the ISO 8601 date format ? It's both programmer and computer friendly.
Most of the time, there is a better way than using automatic IDs for Items. This is often a bad habit taken from the SQL or MongoDB world. For instance, an e-mail or a login will make a perfect ID for a user. But I know there are specific cases where IDs might be useful.
In these cases, you need to build your own system. In this SO answer and this article from DynamoDB-Mapper documentation, I explain how to do it. I hope it helps
Rather than working with auto-incremented IDs, consider working with GUIDs. You get higher theoretical throughput and better failure handling, and the only thing you lose is the natural time-order, which is better handled by dates.
Higher throughput because you don't need to ask Dynamo to generate the next available IDs (which would require some resource somewhere obtaining a lock, getting some numbers, and making sure nothing else gets those numbers). Better failure handling comes when you lose your connection to Dynamo (Dynamo goes down, or you are bursty and your application is doing more work than currently provisioned throughput). A write-only application can continue "working" and generating data complete with IDs, queueing it up to be written to dynamo, and never worry about ID collisions.
I've created a small web service just for this purpose. See this blog post, that explains how I'm using stateful.co with DynamoDB in order to simulate auto-increment functionality: http://www.yegor256.com/2014/05/18/cloud-autoincrement-counters.html
Basically, you register an atomic counter at stateful.co and increment it every time you need a new value, through RESTful API.

Resources