as I am relatively new to Neo4J and I was wondering if it is possible to impose user defined data integrity constraints on the stored data.
The manual says that it is possible to impose UNIQUE constraints and here Michael Hunger pointed out that in the current RC NOT NULL constraints have been added.
I was wondering if it is possible, in some way, to define constraints like "every node with label X has to have a relationship with Label Y" or to impose, in some way, a type system, possibly with a type hierarchy and everything.
Such constraints should automatically be checked by the DBMS, like in many of the old school (relational) database systems.
Cheers!
No, it's not possible to have same functionality like traditional RDBMS has, at least not out of the box.
You can write your own Unmanaged Extensions which could handle that for you. You can find basic information how to do that in this article.
I'm not aware of any existing "plugin". In the future GraphAware Enterprise should bring "schema enforcement".
Related
With a normal 'graph database' the data is broken up into nodes and edges, and there isn't much of a restriction/schema between the connections. With this, it seems great for modeling straightforward graphs where the relationships are relatively consistent -- Movies with cast and crew; Computer networks with IPs and devices; Social networks with users and connections; etc.
Are there any graph-like databases that can be more specialized? For example to be able to model something like an electrical circuit where each component has a sort of 'schema' or well defined input and output -- i.e., a Resistor has two connections and has various properties:
a Transistor takes has three connections and has various properties, etc.
I'm not asking about particular circuit simulators, such as https://www.falstad.com/circuit/circuitjs.html, but more about whether it's possible in any graph (or pseudo-graph) databases to model and enforce very specific, well-defined relationships in a network, such as circuit design.
Definitely possible.
I've been working on this problem with Neo4j, and Restagraph is the result. It provides a REST API that enforces a schema on any updates to the database, and I've packaged it as a Docker image.
I haven't really promoted it so far, because it's only recently been mature enough for my own use, and I really need to improve the documentation. If you try it out, though, I'd love to hear any feedback you have.
TLDR: in general yes, but it depends.
This is a really broad question, so let me break it down.
While it's a little exaggerating to talk about all graph databases (which are not as standardized as SQL databases - which in turn are not very standardized as well), so take this answer with a grain of salt: Yes, that is possible.
As in SQL databases, you usually can set up constraints to be checked before any changes in data is persisted.
Most graph databases incorporate something along the lines of a "type", similarly to what a table represents in SQL databases. Some allow to constrain relationships to only target specific types, so you could restrict relationships e.g. between a node using a CAN bus and an I2C-bus to the specific types.
If a database does not provide these mechanisms, it's usually possible to constrain relationships to the existence of specific keys and values in the model. To have another example than your circuit one: Imagine a node-based system, which has typed inputs and outputs - an int-based output can only be connected to an int based input, a float based output only to a float based input, etc. Then you could add a field output_type and input_type to the nodes and constrain relationships between the values.
As soon as you add the ability to write (the SQL-similar stored) procedures, you can write very complex data integrity constraints.
So, while it is possible, the question is, if you should.
How much logic you actually want to put into your database is a decades-long heated argument. At some point in your application architecture, you will have to check the validity of the data that you are handling. Handling the data consistency in the database itself solves a lot of problems with race conditions or performance issues through multiple round trips between the application and the database, which would occur if the consistency checks are done in the application layer.
Putting a lot of your logic into the database severely limits your ability to switch databases ("vendor lock-in"), might lead to code duplication between your application layer and your database, and sprays your logic between two (or more) layers of your architecture (which makes it harder to find bugs, introduces temporal coupling, and might re-introduce race conditions and performance problems where you have to use transactions again).
My personal take is along the lines of Steve Wozniak - see your database as another service. If that service can provide you with everything you need to ensure data integrity, it might be a good idea to just use the database directly. But if this increases the problems I mentioned before, you might be better off putting a layer between your database and your business logic.
Objectivity/DB is object/graph database that uses schema. You can absolutely do what you are proposing. It supports complex object definitions including type inheritance and it has a full graph/navigational query language similar to Cypher. www.objectivity.com
I am thinking of using a graph DB to store IFC data. Ideally, the DB should provide a way to define all the rule types defined in the IFC schema. However, I don't think there are any such databases because some of the rule types in IFC are very complex and requires querying the DB. Others are simple, like uniqueness of GUID, existence of mandatory attributes, or data validation. Neo4j seem to have a few constraint enforcing methods:
Neo4j helps enforce data integrity with the use of constraints. Constraints can be applied to either nodes or relationships. Unique node property constraints can be created, as well as node and relationship property existence constraints.
Are there other methods that can ensure compliance of entered data with a predefined schema?
Or are there other graph DBs that are more suitable for this job?
You can achieve pretty much everything you want by creating Transaction Event handlers.
http://neo4j.com/docs/java-reference/current/javadocs/org/neo4j/graphdb/event/TransactionEventHandler.html
You can also take a look at the GraphAware Framework and all its submodules for use-cases and also the ease of creating and deploying neo4j extensions.
Depends on whether you need the schema enforced by the database itself, or whether you're OK with that being done at the application layer.
I've just gotten Restagraph to the "working prototype" level, and my next trick is Dockerising it.
It's a framework of sorts, that enables you to define a schema by creating nodes and relationships in Neo4J with specific labels, and which dynamically creates a REST API to enforce it.
It's also written in Common Lisp, so I'll understand if you wait for the Docker image :)
I want to achieve, with a Neo4j graph a RDBMS's ability to define and enforce a known schema. We know what our graph should look like (all the edge types and node types). So we simply want to prevent someone (developer/user) from adding an edge or node type which is "invalid" i.e. not part of the defined graph schema. How can we enforce a graphs schema? Note I am not asking about how to enforce the properties of an edge or a graph but simply how to enforce that the graph is made up if a specific set of known edge and node types.
Please help
This should probably be done on the application side. Build a wrapper/API that enforces this sort of thing, and make the developers use it. Sorry for the short answer...
Most of the language drivers or frameworks listed here provide means to define a schema:
http://www.neo4j.org/drivers
For Java we developed structr (https://github.com/structr/structr) where you define your schema in Java beans. You could start f.e. with the simple Maven archetype as shown in this screencast: http://vimeo.com/53235075
Cheers
Axel
It has to happen in a layer above Neo4j. I've been building one of those layers (Restagraph), which puts a REST interface on top of it.
It's a mite less mature than Structr, but may be worth a look. I package it in a Docker image, and it's designed so you can easily define your own schema in YAML files.
I'm hoping to hear from any of you who have architected and implemented a decent sized Neo4j app (10's millions nodes/rels) - and what your recommendations are particularly w.r.t modelling and the various APIs (vanilla java/groovy Neo4j vs Spring-Data-Neo4j vs Grails GORM/Neo4j).
I'm interested if it actually pays off to add the extra OGM (object-graph-mapping) layer and associated abstractions?
Has anyone's experience been that it is best to stick to 'plain' graph-modelling with nodes+properties, relationships+properties, traversals and (e.g.) Cypher to model and store their data?
My concern is that 'forcing' a particular OGM abstraction onto a graph database will affect future flexibility in adapting/changing the domain model and/or flexibility in querying the data.
We're a Grails shop, and I have experimented with GORM/Neo4J and also with spring-data-neo4j.
The primary purpose for the dataset will be to model and query relationships amongst v.large numbers of people, their aliases, their associates and all sorts of criminal activity and history. There will be more than 50 main domain classes. There must be flexibility in the model (which will need to evolve rapidly in the early phases of the project) and in speed and flexibility of querying.
I have to confess, I'm struggling to find a compelling reason to use a OGM layer when I can use (e.g.) POJOs or POGOs, a little Groovy magic and some simple hand-rolled domain object <-> node/relationship mapping code. As far as I can tell, I think I would be happy just dealing with nodes & traversals & Cypher (aka KISS). But I would be very happy to hear others' experiences and recommendations.
Thanks for your time & thoughts,
TP
since I'm the author of the Grails Neo4j plugin, I might be biased. The main reason for creating the plugin was to apply the ease of Grails domain classes with their powerful out-of-the-box scaffolding to Neo4j for ~80% of the use cases. For the other 20% where specific requirements require stuff like traversals etc. we're using Neo4j APIs directly (traversals/cypher) and do not use the GORM API.
The current version of the Neo4j plugin suffers from a supernode issue since each domain instance is connected to a subreference node. If multiple concurrent requests (aka threads) add new domain instances there is chance to get a locking exception. I'm about to fix that either by a sub-subreference approach or by using indexing.
Cypher can also be used in the Neo4j Grails plugin.
Spring-Data-Neo4j on the other hand is a more advanced approach with finer control over mapping details, but requires usage of specific annotations. And I found no easy way to integrate that into Grails in a way scaffolding works.
We're using the predecessor version of the plugin in a productive application with ~60k users and ~10^6 rels. Due to NDA I cannot provide more details on that.
We do not use grails, but do use a hybrid plain neo4j / spring-data-neo4j solution. The reason is based on the fact that some of our domain data has a fixed schema and some doesn't. SDN takes a lot of the burden away and can be mixed with plain neo4j if the need arises.
We have classes that describe a data model, the objects for these classes we persist using SDN, with no additional tricks, we just use the basics from SDN. Then we have classes that contain the data for the model that is not known beforehand. These are stored in nodes contain special properties for describing what model type the data refers to. When neo4j 2 gets released, we will probably move that info into labels. Between these nodes there can be relations, also described by the aforementioned data model managed by sdn. We also have relations from the generic nodes to SDN nodes, which works fine, as everything ends up being the same things: nodes.
We have not encountered any issues yet using this approach. The thing we love the most is that the data of which we do not know in advanced how it will be modelled, is stored in the way you would have wanted to store data when you would have known it in advance, making the data actually match the model chosen, which is very hard to do when using any other type of (non-graph) database.
I have an application which will require a "dynamic business rules" engine. Some of the business rules changes very frequently. Some of then applies for a limited set of business accounts. For example: my customer have a process where they qualify stores, based on their size, number of the sales person, number of products, location, etc. But he manages different account, and each account give different "weights" to each attribute.
How do I implement this engine using Ruby? I know Java has drools, but I find drools annoying and complex. And I prefer not having to use JRuby...
Regards,
Rubem
If you're sure a rule engine is what you need, you will need to find one you can use in Ruby. A quick Google search brought up Rools (http://rools.rubyforge.org/) and Ruby Rules (http://xircles.codehaus.org/projects/ruby-rules). I'm not sure of the status of either project though. Using JRuby with Drools might be your best bet but then again, I'm a Java developer and a big Drools advocate. :)
Without knowing all the details, it's a little hard to say how that should be implemented. It also depends on how you want the rules to be updated. One approach is to write a collection of rules similar to this: "if a store exists with more than 50 sales people and the store hasn't had its weight updated to reflect that, then update the store's weight." However, in some way you could compare that to hardcoding.
A better approach might be to create Weight objects with criteria that need to be met for the weight to apply. Then you could write one rule that matches on both Weights and Stores: "if a Store exists that matches a Weight's criteria and the Store doesn't already have that Weight assigned to it, then add the Weigh to the Store." Then the business folks could just create and update Weights, possibly in a web front-ended database, instead of maintaining rules.