Are there Label properties - Is there a Neo4j schema for Labels? - neo4j

How to ensure that all nodes of a label have some common properties ?
For example, I want to create a property "name" for all nodes of a label "Person", but I can make a mistake in writing of property name (namee ! for example)

There is no such mechanism built in Neo4j today (the current version of Neo4j at the time of writing is 2.1.6). What you are describing is some sort of schema (if you compare e.g. DDL for a RDBMS) and Neo4j is basically schema free. This type of structural integrity is quite often handled in the application layer for NoSQL databases.
The only schema operations that are available today for Neo4j are described here.
Currently they include:
Unique - e.g. CREATE CONSTRAINT ON (p:Person) ASSERT p.name IS UNIQUE
Indexes - create an index on a label e.g. CREATE INDEX ON :Person(name)
A comment on this answer from Michael Hunger who is part of team behind Neo4j indicates that more constraints will be available for Neo4j in the future. Furthermore, Michael points to the following alternatives:
Take a look at Structr, a layer above Neo4j that among other things enforces a stricter schema (check the schema docs here)
SylvaDB, an easy-to-use layer above Neo4j that also has schema support. Seems very
In addition to this, FrobberOfBits pointed to the tool NeoProfiler that contains a number of profilers, most of which run very simple Cypher queries against your database and provide summary statistics. Some profilers will actually discover data in your graph and then spawn other profilers which will run later. For example, if a label called "Person" is discovered in the data, a label profiler will be added to the run queue to inspect the population of nodes with that label.

Related

Indices in Neo4j - questions and doubts

The only indices that I know about them are indices on properties (these indices are created on particular labels (node types)). I have some doubts, however.
Are there exists indices on edges/relationships?
I often read that Neo4j leveraged Lucene Index. Is it still used? What is aim?
Are there exists any other indicses than indices on properties?
Thanks in advance,
Neo4j has two indexing systems.
The more modern one is referred to as "schema indexes", and these are the ones that are automatic and apply to properties of a given label for quick lookup by those properties when the given properties and label are provided within a query. This does not currently support indexing of relationship properties. These started out based on lucene, but we've gradually replaced the implementation with our own native indexing solution. Discussion of these, as well as any noteworthy information and limitations, can be found in our index configuration documentation.
The other indexing system is an older manual system that is called "explicit indexes", though this has previously been called "manual indexes". This is also based on lucene, but these are not automatic -- it is up to the user to manually add or remove entries to the index and keep them up to date when data in the database changes. This makes usage and maintenance cumbersome, and we recommend avoid using this system if possible.
Built-in procedures are the means to create and lookup using explicit indexes, as these are never used automatically under the hood (as opposed to schema indexes). APOC Procedures also offers various means of interfacing with explicit indexes.
The main reason one would use explicit indexes is because you are able to create an index on relationships for properties and get fast lookup when querying the index. This also allows for a full text lookup across multiple labels and properties, provided the index has been configured in such a way.
Separate from all of these, it should be noted that usage of labels is itself a kind of index, as it provides quick access to all nodes with the given label.

Neo4j web client fails with large Cypher CREATE query. 144000 lines

I'm new to neo4j and currently attempting to migrate existing data into a neo4j database. I have written a small program to convert current data (in bespoke format) into a large CREATE cypher query for initial population of the database. My first iteration has been to somewhat retain the structuring of the existing object model, i.e Objects become nodes, node type is same as object name in current object model, and the members become properties (member name is property name). This is done for all fundamental types (and strings) and any member objects are thus decomposed in the same way as in the original object model.
This has been fine in terms of performance and 13000+ line CREATE cypher queries have been generated which can be executed throuh the web frontend/client. However the model is not ideal for a graph database, I beleive, since there can be many properties, and instead I would like to deomcompose these 'fundamental' nodes (with members which are fundamental types) into their own node, relating to a more 'abstract' node which represents the more higher level object/class. This means each member is a node with a single (at first, it may grow) property say { value:"42" }, or I could set the node type to the data type (i.e integer). If my understanding is correct this would also allow me to create relationships between the 'members' (since they are nodes and not propeties) allowing a greater freedom when expressing relationships between original members of different objects rather than just relating the parent objects to each other.
The problem is this now generates 144000+ line Cypher queries (and this isn't a large dataset in compraison to others) which the neo4j client seems to bulk at. The code highlighting appears to work in the query input box of the client (i.e it highlights correctly, which I assume implies it parsed it correctly and is valid cypher query), but when I come to run the query, I get the usual browser not responding and then a stack overflow (no punn intended) error. Whats more the neo4j client doesn't exit elegantly and always requires me to force end task and the db is in the 2.5-3GB usage from, what is effectively and small amount of data (144000 lines, approx 2/3 are relationships so at most ~48000 nodes). Yet I read I should be able to deal with millions of nodes and relationships in the milliseconds?
Have tried it with firefox and chrome. I am using the neo4j community edition on windows10. The sdk would initially be used with C# and C++. This research is in its initial stages so I haven't used the sdk yet.
Is this a valid approach, i.e to initially populate to database via a CREATE query?
Also is my approach about decomposing the data into fundamental types a good one? or are there issues which are likely to arise from this approach.
That is a very large Cypher query!!!
You would do much better to populate your database using LOAD CSV FROM... and supplying a CSV file containing the data you want to load.
For a detailed explaination, have a look at:
https://neo4j.com/developer/guide-import-csv/
(This page also discusses the batch loader for really large datasets.)
Since you are generating code for the Cypher query I wouldn't imagine you would have too much trouble generating a CSV file.
(As an indication of performance, I have been loading a 1 million record CSV today into Neo4j running on my laptop in under two minutes.)

Metadata in neo4j graph database

I know that neo4j stores data structured in graphs rather than in tables. In RDBMS we will be having schemas of the tables but in neo4j we will not be having the tables. Only nodes, relations and properties are defined. So is there any concept of metadata in neo4j. Like is there any information stored about nodes, relationships in the database? If yes, how and what it stores in the metadata? Also where can we find the metadata related information in the graph database (location)
Thanks,
Neo4J doesn't directly store metadata in the way that you're looking for. The NeoProfiler tool was written precisely for this purpose. You can run it on a Neo4J database, and it will pull out as much information on labels, indexes, constraints, properties, nodes, and relationships as it can. The way that this works isn't too far off of the queries that #ulkas suggests in the other answer here, the output is just much better.
More broadly, in an RDBMS the schema information you pull out substantially constrains the database. The schema there is like a set of rules; you can't insert data unless it conforms to that schema. In Neo4J, because it's so flexible, even if there was a schema it would just be documentation of what's there, it would not be a set of constraints on what you can put in. At any time, you can insert new data that has nothing to do with the present schema (except that you can't violate things like uniqueness constraints).
If you want to see an equivalent schema for your database in neo4j, check out neoprofiler linked above. A few people out there have written about "metagraphs" - that is, they talk about representing a neo4j schema as a graph itself, where for example a node refers to a label. Relationships from that "label node" then go out to other kinds of label nodes, specifying what sorts of relationships can exist between nodes. For example, nodes labeled "Employee" may frequently have "works_for" relationships to nodes of label "Company".
no, direct metadata are not present. the maximum you can do is to query all the structure types and have a small inside what kind of graph could be stored in the db.
START r=rel(*)
RETURN type(r), count(*)
START n=node(*)
RETURN labels(n), count(*)
the specific database files are stored in the folder data/graph.db but besides some index and key files they are binary and not easy to read.
Meanwhile there is the official APOC Library.
This includes functions like apoc.meta.graph, apoc.meta.schema and others.
The link above describes the installation, if you run into sandbox errors, check the answers in this question

How to partition a single Neo4j database?

Can one Neo4j database be divided up so that there are multiple starting points in one database so that all queries can be isolated, instead of having multiple databases?
I have thought about this and I think it can work up to a point, but once things like labels are used then the idea will not work, as a label query will always span the whole database.
Anyway I would like to know if anyone has successfully done this and how they did it.
What you are describing sounds like multitenancy. Neo4j 2.0.1 does not at this time support multitenancy as a feature. There are various methods and strategies for implementing a multitenant architecture within your Neo4j database instance.
You can partition sets of your property graph by label. Since nodes can have multiple labels, you can label one partition with a unique identifying label for that partition.
Please refer to documentation on labels here: http://docs.neo4j.org/chunked/milestone/graphdb-neo4j-labels.html
Things to note with this strategy are to ensure that all your Cypher calls contain the partition identifier for the label, to ensure that the two partitions are isolated from one another within the graph. It's important to ensure that relationships from one partition do not span into another partition.
For example, partition 1 could be the label Partition1. Assuming your application context is operating on Partition1:
MERGE (user:User:Partition1 { name: 'Peter' })
RETURN user
Assuming your application context is operating on Partition2:
MERGE (user:User:Partition2 { name: 'Peter' })
RETURN user
When executing these two queries, two separate Peters are created for Partition1 and Partition2.
You'll just need to ensure that the partition label your application is operating on appends its label to each one of your queries. While this is tedious, it is the suggested way to go about multitenancy at this time.

How can I port a relational database to Neo4j?

I am playing around with Neo4j but trying to get my head around the graph concepts. As a learning process I want to port a small Postgres relational database schema to Neo4j. Is there any way I can port it and issues "equivalent" relational queries to Neo4j?
Yes, you can port your existing schema to a graph database. Keep in mind that this is not necessarily the best model for your data, but it is a starting point.
How easy it is depends a lot on the quality of your existing schema.
The tables corresponding to entities in an entity-relationship-diagram define your types of nodes. In the upcoming neo4j 2.0, you can labels them with the name of the entity to make a lookup easier. In older versions you can use an index or a manual label property.
Assuming a best case, where all your relationships between data is modelled using foreign keys, any 1:1 relationship between nodes can be identified and ported next.
For tables modelling n:m relationships, identify the corresponding nodes and add a direct relationship between them.
So as an example assume tables Author[id, name, publisher foreign key], Publisher[id, name] and Book[id, title] and written_by[author foreign key, book foreign key].
Every row in Author, Publisher and Book becomes a node.
Every Author node gets a relationship to the publisher identified by the foreign key relationship.
For every row in written_by you add a relationship between the Author node and Book node referenced
For queries in neo4j I recommend cypher due to its expressiveness. A (2.0) query looking for books by some author would look like:
MATCH (author:Author)-[:written_by]-(book:Book)
WHERE author.name='Hugh Laurie'
RETURN book.title
You actually have several options at hand:
use the Talend connector for Neo4J
export your schema+data in CSV files consumable by the batch importer
or you can do it programmatically
I'm afraid not. The relational data model and the graph data model are two different ways of modelling a real-world domain. It requires a human brain (at least as of 2013) to understand the domain in order to model it.
I suggest that you take a piece of paper and capture, using circles and arrows, what your entities are (nodes) and how they relate to each other (relationships). Then, have a look at that piece of paper. Voila, your new Neo4j data model.
Then, take a query that you want to be answered and try to figure out how you would do that without a computer, just by tracing your nodes and relationships with a finger on that piece of paper. Once you've figured that out, translate what you've done to a Cypher query.
Have a look at neo4j.org, there are plenty of examples.
Check this out:
The musicbrainz -> neo4j
https://github.com/redapple/sql2graph/tree/master/examples/musicbrainz
Neo4j Sql-importer
https://github.com/peterneubauer/sql-import
Good Luck!
This tool does exactly that.
Import any relational db into neo4j
https://github.com/jexp/neo4j-rdbms-import

Resources