Migration from Neo4j 1.9 to Neo4j 2.1.6

I am a beginner in graph databases and Neo4j. I am trying to understand how to migrate from Neo4j 1.9 to Neo4j 2.1.6, and what that would mean.
I read the procedure I have to follow here (http://neo4j.com/docs/stable/deployment-upgrading.html#explicit-upgrade).
I understand that after the upgrade I will have all the nodes and relationships I had previously, together with the functionality of Neo4j 2.1.6. Is that correct?
What I want to know is whether there is a way to automatically declare labels, unique constraints and the new indexing functionality during the migration.
Or is this something that I will have to do "manually" afterwards?
Thank you in advance.
Dimitris

After the upgrade, you'll have the features of neo4j 2.1.* in the sense that you can use them, but it's not done automatically for you.
Labels, unique constraints, and certain types of indexes are the really useful new stuff that you'll see. Labels are a way of categorizing types of nodes. Say you have Person nodes and Job nodes, well you might want to apply those labels. But no database is smart enough by itself to automatically figure that out. Instead, what you might do is go through your data and apply the label.
After migration for example, you could do this:
MATCH (n)
WHERE has(n.first_name)
SET n:Person
RETURN n;
This will apply the "Person" label to any node with a first_name attribute.
Everything else (indexes, unique constraints) again has to be done manually by you. Consider it a portion of your graph structure design. Neo4j will let you implement any kind of graph you like, but it won't do it for you. :)
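As a minimal sketch of that manual step, assuming the Person label from the example above and a hypothetical email property that should be unique per person, the schema could be declared like this in Neo4j 2.x once the labels have been applied:

```cypher
// Index on a property you look nodes up by (first_name comes from the example above).
CREATE INDEX ON :Person(first_name);

// Uniqueness constraint; `email` is a hypothetical property, not from the question.
// Creating it fails if existing :Person nodes already share an email value.
CREATE CONSTRAINT ON (p:Person) ASSERT p.email IS UNIQUE;
```

A uniqueness constraint is backed by its own index, so you do not need a separate CREATE INDEX for the constrained property.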

Related

neo4j - why and how create and use indexes?

I'm learning Neo4j and it is working well without creating any index.
I can create and read nodes fine.
So, why/when should I create indexes? Maybe for performance? Is it a must?
You should create a lookup index when you are going to find nodes as starting points by these properties, e.g. a :Person(userId) or :Book(isbn) or :City(zip) or :Product(productNo).
Usually the stuff where you have a business (unique) identifier to find nodes.
In general for indexes there is some confusion because there are also legacy indexes (which are still used for fulltext and spatial) vs. the new exact schema indexes, see this post for more detail:
http://nigelsmall.com/neo4j/index-confusion
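For illustration, a schema index for the :Book(isbn) case above could be created and used as follows. This is a sketch in Neo4j 2.x syntax; the ISBN value is a placeholder, and the two statements are meant to be run separately, since the index is populated in the background after it is created.

```cypher
// Create the schema index once.
CREATE INDEX ON :Book(isbn);

// Later lookups by label and property can use the index automatically
// to find the starting node.
MATCH (b:Book)
WHERE b.isbn = "978-0000000000"
RETURN b;
```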

Neo4j data modeling for branching/merging graphs

We are working on a system where users can define their own nodes and connections, and can query them with arbitrary queries. A user can create a "branch" much like in SCM systems and later can merge back changes into the main graph.
Is it possible to create an efficient data model for that in Neo4j? What would be the best approach? Of course we don't want to duplicate all the graph data for every branch as we have several million nodes in the DB.
I have read Ian Robinson's excellent article on Time-Based Versioned Graphs and Tom Zeppenfeldt's alternative approach with Network versioning using relationnodes but unfortunately they are solving a different problem.
I would love to know what you guys think; any thoughts are appreciated.
I'm not sure what your experience level is. Any insight into that would be helpful.
My guess is that this system would rely heavily on tags on the nodes. Maybe come up with 5-20 node types that are very broad, including the names and a few key properties. Then you could allow the users to select from those base categories and create their own spin-offs by adding tags.
Say you had your basic categories of (:Thing{Name:"",Place:""}) and (:Object{Category:"",Count:4})
Your users would have a drop-down or something with "Thing" and "Object". They'd select "Thing" for instance, and type a new label (Say "Cool"), values for "Name" and "Place", and add any custom properties (IsAwesome:True).
So now you've got a new node (:Thing:Cool{Name:"Rock",Place:"Here",IsAwesome:True}), which allows you to query by broad categories or by a user's own categories. Hopefully this would keep each broad category to a proportional fraction of your overall node count.
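In Cypher, the example above might look roughly like this (a sketch using the labels and properties from this answer):

```cypher
// Create a node in the broad category "Thing" with the user-defined label "Cool".
CREATE (n:Thing:Cool {Name: "Rock", Place: "Here", IsAwesome: true})
RETURN n;

// Query by the broad category...
MATCH (n:Thing) RETURN n;

// ...or by the user-created one.
MATCH (n:Cool) RETURN n;
```

Note that labels cannot be passed as query parameters, so the application would have to build the query text with the user's label in it, and should validate that label before doing so.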
Not sure if this is exactly what you're asking for. Good luck!
Hmm. While this isn't insane, think about the type of system you're replacing first: SQL. In SQL databases you wouldn't use branches, because it's data storage. If you're trying to get data from multiple sources into one DB, I'd suggest exporting them all to CSV files and using a MERGE statement in Cypher to bring them all into your DB at once.
This could work similarly to branching: when you merge, each person runs a script on their own copy of the DB that takes all the nodes and edges in their copy and puts them into a CSV, i.e.
MATCH (n)-[e]-(n2)
RETURN n,e,n2
Then you compare these CSVs as you pull them into your final DB to see what's already there from the other copies.
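LOAD CSV WITH HEADERS FROM "file:///YourFile.CSV" AS file
// First endpoint of the edge.
MERGE (N:Node{Property1:file.Property1, Property2:file.Property2})
// Second endpoint, read from its own columns (Property3/Property4 are placeholder column names).
MERGE (N2:Node{Property1:file.Property3, Property2:file.Property4})
// Create the edge only if it is not already there.
MERGE (N)-[E:Edge]-(N2)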
This will work, as long as you're using node types that you already know about and each person isn't creating new data structures that you don't know about until the merge.

Is there any way to ensure that a node is only connected to one instance of a particular relationship type

To clarify, let's assume that we have nodes representing people and the following relationships: "BIOLOGICAL_MOTHER" and "BIOLOGICAL_FATHER".
Then, for any person node, said node can only have one "BIOLOGICAL_MOTHER" and one "BIOLOGICAL_FATHER". How can we ensure that this is the case?
No. Neo4j currently only supports uniqueness constraints on node properties.
I believe several people are working on different schema constructs for neo4j, that would permit you to constrain graphs in any number of different ways. What it seems you're asking for boils down to a database constraint that if there is a relationship of type BIOLOGICAL_FATHER from one person to another, that the DB may not accept any creation of new relationships of that same type. In other words, relationship cardinality constraints, by relationship type.
At the moment, I think the best you can do is verify in your application code that such a relationship doesn't exist before creating it, but the DB won't do this checking for you.
The particular constraint you're looking for sounds easy enough; hopefully a neo4j dev will jump in here and say, "Oh, no worries, that's planned for release XYZ", but I'm not sure about that.
More broadly, there are a number of issues with graphs that make constraints very tricky. In my personal graph domain, I'd like to make it impossible to create new relationships such that they would introduce cycles in the graph over a particular relationship type. (E.g. (a)-[:owns]->(b)-[:owns]->(a) is extremely undesirable for me). This would be a very costly constraint to actually enforce in the general case, since verifying whether a new relationship was OK could potentially involve traversing a huge graph.
Over the long run, it seems reasonable that neo4j might implement local constraints, but still shy away from anything that implied non-local constraint checking.
Steve,
In terms of Cypher, if I am given two names of people - say Sam and Dave, and wish to make Sam the father of Dave, but only if Dave doesn't yet have a father, I could do something like this:
MATCH (f {name : 'Sam'}), (s {name : 'Dave'})
WHERE NOT (s)<-[:FATHER]-()
CREATE (f)-[:FATHER]->(s)
If Dave already has a father the WHERE clause filters Dave out, which means no relationship will be created.
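Since the database will not enforce this for you, it can also help to audit existing data from time to time. A sketch, assuming the same FATHER relationship type and direction as in the query above:

```cypher
// Find nodes that have more than one incoming FATHER relationship.
MATCH (s)<-[r:FATHER]-()
WITH s, count(r) AS fathers
WHERE fathers > 1
RETURN s.name, fathers;
```

Any rows returned are nodes that slipped past the application-side check.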
Grace and peace,
Jim

Neo4j auto increase schema index

It is recommended not to use Neo4j's id property because it may change, but rather create our own identifier. Then to identify my users, I plan to create a user_id property on the nodes labelled User and put an index on it. However, I cannot figure out a way to make it auto increase.
After some searching, I noticed there are two kinds of indexes in Neo4j, the schema index and the legacy index. Could anyone explain to me the difference between them? And is there a way to make my user_id index auto increase?
Schema indices are tied to labels, e.g. :User: you create an index on a property of the nodes carrying that label. There's also no need to specify which index you're using in a query, as this is done automatically in this case.
Legacy indices are the node indices that were around prior to Neo4j 2.0. They're a traditional index where you can specify what you're indexing and which properties they apply to, but they're only used in START statements, which are optional (and on their way to deprecation).
For more detail, have a look here (http://docs.neo4j.org/chunked/stable/graphdb-neo4j-schema.html) and here (http://docs.neo4j.org/chunked/stable/indexing.html).
As for auto-incrementing, I'm unaware of any such functionality for user-defined index keys.
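One possible workaround, sketched here rather than offered as a built-in feature, is to keep the next value in a dedicated counter node and bump it whenever a user is created. The Counter label and its name and value properties are hypothetical names chosen for this example.

```cypher
// One-time setup: keeps the MERGE below from creating duplicate counter nodes.
CREATE CONSTRAINT ON (c:Counter) ASSERT c.name IS UNIQUE;

// Per new user: bump the counter and use the new value as user_id.
MERGE (c:Counter {name: "user_id"})
ON CREATE SET c.value = 1
ON MATCH SET c.value = c.value + 1
WITH c.value AS next_id
CREATE (u:User {user_id: next_id})
RETURN u.user_id;
```

Every user creation writes to the same counter node, so this may become a bottleneck under heavy write load.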
HTH

Create Unique Relationship is taking much amount of time

START names = node(*),
target=node:node_auto_index(target_name="TARGET_1")
MATCH names
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have nearly 1,80,000 names nodes. I ran the above query to create unique relationships more than 100 times, changing the target each time, and it is taking far too long. How can I resolve this?
I build the query in Java and iterate. I am using Neo4j 2.0.0.5 and Java 1.7.
I edited your Cypher query because I think I understand it, but I can barely read the rest of your question. If you edit it with white space and punctuation it might be easier to understand what you are trying to do. Until then, here are some thoughts about why your query is slow.
You bind all the nodes in the graph, that's typically pretty slow.
You bind all the nodes in the graph twice. First you bind universally in your start clause: names=node(*), and then you bind universally in your match clause: MATCH names, and only then you limit your pattern. I don't quite know what the Cypher engine makes of this (possibly it gets a migraine and goes off to make a pot of coffee). It's unnecessary, you can at least drop the names=node(*) from your start clause. Or drop the match clause, I suppose that could work too, since you don't really do anything there, and you will still need a start clause for as long as you use legacy indexing.
You are using Neo4j 2.x, but you use legacy indexing instead of labels, at least in this query. Without knowing your data and model it's hard to know what the difference would be for performance, but it would certainly make it much easier to write (and read) your queries. So, that's a different kind of slow. It's likely that if you had labels and label indices, the query performance would improve.
So, first try removing one of the universal bindings of nodes, then use the 2.x schema tools to structure your data. You should be able to write queries like
MATCH (target:Target)
WHERE target.target_name="TARGET_1"
WITH target
MATCH (names:Name)
WHERE NOT (names)-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have no idea if such a query would be fast on your data, however. If you put the "Name" label on all your nodes, then MATCH (names:Name) will still bind all nodes in the database, so it'll probably still be slow.
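To get from the legacy setup to labels and a label index, something along these lines might work. It is only a sketch: it assumes that every node with a target_name property is a target and every node with a qualification property is a name node, which may not match your model.

```cypher
// Label the nodes once, based on which properties they carry.
MATCH (t)
WHERE HAS(t.target_name)
SET t:Target;

MATCH (n)
WHERE HAS(n.qualification)
SET n:Name;

// Index so that looking up a target by name does not scan every :Target node.
CREATE INDEX ON :Target(target_name);
```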
P.S. The relationships you create have a TYPE called contains, and you give them a property called type with value declared. Maybe you have a good reason, but that's potentially very confusing.
Edit:
Reading through your question and my answer again, I no longer think that I understand even your Cypher query. (Why are you returning both the bound nodes and properties of those nodes?) Please consider posting sample data on console.neo4j.org and explaining in more detail what your model looks like and what you are trying to do. Let me know if my answer addresses your question at all, or I'll consider removing it.
