Cypher: Creating a schema index that already exists - neo4j

When you ask Neo4j to create an index that already exists it does not thrown an exception, which seems good for my purpose.
session.run("CREATE INDEX ON :User(email)");
Question 1:
But how is neo4j handling this under the hood? Does it delete the index then recreate it or does it ignore the query altogether since the index already exists?
I'd like to know because I have some CRUD operations and I'd like to define the schema for my nodes at the moment I create them, which means calling "CREATE INDEX" and so fourth. The drawback is this means for every new node created redundant calls to create an index are made, which is fine if they are ignored if the index already exists.
Question 2:
I noticed the following works even if no nodes exist with the label User yet (Meaning I can create labels and index before the associated nodes).
CREATE INDEX ON :User(email)
I'm new to Neo4j and use to the RDBMS world where you can't create an index on a column that does not exist. Am I correct in my belief that creating Indexes and labels before I've even created the nodes that have those labels and properties should not cause any unforeseen problems down the road?

Answering question #2, creating indexes and constraints even when no nodes with those labels or properties exist is perfectly fine, and is in fact an extremely common case. I highly recommend going this route.
I'd recommend against creating indexes with every node creation. While I don't believe it causes much in wasted cycles, it doesn't seem like good design.

Question 1
This won't work if you are creating constraints. They can only be called once and if they exist THEY will indeed throw an exception. Normal indexes don't seem to do so this whole approach should not be attempted.
Question 2
See InverseFalcons comment but it seems to be the correct approach.

Related

Is there a race condition when creating unique paths?

I recently discovered that a race condition exists when executing concurrent MERGE statements. Specifically, duplicate nodes can be created in the scenario where a node is created after the MATCH step but before the CREATE step of a given MERGE.
This can be worked around in some instances using unique constraints on the merged nodes; however, this falls short in scenarios where:
There is no single unique property to enforce (e.g. pairs of properties need to be unique but individual ones don't).
Trying to merge relationships and paths.
Does using CREATE UNIQUE solve this problem (or do the same pitfalls exist)? If so, is it the only option? It feels like the usefulness of MERGE is fairly heavily diminished when it effectively can't guarantee the uniqueness of the path or node being merged...
When MERGE statements are executed concurrently, these situations may occur. Basically, each transaction gets a view of the graph at the first point of reading, and won't see updates made after that point (with some variations). The main exception to this are uniquely constrained nodes, where Neo4j will initialise a fresh reader from the index when reading, regardless of what was previously read in the transaction.
A workaround could be to create a 'dummy' property and a unique constraint on it and one of the node labels. In Neo4j 2.2.5, this should work to get around your problem.

Is node reference equality in embedded neo4j guaranteed?

I am using an embedded graph database as part of a java application. Suppose that I carry out some type of cypher query, and return an ExecutionResult which contains a collection of nodes.
These nodes may be assumed to form a connected graph.
Each of these nodes has some relationships, which I can access using node.getRelationships(Direction.OUTGOING). My question is, if the target of one of these relationships already occurs in the Execution result (i.e. the relationship is part of the query template), is it guaranteed that Relationship.getEndPoint == Node X.
I suppose that what I am really asking is, when a transaction in Neo4j returns a node, does it return just the one object, and different queries will just keep returning references to that one object, or does it keep producing new objects which happen to refer to the same data point? Since Node doesn't override the equalsTo method, I have been assuming the former, but I was hoping someone could tell me.
Nodes are not reference-equals. You'll only get NodeProxy objects which are created on the fly in different operations.
But the equals()-method does id-equality so you should use that.
n1.equals(n2)
or if you keep the node id around use
n1.getId() == n2.getId()
See when you create a node neo4j internally assigns it a node-id. All the relationships you create will have reference to the start node id and end node id.
For checking do this
First create a node and save its node id by calling method node.getId()
Now create a relationship to it from another node. And call your relationship.getEndNode().getId() .
You will see the node-ids are same.
It sounds like your asking - does Neo 'out of the box' give concurrency control of database entities, like n-hibernate or entity framework does for SQL.
The answer is no! You will have to manage it yourself. If you do delelop it though, could make you a few bob

Uniqueness in BatchInserter of Neo4J

I am using a "BatchInserter" to build a graph (in a single thread). I want to make sure nodes (and possibly relationships) are unique. My current solution is to check whether the node exists in the following manner:
String name = (String) nodeProperties.get(IndexKeys.CATEGORY_KEY);
if(index.get(IndexKeys.CATEGORY_KEY, name).size() > 0)
return index.get(IndexKeys.CATEGORY_KEY, name).getSingle();
Long nodeID = inserter.createNode( nodeProperties,categoryLabel );
index.add(nodeID, nodeProperties);
index.flush();
It seems to be working fine but as you can see it is IO expensive (flushing on every new addition - which i believe is a lucene "commit" command). This is slowing down my code considerably.
I am aware of put if absent and uniqueFactory. As documented:
By using put-if-absent functionality, entity uniqueness can be guaranteed using an index.
Here the index acts as the lock and will only lock the smallest part
needed to guaranteed uniqueness across threads and transactions. To
get the more high-level get-or-create functionality make use of
UniqueFactory
However, these are for transaction based interactions with the graph. What I would like to do is to ensure uniqueness of nodes and possibly relationships in a batch insertion semantics, that is faster than my current setup.
Any pointers would be much appreciated.
Thank you
You should investigate the MERGE keyword in cypher. I believe this will permit you to exploit your autoindexes without requiring you to use them yourself. More broadly, you might want to see if you can formulate your bulk load in a way that is conducive to piping large volumes of cypher queries through the neo4j-shell.
Finally, as general pointers and background, you should check out this information on bulk loading
When I encountered this problem, I just decided to go tyrant and force index values in my own. Can't you do the same? I mean, ensure uniqueness before you do the insertions?

Create Unique Relationship is taking much amount of time

START names = node(*),
target=node:node_auto_index(target_name="TARGET_1")
MATCH names
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
Iam consisting of nearly 1,80,000 names nodes, i had iterated the above process to create unique relationships above 100 times by changing the target. its taking too much amount of time.How can i resolve it..
i build the query with java and iterated.iam using neo4j 2.0.0.5 and java 1.7 .
I edited your cypher query because I think I understand it, but I can barely read the rest of your question. If you edit it with white spaces and punctuation it might be easier to understand what you are trying to do. Until then, here are some thoughts about your query being slow.
You bind all the nodes in the graph, that's typically pretty slow.
You bind all the nodes in the graph twice. First you bind universally in your start clause: names=node(*), and then you bind universally in your match clause: MATCH names, and only then you limit your pattern. I don't quite know what the Cypher engine makes of this (possibly it gets a migraine and goes off to make a pot of coffee). It's unnecessary, you can at least drop the names=node(*) from your start clause. Or drop the match clause, I suppose that could work too, since you don't really do anything there, and you will still need a start clause for as long as you use legacy indexing.
You are using Neo4j 2.x, but you use legacy indexing instead of labels, at least in this query. Without knowing your data and model it's hard to know what the difference would be for performance, but it would certainly make it much easier to write (and read) your queries. So, that's a different kind of slow. It's likely that if you had labels and label indices, the query performance would improve.
So, first try removing one of the universal bindings of nodes, then use the 2.x schema tools to structure your data. You should be able to write queries like
MATCH target:Target
WHERE target.target_name="TARGET_1"
WITH target
MATCH names:Name
WHERE NOT names-[:contains]->()
AND HAS (names.age)
AND (names.qualification =~ ".*(?i)B.TECH.*$"
OR names.qualification =~ ".*(?i)B.E.*$")
CREATE UNIQUE (names)-[r:contains{type:"declared"}]->(target)
RETURN names.name,names,names.qualification
I have no idea if such a query would be fast on your data, however. If you put the "Name" label on all your nodes, then MATCH names:Name will still bind all nodes in the database, so it'll probably still be slow.
P.S. The relationships you create have a TYPE called contains, and you give them a property called type with value declared. Maybe you have a good reason, but that's potentially very confusing.
Edit:
Reading through your question and my answer again I no longer think that I understand even your cypher query. (Why are you returning both the bound nodes and properties of those nodes?) Please consider posting sample data on console.neo4j.org and explain in more detail what your model looks like and what you are trying to do. Let me know if my answer meets your question at all or I'll consider removing it.

Creating relationship conditionally with cypher (neo4j)

I am attempting to create a linked list with neo4j, and have a root node with no relationships. Here is the pseudo cypher I am trying to create, but I am not sure how, or even if it is possible:
START root=node(1), item=node(2)
MATCH root-[old?:LINK]->last
WHERE old IS NOT NULL
CREATE root-[:LINK]->item-[:LINK]->last
DELETE old
WHERE old IS NULL
CREATE root-[:LINK]->item
Basically I am trying to insert a node into the list if the list exists, and simply create the first list item otherwise. Obviously you cannot do multiple WHEREs like I have done above. Any ideas how I can achieve this desired functionality with a cypher?
The docs solve the problem by first creating a recurrent :LINK relationship on the root node, but I would like to solve this without doing that (as you then need to create possibly unnecessary relationships for each node).
For anyone interested, I figured out a way to solve the above using some WITH tricks. This is essentially a solution for creating linked lists in neo4j without having to first create a self referencing relationship.
START root=node(1), item=node(2)
MATCH root-[old?:LIST_NEXT]->last
CREATE root-[:LIST_NEXT]->item
WITH item, old, last
WHERE old IS NOT NULL
CREATE item-[:LIST_NEXT]->last
DELETE old
This works by first looking for an existing link relationship, and creating the new one from the root to the item. Then by using WITH we can chain the query to now examine whether or not the matched relationship did in fact exist. If it did, then remove it, and create the remaining link piece from the new item to the old one.
For this, you might want to look at MERGE, http://docs.neo4j.org/chunked/snapshot/query-merge.html#merge-merge-with-on-create-and-on-match
And maybe at the linked list example, http://docs.neo4j.org/chunked/snapshot/cookbook-linked-list.html

Resources