I'm trying to read in a row of data (not from CSV, but passed as parameters) by unwinding and then merging. The match part of the query under the UNWIND seems to take a really long time (whereas a simple CREATE is more or less instant). I'm a bit confused, because the match should be fairly quick to run since it can index on the label first.
Here's a minimal version of my query (way more data will be input than just an id in real life):
WITH *, timestamp() as time0
WITH * UNWIND range(0, 9000) as unwound_data0
MERGE (node0:Node {id: unwound_data0}) ON CREATE SET node0.dbcreated = time0
WITH time0, collect(node0) as nodes0
RETURN time0
If I simplify it to
UNWIND range(0, 9000) as unwound_data0
MATCH (node0: Node)
RETURN node0
It takes just as long. But if I change MATCH to CREATE, then it's very fast.
Any ideas on how to speed this up?
Thought I would provide a slightly more detailed answer for anyone landing here. To speed this up you may need to create an index for the id property you are attempting to match on.
You can create an index in Cypher with the below command:
CREATE INDEX index_name
FOR (a:LabelName)
ON (a.propertyName)
Set index_name, LabelName and propertyName to the values relevant to you.
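For the query in the question, where the lookup is on the id property of nodes labelled Node, that might look like the sketch below. The index name node_id_index is only a placeholder, and the FOR ... ON form assumes Neo4j 4.x or later:

```cypher
// Index the property that MERGE looks up; node_id_index is a hypothetical name
CREATE INDEX node_id_index
FOR (n:Node)
ON (n.id)
```

Without such an index, each MERGE lookup has to scan every node carrying the Node label and compare its id.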
If the property you are merging/matching on is unique, then you can instead create a constraint in Cypher with the below command:
CREATE CONSTRAINT constraint_name
ON (a:LabelName)
ASSERT a.id_property IS UNIQUE
Creating a constraint on the property of interest creates an index that will be used during lookup operations. It also ensures that the property is unique, throwing an error if you try to create a node with a value that already exists.
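Note that the constraint syntax depends on the server version. The ON ... ASSERT form is what older releases accept; newer releases (Neo4j 4.4 and 5, as far as I know) use FOR ... REQUIRE instead. A sketch with the same placeholder names:

```cypher
// Same uniqueness constraint in the newer syntax; all names are placeholders
CREATE CONSTRAINT constraint_name
FOR (a:LabelName)
REQUIRE a.id_property IS UNIQUE
```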
I have the following Neo4J DB model:
I need to be able to find a node of the type Concept called "idea" made by the user infranodus and see which nodes of the type Context it is connected to.
I use the following query, but it's taking too long and doesn't even load...
MATCH (ctx:Context)-[:BY]->(u:User{name:'infranodus'})
WITH DISTINCT ctx
MATCH (c:Concept{name:'idea'})-[:AT]->(ctx)<-[:AT]-(cn:Concept)
RETURN DISINCT c, ctx, count(cn);
How to fix it?
Adding indexes to the Neo4j database would be helpful (I didn't see any there). Then start your match with the indexed node and traverse from there. I think that having a
MATCH (c:Concept{name:'idea'})-[:AT]->(ctx)<-[:AT]-(cn:Concept)
WHERE (ctx)-[:BY]->(u:User{name:'infranodus'})
....
would also avoid querying twice.
Create an index on the name property of nodes with the User label - a unique index would be great for this.
An index for the name property on Concept would be good, too (probably without a uniqueness constraint, even though you're looking for distinct nodes explicitly).
Start your query from there.
MATCH (u:User {name:'infranodus'})
MATCH (c:Concept {name: 'idea'})
MATCH (u)<-[:BY]-(ctx:Context)<-[:AT]-(c)
RETURN c, ctx, size((c)-[:AT]->(ctx)<-[:AT]-(:Concept))
It doesn't look as nice, but you're asking for query tuning. Make sure to PROFILE both your existing query and this one. Plus, you had a typo in your query after your RETURN keyword.
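To compare the two versions, prefix each query with PROFILE and look at the operators and db hits in the resulting plan. A sketch, assuming the Neo4j 4.x or later index syntax and hypothetical index names:

```cypher
// Indexes so both anchor nodes can be found without a full label scan
// (user_name_index and concept_name_index are hypothetical names)
CREATE INDEX user_name_index FOR (u:User) ON (u.name);
CREATE INDEX concept_name_index FOR (c:Concept) ON (c.name);

// PROFILE executes the query and reports the plan that was actually run
PROFILE
MATCH (u:User {name: 'infranodus'})
MATCH (c:Concept {name: 'idea'})
MATCH (u)<-[:BY]-(ctx:Context)<-[:AT]-(c)
RETURN c, ctx;
```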
I have a couple of nodes and I need to add an incremental value to all nodes of a particular label.
match wd= (w:MYNODE)
forEach(n IN nodes(wd)|
set n.incrementalId=??
);
Currently I have tried FOREACH to traverse each node, but I am unable to get the index of each loop iteration.
PS: I have also tried WITH, but was unable to increment (i.e. ++) the current value.
This is a more succinct version of @InverseFalcon's first query (and resembles the form of the query in your question):
MATCH (w:MYNODE)
WITH COLLECT(w) as ws
FOREACH(i IN RANGE(0, SIZE(ws)-1) | SET (ws[i]).incrementalId = i);
If the nodes already exist, then you'll have to use a more awkward approach using collections and indexes:
MATCH (w:MYNODE)
WITH collect(w) as myNodes
UNWIND range(0, size(myNodes)-1) as index
WITH index, myNodes[index] as node
SET node.incrementalId = index
If the nodes don't already exist, and you want to create some number of them with incremental ids, then it's an easier task using the range() function to produce a list of indexes:
UNWIND range(1, 100) as id
CREATE (:MYNODE {id:id})
Many years ago, I discussed with some Neo4j engineers the ability to query an unknown object given its UUID.
At that time, the answer was that there was no general db index in neo4j.
Now, I have the same problem to solve:
Each node I create has a unique id (a UUID in the form <ns:u-<uuid>-?v=n>, where ns is the namespace, uuid is a unique UUID, and v=n is the version number of the element).
I'd like to have the ability to run the following cypher query:
match (n) where n.about = 'ki:u-SSD-v5.0?v=2' return n;
which actually returns nothing.
The following query
match (n:`mm:ontology`) where n.about = 'ki:u-SSD-v5.0?v=2' return n;
returns what I need, despite the fact that at query time I don't know the element type.
Can anyone help on this?
Paolo
Have you considered adding a schema index to every node in the database for the about attribute?
For instance
Add a global label to all nodes in the graph (e.g. Node) that do not already have it. If your graph is overly large and/or heap overly small you will need to batch this operation. Something along the lines of the following...
MATCH (n)
WHERE NOT n:Node
WITH n
LIMIT 100000
SET n:Node
After the label is added, create an index on the about attribute for your new global label (e.g. Node); the uniqueness constraint below creates one. These two steps can be performed in either order.
CREATE CONSTRAINT ON (node:Node) assert node.about IS UNIQUE
Then querying with something like the following
MATCH (n:Node)
WHERE n.about = 'ki:u-SSD-v5.0?v=2'
RETURN n;
will return the node you are seeking in a performant manner.
I have a Neo4J DB up and running with currently 2 Labels: Company and Person.
Each Company Node has a Property called old_id.
Each Person Node has a Property called company.
Now I want to establish a relation between each Company and each Person where old_id and company share the same value.
Already tried suggestions from: Find Nodes with the same properties in Neo4J and
Find Nodes with the same properties in Neo4J
following the first link I tried:
MATCH (p:Person)
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
resulting in no change at all and as suggested by the second link I tried:
START
p=node(*), c=node(*)
WHERE
HAS(p.company) AND HAS(c.old_id) AND p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN p, c;
resulting in a runtime of more than 36 hours. I had to abort the command without knowing whether it would eventually have worked. Therefore I'd like to ask whether it's theoretically correct and I'm just impatient (the dataset is quite big, to be honest), or whether there's a more efficient way of doing it.
This simple console shows that your original query works as expected, assuming:
1. Your stated data model is correct.
2. Your data actually has Person and Company nodes with matching company and old_id values, respectively.
Note that, in order to match, the values must be of the same type (e.g., both are strings, or both are integers, etc.).
So, check that #1 and #2 are true.
Depending on the size of your dataset, you may want to page it:
create constraint on (c:Company) assert c.old_id is unique;
MATCH (p:Person)
WITH p SKIP 100000 LIMIT 100000
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN count(*);
Just increase the skip value from zero to your total number of people in 100k steps.
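If your server supports it (Neo4j 4.4 or later, which is an assumption about your setup), the manual paging can be replaced by a batched subquery. This is a sketch of that alternative, not the SKIP/LIMIT approach described above. It has to run as an auto-commit query (in Neo4j Browser, prefix it with :auto):

```cypher
// Commit the relationship creation in batches of 100000 Person rows
MATCH (p:Person)
CALL {
  WITH p
  MATCH (c:Company) WHERE c.old_id = p.company
  CREATE (p)-[:BELONGS_TO]->(c)
} IN TRANSACTIONS OF 100000 ROWS
```

The constraint on Company.old_id is still what keeps each inner lookup fast.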
Neo4j: Finding simple paths between two nodes takes a lot of time, even with an upper limit (*1..4). I don't want to use allShortestPaths or shortestPath because they don't return all the paths.
Match p=((n {Name:"Node1"}) -[*1..4]-> (m {Name:"Node2"})) return p;
Any suggestions to make it faster?
If you have a lot of nodes, try creating an index so that the neo4j DB engine does not have to search through every node to find the ones with the right Name property values.
I am presuming that, in your example, the n and m nodes are really the same "type" of node. If this is true, then:
Add a label (I'll call it 'X') to every node (of the same type as n and m). You can use the following to add the 'X' label to node(s) represented by the variable n. You'd want to precede it with the appropriate MATCH clause:
SET n:X
Create an index on the Name property of nodes with the X label like this:
CREATE INDEX ON :X(Name);
Modify your query to:
MATCH p=((n:X {Name:"Node1"}) -[*1..4]-> (m:X {Name:"Node2"}))
RETURN p;
If you do the above, then your query should be faster.
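On newer servers the CREATE INDEX ON :X(Name) form is, as far as I know, no longer accepted. Here is a sketch of the same steps with the newer syntax, assuming Neo4j 4.x or later. The index name x_name_index is hypothetical, and the MATCH filter is only a guess at how you would select the nodes to label:

```cypher
// Label every node that has a Name property (adjust the filter to your data)
MATCH (n) WHERE n.Name IS NOT NULL
SET n:X;

// Index the Name property for the new label
CREATE INDEX x_name_index FOR (n:X) ON (n.Name);

// Both endpoints can now be located through the index
MATCH p = (n:X {Name: "Node1"})-[*1..4]->(m:X {Name: "Node2"})
RETURN p;
```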