Years ago, I discussed with some Neo4j engineers the ability to query an unknown object given its uuid.
At that time, the answer was that there was no general db index in Neo4j.
Now, I have the same problem to solve:
each node I create has a unique id (a uuid in the form <ns:u-<uuid>?v=n>, where ns is the namespace, uuid is a unique uuid, and v=n is the version number of the element).
I'd like to have the ability to run the following cypher query:
match (n) where n.about = 'ki:u-SSD-v5.0?v=2' return n;
which actually returns nothing.
The following query
match (n:`mm:ontology`) where n.about = 'ki:u-SSD-v5.0?v=2' return n;
returns what I need, despite the fact that at query time I don't know the element type.
Can anyone help on this?
Paolo
Have you considered adding a schema index to every node in the database for the about attribute?
For instance
Add a global label (e.g. Node) to all nodes in the graph that do not already have it. If your graph is very large and/or your heap is small, you will need to batch this operation, something along the lines of the following...
MATCH (n)
WHERE NOT n:Node
WITH n
LIMIT 100000
SET n:Node
After the label is added, create an index on the about attribute for your new global label (e.g. Node). These two steps can also be performed in either order.
CREATE CONSTRAINT ON (node:Node) ASSERT node.about IS UNIQUE
Then querying with something like the following
MATCH (n:Node)
WHERE n.about = 'ki:u-SSD-v5.0?v=2'
RETURN n;
will return the node you are seeking in a performant manner.
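As a side note, the constraint above assumes about is unique across all nodes; if it is not (the question does not say), a plain schema index on the same label/property combination gives the same lookup benefit without the uniqueness check. With the older index syntax this answer uses, that would be something like:
CREATE INDEX ON :Node(about)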
Related
I'm trying to read in a row of data (not from CSV but passed as parameters) by unwinding and then merging. It seems that the match part of the query under the unwind is taking a really long time (whereas a simple create is more or less instant). I'm a bit confused because the match should be fairly quick to run since it can index on the label first.
Here's a minimal version of my query (way more data will be input than just an id in real life):
WITH *, timestamp() as time0
WITH * UNWIND range(0, 9000) as unwound_data0
MERGE (node0:Node {id: unwound_data0}) ON CREATE SET node0.dbcreated = time0
WITH time0, collect(node0) as nodes0
RETURN time0
If I simplify it to
UNWIND range(0, 9000) as unwound_data0
MATCH (node0: Node)
RETURN node0
It takes just as long. But if I change the MATCH to a CREATE, then it's very fast.
Any ideas on how to speed this up?
Thought I would provide a slightly more detailed answer for anyone landing here. To speed this up you may need to create an index for the id property you are attempting to match on.
You can create an index in Cypher with the below command:
CREATE INDEX index_name
FOR (a:LabelName)
ON (a.propertyName)
Set index_name, LabelName and propertyName to the values relevant to you.
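Applied to the example in the question above (assuming the Node label and the id property shown there), that might look like:
CREATE INDEX node_id_index
FOR (a:Node)
ON (a.id)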
If the property you are merging/matching on is unique, then you can instead create a constraint in Cypher with the below command:
CREATE CONSTRAINT constraint_name
ON (a:LabelName)
ASSERT a.id_property IS UNIQUE
Creating a constraint on the property of interest creates an index that will be used during lookup operations. It also ensures that a property is unique, throwing an error if you try to create a node with the same property value.
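For the MERGE in the question (again assuming the Node label and the id property), the constraint version would be something along these lines:
CREATE CONSTRAINT node_id_unique
ON (a:Node)
ASSERT a.id IS UNIQUE
With the constraint in place, MERGE (node0:Node {id: unwound_data0}) can use an index seek instead of scanning every node carrying the label.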
In my Neo4j database, there are many nodes with the same NodeID and different Levels, and they are connected through a path. Each time, I'm trying to find the node that has the largest Level smaller than a specific level n. I use the following Cypher query, which starts searching from the most current node with the NodeID id.
MATCH (:Node{NodeID:id,Current:'true'})-[:type*0..]->(m:Node{NodeID:id})
WHERE m.Level < n
RETURN m
ORDER BY m.Level DESC
LIMIT 1
And the index I created for this database is as follows:
CREATE INDEX Nodes FOR (n:Node) ON (n.NodeID, n.Level)
However, it's kind of slow, especially when the path is long, and I need to repeat this process thousands of times. So my question is: is there a better way to implement this, and do I need to modify my index to improve the performance? Thanks in advance for your help!
Assuming all Node nodes with the same NodeID are in a type path rooted at the Current node with that NodeID, the following query should be logically equivalent but faster:
MATCH (m:Node)
WHERE m.NodeID = $id AND m.Level < $n
RETURN m
ORDER BY m.Level DESC
LIMIT 1
This query assumes id and n are query parameters.
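If you are experimenting in the Neo4j Browser, the parameters can be set before running the query, e.g. with hypothetical values:
:params {id: "node-42", n: 10}
The official drivers accept an equivalent parameter map when the query is run.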
I'm wondering: when I have read the data of a node and I want to match it in another query, which way will have the best performance? Using the ID like this:
MATCH (n) WHERE ID(n) = 1234
or using an indexed property of the node:
MATCH (n:Label {SomeIndexProperty: 3456})
Which one is better?
IDs are technical identifiers internal to Neo4j, and they should not be used as a primary key for your application.
Every node (and relationship) has a technical ID, and it's stable over time.
But if you delete a node, for example the node 32, Neo4j will reuse this ID for a new node.
So you can safely use it in your queries within the same transaction (there is no problem); otherwise, you should know what you are doing.
The only way to retrieve the technical ID is to use the ID function, as you do in your first query: MATCH (n) WHERE ID(n) = 1234 RETURN n.
The ID is not exposed as a node's property, so you can't do MATCH (n {ID:1234}) RETURN n.
You may have noticed that if you want to do a WHERE on a strict equality, you can put the condition directly on the node.
For example :
MATCH (n:Node) WHERE n.name = 'logisima' RETURN n
MATCH (n:Node {name:'logisima'}) RETURN n
Those two queries are identical: they generate the same query plan; the second form is just syntactic sugar.
Is it faster to retrieve a node by its ID or by an indexed property?
The easiest way to answer this question is to profile the two queries.
On the one based on the ID, you will see the operator NodeByIdSeek, which costs 1 db hit, and on the one with a unique constraint you will see the operator NodeUniqueIndexSeek with 2 db hits.
So searching a node by its ID is faster.
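For reference, you can reproduce the comparison by prefixing each query with PROFILE (label and property names here follow the examples above, and the second query assumes a unique constraint on :Node(name)):
PROFILE MATCH (n) WHERE ID(n) = 1234 RETURN n;
PROFILE MATCH (n:Node {name: 'logisima'}) RETURN n;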
I am looking for help to optimize the following Cypher query.
CALL algo.unionFind.stream()
YIELD nodeId,setId
MATCH (n) WHERE ID(n) = nodeId AND NOT (n)-[:IS_CHILD_OF]-()
CALL apoc.create.uuids(1) YIELD uuid
WITH n AS nod, uuid, setId
WHERE nod IS NOT NULL
MERGE (groupid:GroupId {group_id: 'id_' + toString(setId)})
ON CREATE SET groupid.group_value = uuid, groupid.updated_at = '1512135348335'
MERGE (nod)-[:IS_CHILD_OF]->(groupid)
RETURN count(nod);
I have already applied a unique constraint and an index on group_id, and I am using a machine with a good configuration (i3-2xl).
The above query is taking too long: ~22 minutes for ~500k nodes.
The following are the things I want to achieve with the above query:
Get all the connected components(sub-graph).
Create a new node for each group(connected components).
Assign uuid as a value of the new group node.
Build the relationship with all the group members with the new group node.
Any suggestions to optimize the above query are welcome, or please let me know if there is any other way to achieve the listed requirements.
I have a Neo4J DB up and running with currently 2 Labels: Company and Person.
Each Company Node has a Property called old_id.
Each Person Node has a Property called company.
Now I want to establish a relationship between each Company and each Person where old_id and company share the same value.
Already tried suggestions from: Find Nodes with the same properties in Neo4J and
Find Nodes with the same properties in Neo4J
Following the first link, I tried:
MATCH (p:Person)
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
resulting in no change at all, and as suggested by the second link I tried:
START
p=node(*), c=node(*)
WHERE
HAS(p.company) AND HAS(c.old_id) AND p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN p, c;
resulting in a runtime of >36 hours. I had to abort the command without knowing whether it would eventually have worked. Therefore, I'd like to ask whether it is theoretically correct and I'm just impatient (the dataset is quite big, tbh), or whether there's a more efficient way of doing it.
This simple console example shows that your original query works as expected, assuming:
1. Your stated data model is correct.
2. Your data actually has Person and Company nodes with matching company and old_id values, respectively.
Note that, in order to match, the values must be of the same type (e.g., both are strings, or both are integers, etc.).
So, check that #1 and #2 are true.
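One way to spot a type mismatch (a sketch, assuming the APOC library is installed) is to inspect a few values together with their Cypher types:
// Check the types stored on a handful of Person and Company nodes
MATCH (p:Person)
RETURN p.company AS value, apoc.meta.cypher.type(p.company) AS type
LIMIT 5;
MATCH (c:Company)
RETURN c.old_id AS value, apoc.meta.cypher.type(c.old_id) AS type
LIMIT 5;
If one side comes back as STRING and the other as INTEGER, the equality in the MATCH will never succeed.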
Depending on the size of your dataset, you may want to page it:
CREATE CONSTRAINT ON (c:Company) ASSERT c.old_id IS UNIQUE;
MATCH (p:Person)
WITH p SKIP 100000 LIMIT 100000
MATCH (c:Company) WHERE p.company = c.old_id
CREATE (p)-[:BELONGS_TO]->(c)
RETURN count(*);
Just increase the skip value from zero to your total number of people in 100k steps.
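If the APOC library is available, the batching can also be automated instead of editing the SKIP value by hand; a sketch (the batchSize is a guess, tune it to your heap):
CALL apoc.periodic.iterate(
  "MATCH (p:Person) RETURN p",
  "MATCH (c:Company {old_id: p.company}) CREATE (p)-[:BELONGS_TO]->(c)",
  {batchSize: 10000, parallel: false}
)
The unique constraint on Company.old_id created above keeps the inner MATCH an index lookup.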