I have a relatively large set of nodes, and I want to find all pairs of nodes that have matching property values, but I don't know or care in advance what the property value is. This is basically an attempt to find duplicate nodes, but I can limit the definition of a duplicate to two or more nodes that have the same property value.
Any ideas how to proceed? Not finding any starting points in the neo4j docs. I'm on 1.8.2 community edition.
EDIT
Sorry for not being clear in the initial question, but I'm talking about doing this through Cypher.
Cypher to count values on a property, returning a collection of nodes as well:
start n=node(*)
where has(n.prop)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Example on console: http://console.neo4j.org/r/k2s7aa
You can also do an index scan with the property like so (to avoid looking at nodes that don't have this property):
start n=node:node_auto_index('prop:*') ...
2.0 Cypher with a label Label:
match (n:Label)
with n.prop as prop, collect(n) as nodelist, count(*) as count
where count > 1
return prop, nodelist, count;
Update for 3.x: has was replaced by exists.
You can try this one, which I think does what you want.
START n=node(*), m=node(*)
WHERE
HAS(n.name) AND HAS (m.name) AND
n.name=m.name AND
ID(n) < ID(m)
RETURN n, m
http://console.neo4j.org/?id=xe6wmt
Both nodes must have a name property, and name must be equal for both nodes. Each match is found twice (as n, m and as m, n), so the id comparison keeps only one of the two possibilities. Not sure about performance, since this compares every node with every other node; please test.
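To see why the id comparison yields each pair exactly once, here is a minimal plain-Java sketch of the same idea. It does not use the Neo4j API: the class NamePairs and the namesById map (node id to name, containing only nodes that have a name) are hypothetical stand-ins for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NamePairs {
    // Returns every pair of ids whose names are equal, each pair only once.
    // namesById is a hypothetical stand-in for "all nodes that have a name".
    static List<long[]> duplicatePairs(Map<Long, String> namesById) {
        List<long[]> pairs = new ArrayList<>();
        // A TreeMap iterates in ascending id order.
        TreeMap<Long, String> sorted = new TreeMap<>(namesById);
        for (Map.Entry<Long, String> n : sorted.entrySet()) {
            // tailMap(key, false) holds only strictly greater ids,
            // which plays the role of ID(n) < ID(m) in the query.
            for (Map.Entry<Long, String> m : sorted.tailMap(n.getKey(), false).entrySet()) {
                if (n.getValue().equals(m.getValue())) {
                    pairs.add(new long[] {n.getKey(), m.getKey()});
                }
            }
        }
        return pairs;
    }
}
```

Like the Cypher query, this is quadratic in the number of nodes, so it is only meant to illustrate the pairing rule.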
What about the following approach:
Use getAllNodes to get an Iterable over all nodes.
Using getPropertyKeys and getProperty(key), build up a java.util.Map containing all properties of a node, and calculate the map's hashCode().
Build up a global Map using the hashCode as key and a set of node.getId() values as value.
This should give you the candidate duplicates. Be aware of the hashCode() semantics: nodes with different properties might map to the same hashCode, so compare the actual properties before treating two nodes as duplicates.
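The steps above could be sketched roughly as follows. This is plain Java rather than Neo4j code: the class PropertyHashBuckets and the propsById map (node id to that node's property map, as getPropertyKeys and getProperty(key) would produce it) are assumptions made for illustration. The second method adds the equals() check that the hashCode() caveat calls for.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class PropertyHashBuckets {
    // Groups node ids by the hashCode of their property map.
    // propsById is a hypothetical stand-in for the per-node property maps.
    static Map<Integer, Set<Long>> bucketByHash(Map<Long, Map<String, Object>> propsById) {
        Map<Integer, Set<Long>> buckets = new HashMap<>();
        for (Map.Entry<Long, Map<String, Object>> e : propsById.entrySet()) {
            int hash = e.getValue().hashCode();
            buckets.computeIfAbsent(hash, k -> new TreeSet<>()).add(e.getKey());
        }
        return buckets;
    }

    // Confirms candidates with equals(), because equal hash codes
    // do not guarantee equal property maps.
    // Note: array-valued properties compare by identity and would need extra handling.
    static List<Set<Long>> duplicateGroups(Map<Long, Map<String, Object>> propsById) {
        List<Set<Long>> groups = new ArrayList<>();
        for (Set<Long> bucket : bucketByHash(propsById).values()) {
            Map<Map<String, Object>, Set<Long>> exact = new HashMap<>();
            for (Long id : bucket) {
                exact.computeIfAbsent(propsById.get(id), k -> new TreeSet<>()).add(id);
            }
            for (Set<Long> ids : exact.values()) {
                if (ids.size() > 1) {
                    groups.add(ids);
                }
            }
        }
        return groups;
    }
}
```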
Neo4j 3.1.1
HAS is no longer supported in Cypher; use EXISTS instead.
If you want to find nodes with a specific property, the Cypher is as follows:
MATCH (n:NodeLabel) WHERE exists(n.NodeProperty) RETURN n
With Neo4j 3.3.4 you can simply do the following:
MATCH (n) where EXISTS(n.propertyName) return n
Simply change propertyName to whatever property you are looking to find.
The easiest option is probably to keep a local Map from property value to the first node seen with that value. The code could look like this:
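import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.Node;
import org.neo4j.tooling.GlobalGraphOperations;

// db is an existing GraphDatabaseService
GlobalGraphOperations ggo = GlobalGraphOperations.at(db);
// Maps each property value to the first node seen with that value
Map<Object, Node> duplicateMap = new HashMap<Object, Node>();
for (Node node : ggo.getAllNodes()) {
    // Pass a default so nodes without the property do not throw NotFoundException
    Object propertyValue = node.getProperty("property", null);
    if (propertyValue == null) continue;
    Node existingNode = duplicateMap.get(propertyValue);
    if (existingNode == null) {
        duplicateMap.put(propertyValue, node);
    } else {
        System.out.println("Duplicate Node. First Node: " + existingNode + ", Second Node: " + node);
    }
}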
This would print out a list. If you needed to do more, like remove these nodes, you could do something in the else.
Do you know the property name? Will this be multiple properties, or just duplicates of a single name/value pair? If you are doing multiple properties, just create a map for each property you have.
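For the multiple-property case, "a map for each property" might look something like the sketch below. It is plain Java, not Neo4j code: the class PerPropertyDuplicates, the propsById map (node id to property map) and the keys list are hypothetical names chosen for illustration. Unlike the loop above, it collects every node id that shares a value instead of printing pairs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PerPropertyDuplicates {
    // Result: property key -> (property value -> ids of the nodes sharing that value).
    // Only values shared by two or more nodes are kept.
    static Map<String, Map<Object, List<Long>>> find(
            Map<Long, Map<String, Object>> propsById, List<String> keys) {
        Map<String, Map<Object, List<Long>>> result = new LinkedHashMap<>();
        for (String key : keys) {
            // One map per property, as suggested above.
            Map<Object, List<Long>> byValue = new HashMap<>();
            for (Map.Entry<Long, Map<String, Object>> e : propsById.entrySet()) {
                Object value = e.getValue().get(key);
                if (value == null) {
                    continue; // this node does not have the property
                }
                byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(e.getKey());
            }
            // Drop values that occur on a single node only.
            byValue.values().removeIf(ids -> ids.size() < 2);
            result.put(key, byValue);
        }
        return result;
    }
}
```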
You can also use an index on that property. Then for a given value retrieve all the nodes. The advantage is that you can also query for approximations of the value.
Related
I have a node with 8 properties and another node with only one property, both under a common label. How can I match / query / display the nodes which have only one property?
In other words, how can I match nodes which don't have more than 1 property?
You may want something like:
MATCH (n)
WHERE size(keys(n)) = 1
RETURN n
However note that this is a graph-wide query, and likely to be expensive. Confining the query to a label may help a little.
As Michael Hunger mentioned
MATCH (n) WHERE NOT EXISTS(n.foo) RETURN n
Duplicate of the question: Find neo4j nodes where property is not set
I'm sure this is an easy cypher query, but I'm relatively new to cypher, so apologies ahead of time, but I can't find a previously asked question.
If I have a bunch of nodes connected like this:
(:Start)-[:NEXT]->(step1)-[:NEXT]->(step2)-[:NEXT]->(step3)-[:NEXT]->etc.
And I want to return all the nodes in this group, I can write this:
match (s:Start)-[:NEXT*]->(steps)
return s, steps
But what if I want to order them by their distance from the starting node? Is there a characteristic I can apply ORDER BY to, or is it more complicated than that?
Thanks
You can enforce the ordering by introducing a variable on the collection of :NEXT relationships, and ordering by their size (how many :NEXTs to get to the node).
MATCH (s:Start)-[rels:NEXT*]->(steps)
RETURN s, steps
ORDER BY SIZE(rels)
The nodes of a path are returned in sequence order, so you can bind the path to a variable p and use its nodes collection as a starting point:
MATCH p=(s:Start)-[rels:NEXT*]->(steps)
UNWIND range(1, size(nodes(p))-1) AS i
RETURN nodes(p)[i] as node, i
ORDER BY i
Example of this query against the console example: http://console.neo4j.org/r/7nzgov
In each of my nodes I have a selection of properties such as education_id, work_id, locale, etc. Each property can take different values on different nodes, such as education_id:112 or education_id:165; i.e. Node A might have education_id:112, Node B might have education_id:165, Node C might have education_id:112 again, and so on.
I want a Cypher query that returns all nodes for each value of the property, without knowing the property values beforehand.
To put it into perspective, in the example I have provided, it must return Node A and Node C under education_id:112 and Node B under education_id:165
Note: I am not providing multiple cypher queries specifying the values of properties each time. The whole output must be dynamic.
The output to the query should be something like
education_id:112 Node A, Node C
education_id:165 Node B
These are the results of a single query statement.
Not quite sure I understand your question, but based on the expected output:
MATCH (n) RETURN n.education_id,collect(n)
will group nodes by distinct values of education_id
You should probably take a look at the Cypher refcard. What you are looking for is the WHERE clause:
Match (a) WHERE a.education_id = 112 return a
You can also specify the property directly in the MATCH clause.
Match (a{education_id: 112}) RETURN a
I have a problem in querying for two nodes with one similar property, one different property and having the same label. For one property, the property is the same between both nodes, i.e. both have a property called "Name" and both have the same value ("Data Storage"). For the other property though called "Note", they have different values. Both share the same label called "Issue". When I use the query below, I get both nodes.
match (n:Issue) where n.name="Data Storage" return n;
However, when I query with the following query...
match (n:Issue) where n.name="Data Storage" and n.note="xxxx" return n;
...it returns only one of the nodes and not the other. I've tried re-creating the node that isn't returned, and it then seems to work fine, but I did that with a different label. Is this some bug around not being able to query a node having the same label and sharing at least one common property?
match (n:Issue) where n.name="Data Storage" and n.note="xxxx" return n;
will match all nodes with label Issue, the value of the "name" property = "Data Storage" AND the value of the "note" property = "xxxx".
As you've described the 2 nodes, the value of the note property is different on each. The one matching xxxx is the only one that can be returned.
What is the goal of this query?
If your real question was how to query nodes that have the same label and share at least one common property, you can try this for each property and do a union over all of them:
MATCH (n:Issue)
WITH n.name AS name, collect(n) as nodes, count(*) as cnt
WHERE cnt > 1
RETURN nodes
Or try this, which compares every pair of nodes and will be less performant:
MATCH (n:Issue)
WITH n
MATCH (m:Issue)
WHERE m<>n AND (n.name = m.name OR n.note = m.note)
RETURN n,m
Neo4j: Finding simple paths between two nodes takes a lot of time even with an upper limit (*1..4). I don't want to use allShortestPaths or shortestPath because they don't return all the paths.
Match p=((n {Name:"Node1"}) -[*1..4]-> (m {Name:"Node2"})) return p;
Any suggestions to make it faster?
If you have a lot of nodes, try creating an index so that the neo4j DB engine does not have to search through every node to find the ones with the right Name property values.
I am presuming that, in your example, the n and m nodes are really the same "type" of node. If this is true, then:
Add a label (I'll call it 'X') to every node (of the same type as n and m). You can use the following to add the 'X' label to node(s) represented by the variable n. You'd want to precede it with the appropriate MATCH clause:
SET n:X
Create an index on the Name property of nodes with the X label like this:
CREATE INDEX ON :X(Name);
Modify your query to:
MATCH p=((n:X {Name:"Node1"}) -[*1..4]-> (m:X {Name:"Node2"}))
RETURN p;
If you do the above, then your query should be faster.