Neo4j Cypher queries returning different count results

I want a query that, starting from a node, counts the possible end nodes for a given relationship type.
For example this query:
MATCH (start:typeA{my_id:"abc"})-[:rel]->(l:typeB) return count(l)
works great and returns a proper number, i.e., 500. The same happens with:
MATCH p=(start:BusStop{StopCode:"0247"})-[:CAN_BOARD]->(:Leg) return count(p)
However if I do:
MATCH (start:typeA{my_id:"abc"}) return count((start)-[:rel]->(:typeB))
it returns 1.
What is the difference between this query and the previous ones?

The result of a path expression (as used in your last query) is a list of paths. This is different than the result when the same path pattern is used in a MATCH clause.
You would have gotten 500 if you changed your last query to use SIZE() instead of COUNT():
MATCH (start:typeA{my_id:"abc"}) return SIZE((start)-[:rel]->(:typeB))
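To see why COUNT() gives 1 while SIZE() gives 500, here is a minimal Python sketch of the two behaviours. It is a toy model, not Neo4j code: the row layout and the helper names cypher_count and cypher_size are assumptions made for illustration only.

```python
# Toy model: a Cypher result is a list of rows (dicts).
# COUNT() is an aggregate over rows; SIZE() measures one list value.

def cypher_count(rows, key):
    """Model of COUNT(): number of rows whose value for key is not null."""
    return sum(1 for row in rows if row[key] is not None)

def cypher_size(value):
    """Model of SIZE(): number of elements in a single list value."""
    return len(value)

# 500 hypothetical paths leaving the start node.
paths = [("start", "rel", f"typeB_{i}") for i in range(500)]

# MATCH (start)-[:rel]->(l:typeB): one row per matched path.
match_rows = [{"l": p[2]} for p in paths]

# MATCH (start) alone: one row, and the pattern expression in that row
# evaluates to the whole list of paths.
expr_rows = [{"paths": paths}]

count_from_match = cypher_count(match_rows, "l")
count_from_expr = cypher_count(expr_rows, "paths")
size_from_expr = cypher_size(expr_rows[0]["paths"])
print(count_from_match, count_from_expr, size_from_expr)
```

In this model the aggregate sees a single row in the last query, so it counts one list rather than the 500 paths inside it.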

Related

Removing a node from the Cypher query result in Neo4j

I have the following cypher code:
MATCH (n)
WHERE toLower(n.name) STARTS WITH toLower('ja')
RETURN n
This case-insensitive query returns all the nodes whose names start with the substring "ja". For example, if I execute this in my db it will return ["Javier", "Jacinto", "Jasper", "Jacob"].
I need this script to also remove the unwanted nodes from this list. For example, suppose an array containing ["Jasper", "Javier"] is sent to the data access layer, indicating that those two nodes shouldn't be returned. The final query result should then be: ["Jacinto", "Jacob"]
How can I perform this?
If you know before making the query which items should be excluded you can say:
MATCH (n)
WHERE toLower(n.name) STARTS WITH toLower('ja')
AND NOT (toLower(n.name) IN ['jasper', 'javier'])
RETURN n
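The filter logic of that query can be sketched in plain Python to check the expected result. This is only a model of the WHERE clause, assuming a hypothetical helper filter_names and made-up input names; note that the exclusion list has to be lower-cased as well, just as the literals in the query above are.

```python
def filter_names(names, prefix, excluded):
    """Keep names starting with prefix, minus excluded ones, ignoring case."""
    prefix = prefix.lower()
    # Lower-case the exclusion list once so the comparison is case-insensitive.
    excluded_lower = {name.lower() for name in excluded}
    return [
        name for name in names
        if name.lower().startswith(prefix) and name.lower() not in excluded_lower
    ]

names = ["Javier", "Jacinto", "Jasper", "Jacob", "Maria"]
remaining = filter_names(names, "ja", ["Jasper", "Javier"])
print(remaining)
```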

'Same' queries return different results

I have two queries which are almost the same, as shown below (the only difference is the r: in front of FOR in Query 1):
Query 1: MATCH p=()-[r:FOR]->() RETURN count(p)
Query 2: MATCH p=()-[FOR]->() RETURN count(p)
When I run these queries against my Neo4j server, they return different results. The count from Query 1 is around 1/3 of the count from Query 2. I guess this is because Query 1 has 'combined' the results while Query 2 didn't (e.g. a-[FOR]->c and b-[FOR]->c were combined into 1 record), but that is just a guess. I have searched Google and the Neo4j documentation but had no luck. Can anyone explain the difference?
Thanks in advance.
MATCH p=()-[r:FOR]->() RETURN count(p)
This query binds the FOR relationship to the r variable (though it doesn't use it).
MATCH p=()-[FOR]->() RETURN count(p)
This query binds any relationship (i.e. of any type) to the FOR variable.
The correct syntax for specifying the relationship type in Cypher is :XXX, with the leading colon. The correct version of the second query would actually be:
MATCH p=()-[:FOR]->() RETURN count(p)
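The difference in counts can be illustrated with a small Python model. The relationship data and the helper match_relationships are hypothetical; the point is only that a pattern without a type filter matches every relationship.

```python
# Toy relationship store: (start, type, end). The data is made up.
relationships = [
    ("a", "FOR", "c"),
    ("b", "FOR", "c"),
    ("a", "KNOWS", "b"),
    ("b", "OWNS", "d"),
    ("c", "KNOWS", "d"),
    ("d", "OWNS", "a"),
]

def match_relationships(rel_type=None):
    """Return matching relationships; rel_type=None means no type filter."""
    return [r for r in relationships if rel_type is None or r[1] == rel_type]

# ()-[r:FOR]->() and ()-[:FOR]->(): both filter on the type FOR.
typed_count = len(match_relationships("FOR"))

# ()-[FOR]->(): FOR is only a variable name, so nothing is filtered out.
untyped_count = len(match_relationships())

print(typed_count, untyped_count)
```

With this sample data the typed pattern matches 2 relationships and the untyped one matches all 6, which mirrors the roughly 1/3 ratio described in the question.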

Including vars in Neo4j WITH statement changes query output

I'm trying to find the number of nodes of a certain kind in my database that are connected to more than one other node of another kind. In my case, it's place nodes connected to several name nodes. I have a query that works:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p,count(n) as counts
WHERE counts > 1
RETURN p;
However, that only returns the place nodes, and ideally I'd like it to return all the nodes and edges involved. I've found a question on returning variables from before the WITH, but if I include any of the other variables I've defined, the query returns no responses, i.e. this query returns nothing:
MATCH rels=(p:Place)-[c:Called]->(n:Name)
WITH p, count(n) as counts, rels
WHERE counts > 1
RETURN p;
I don't know how to return the information that I want without changing the results of the query. Any help would be much appreciated.
Your second query returns nothing because its WITH clause uses both p and rels as aggregation "grouping keys". Since each rels path contains only a single n value, counts is always 1 and the WHERE filter rejects every row.
Something like this might work for you:
MATCH path=(p:Place)-[:Called]->(:Name)
WITH p, COLLECT(path) as paths
WHERE SIZE(paths) > 1
RETURN p, paths;
This returns each matching Place node and all its paths.
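The effect of the grouping keys can be checked with a short Python sketch. The rows and the helper count_by are assumptions for illustration; each row stands for one matched path, as the MATCH clause would produce.

```python
from collections import defaultdict

# One row per matched path (p)-[:Called]->(n); the data is hypothetical.
rows = [
    {"p": "Place1", "n": "NameA", "rels": "path1"},
    {"p": "Place1", "n": "NameB", "rels": "path2"},
    {"p": "Place2", "n": "NameC", "rels": "path3"},
]

def count_by(rows, keys):
    """Group rows by the given keys and count rows per group."""
    groups = defaultdict(int)
    for row in rows:
        groups[tuple(row[k] for k in keys)] += 1
    return dict(groups)

# WITH p, count(n): grouped by place only.
by_place = count_by(rows, ["p"])

# WITH p, count(n), rels: grouped by place AND path, so every group has one row.
by_place_and_path = count_by(rows, ["p", "rels"])

places_with_many = [key[0] for key, c in by_place.items() if c > 1]
groups_with_many = [key for key, c in by_place_and_path.items() if c > 1]
print(places_with_many, groups_with_many)
```

Grouping by place alone finds Place1 with two names, while adding the path as a second key leaves no group with a count above 1.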
Try this:
MATCH (p:Place)-[c:Called]->(n:Name)
WHERE size((p)-[:Called]->(:Name)) > 1
WITH p,count(n) as counts, collect(n) AS names, collect(c) AS calls
RETURN p, names, calls, counts ORDER BY counts DESC;
This query makes use of Cypher's collect() function to create lists of the names and Called relationships for each place that has more than one Called relationship with a Name node.

How to query for multiple OR'ed Neo4j paths?

Anyone know of a fast way to query multiple paths in Neo4j ?
Let's say I have movie nodes that can have a type that I want to match (this is pseudo-code):
MATCH
(m:Movie)<-[:TYPE]-(g:Genre { name:'action' })
OR
(m:Movie)<-[:TYPE]-(x:Genre)<-[:G_TYPE*1..3]-(g:Genre { name:'action' })
(m)-[:SUBGENRE]->(sg:SubGenre {name: 'comedy'})
OR
(m)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
The problem is that the first "m:Movie" nodes to be matched must match one of the paths specified, and the second SubGenre match is dependent on the first match.
I can make a query that works using MATCH and WHERE, but it's really slow (30 seconds with a small 20MB dataset).
I don't know how to OR match in Neo4j with other OR matches hanging off the first results.
If I use WHERE, then I have to declare all the nodes used in any of the statements in the initial MATCH, which makes the query slow (since you cannot introduce new nodes in a WHERE).
Does anyone know an elegant way to solve this? Thanks!
You can try a variable length path with a minimal length of 0:
MATCH
(m:Movie)<-[:TYPE|:SUBGENRE*0..4]-(g)
WHERE g:Genre and g.name = 'action' OR g:SubGenre and g.name='comedy'
For the query to use an index to find your genre / subgenre I recommend a UNION query though.
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' })
RETURN distinct m
UNION
MATCH (m:Movie)-[:SUBGENRE]->(x)<-[:SUB_TYPE*1..3]-(sg:SubGenre {name: 'comedy'})
RETURN distinct m
Perhaps the OPTIONAL MATCH clause might help here. OPTIONAL MATCH behaves like MATCH, except that instead of an all-or-nothing approach to pattern matching, any elements of the pattern that cannot be matched are bound to null.
For example, to match on a movie, its genre and a possible sub-genre:
OPTIONAL MATCH (m:Movie)-[:IS_GENRE]->(g:Genre)<-[:IS_SUBGENRE]-(sub:Genre)
WHERE m.title = "The Matrix"
RETURN m, g, sub
This will return the movie node, the genre node and if it exists, the sub-genre. If there is no sub-genre then it will return null for sub. You can use variable length paths as you have above as well with OPTIONAL MATCH.
[EDITED]
The following MATCH clause should be equivalent to your pseudocode. There is also a USING INDEX clause that assumes you have first created an index on :SubGenre(name), for efficiency. (You could use an index on :Genre(name) instead, if Genre nodes are more numerous than SubGenre nodes.)
MATCH
(m:Movie)<-[:TYPE*0..4]-(g:Genre { name:'action' }),
(m)-[:SUBGENRE]->()<-[:SUB_TYPE*0..3]-(sg:SubGenre { name: 'comedy' })
USING INDEX sg:SubGenre(name)
Here is a console that shows the results for some sample data.

Multiple match with where clause in Neo4j Cypher gives error "Cannot match on a pattern containing only already bound identifiers"

We are using neo4j-community-2.1.2. Right now we have only 3 nodes with the Job label in the database, and we have schema indexes on all fields that are used in this query. Total DB hits are approximately 40.
The query is:
PROFILE match (job1:Job) where (job1.jobType="Adhoc" or job1.jobType="Virtual") AND (job1.mode="Free" or job1.mode="Paid") with collect(job1) as jobs1
match (job2:Job)-[REQUIRED_SKILL]-(skill:Skill) where skill.name="Neo4j" and (job2 in jobs1) with collect(job2) as jobs2
match (job3:Job)-[REQUIRED_SKILL]-(skill:Skill) where skill.name="Java" and (job3 IN jobs2) with collect(job3) as jobs3 return jobs3
So we tried to do something like this:
match (job1:Job) where (job1.jobType="Adhoc" or job1.jobType="Virtual")
match (job1) where (job1.mode="Free" or job1.mode="Paid") with collect(job1) as jobs1 return jobs1
The idea is that the result of the first match is passed to the next match, so the next filter only has to examine fewer nodes. But we get this exception:
Cannot match on a pattern containing only already bound identifiers (line 2, column 1)
"match (job1) where (job1.mode="Free" or job1.mode="Paid") with collect(job1) as jobs1 return jobs1"
How can we optimize this query?
You cannot match job1 twice. Once it is matched, you can reuse the same binding (using WITH), or in this case, you can filter on both conditions using AND. Your query would also be simpler if you replaced the OR conditions with IN list membership tests, like this:
match (job1:Job)
where job1.jobType in ["Adhoc", "Virtual"]
and job1.mode in ["Free", "Paid"]
return collect(job1) as jobs1

Resources