How do I provide multiple queries in Neo4j Cypher?

I want to use the results from the first query in the second query, but I am not sure how to do this in Cypher.
Current code:
START user1=node:USER_INDEX(USER_INDEX = "userA")
MATCH user1-[r1:ACCESSED]->docid1<-[r2:ACCESSED]-user2, user2-[r3:ACCESSED]->docid2
WHERE r2.Topic=r3.Topic
RETURN distinct docid2.Label;
I want to check different conditions in the WHERE clause for the same docid2 set of nodes, accumulate the results, and order them by a date field.
I am not able to provide multiple MATCH and RETURN clauses within the same transaction.
That is why I am trying to have two different Cypher scripts and combine them in a third query. Is this possible in Cypher?
Or is there any option to write custom functions and invoke them?
Do we have stored Cypher scripts, like stored Gremlin scripts?

As Michael mentioned in the comment, you can use the WITH clause to stream results into further clauses. Unfortunately, you can't start another statement after the WHERE clause. Multiple RETURN statements would be kind of illogical, but you can do multiple things in a single query, e.g.:
START x=node:node_auto_index(key="x")
with count(x) as exists
start y=node:node_auto_index(key="y")
where exists = 0
create (n {key:"y"})<-[:rel]-y
return n, y
This checks whether the "x" node exists and, if it doesn't, creates a new node with a relationship from y and returns both.
If you wish to do more sophisticated things on result sets, your best options are either batch requests or the Java API.
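On newer Neo4j releases, where START and legacy index lookups are no longer available, the same WITH chaining can be sketched against the query from the question. This is only an illustration: the User label, the name property, and the Date property are assumptions and do not appear in the original post.

```cypher
// Assumed: users carry a :User label and a name property (not in the original).
MATCH (user1:User {name: 'userA'})-[:ACCESSED]->(doc1)<-[r2:ACCESSED]-(user2)
MATCH (user2)-[r3:ACCESSED]->(doc2)
WHERE r2.Topic = r3.Topic
// WITH hands the rows of the first stage to the next stage of the same query.
WITH DISTINCT doc2
// Second-stage filter; Date is an assumed property name.
WHERE doc2.Date IS NOT NULL
RETURN doc2.Label AS label, doc2.Date AS date
ORDER BY date DESC;
```

Each additional condition can be added as a further WITH ... WHERE stage, so a single query and a single RETURN are enough.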

Related

Neo4j Query Optimization for Cartesian Product

I am trying to implement a user-journey analytics solution. Put simply, I want to analyze which users leave the application on which screens.
For this, I have modeled the data like this:
I modeled each single activity as a node since I want to index some attributes; relationship attributes cannot be indexed in Neo4j.
With this model, I am trying to write a query that follows three successive event types:
MATCH (eventType1:EventType {eventName:'viewStart-home'})<--(event:EventNode)
<--(eventType2:EventType{eventName:'viewStart-payment'})
WITH distinct event.deviceId as eUsers, event.clientCreationDate as eDate
MATCH((eventType2)<--(event2:EventNode)
<--(eventType3:EventType{eventName:'viewStart-screen1'}))
WITH distinct event2.deviceId as e2Users, event2.clientCreationDate as e2Date
RETURN e2Users limit 200000
And the execution plan is below:
I could not figure out the reason for this. Can you help me?
Your query is doing a lot more work than it needs to.
The first WITH clause is not needed at all, since its generated eUsers and eDate variables are never used. And the second WITH clause does not need to generate the unused e2Date variable.
In addition, you could first add an index for :EventType(eventName) to speed up the processing:
CREATE INDEX ON :EventType(eventName);
With these changes, your query's profile could be simpler and the processing would be faster.
Here is an updated query (that should use the index to quickly find the EventType node at one end of the path, to kick off the query):
MATCH (:EventType {eventName:'viewStart-home'})<--(:EventNode)
<--(:EventType{eventName:'viewStart-payment'})<--(event2:EventNode)
<--(:EventType{eventName:'viewStart-screen1'})
RETURN distinct event2.deviceId as e2Users
LIMIT 200000;
Here is an alternate query that uses 2 USING INDEX hints to tell the planner to quickly find the :EventType nodes at both ends of the path to kick off the query. This might be even faster than the first query:
MATCH (a:EventType {eventName:'viewStart-home'})<--(:EventNode)
<--(:EventType{eventName:'viewStart-payment'})<--(event2:EventNode)
<--(b:EventType{eventName:'viewStart-screen1'})
USING INDEX a:EventType(eventName)
USING INDEX b:EventType(eventName)
RETURN distinct event2.deviceId as e2Users
LIMIT 200000;
Try profiling them both on your DB, and pick the best one or keep tweaking further.
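Two notes for readers on newer versions. The CREATE INDEX ON :Label(property) form shown above is the older syntax; Neo4j 4.x and later use a FOR ... ON form. Profiling is done by prefixing the query with PROFILE. A minimal sketch, where the index name event_type_name is a hypothetical choice:

```cypher
// Newer index syntax; the index name event_type_name is hypothetical.
CREATE INDEX event_type_name IF NOT EXISTS
FOR (t:EventType) ON (t.eventName);

// PROFILE runs the query and reports rows and db hits for each operator.
PROFILE
MATCH (:EventType {eventName:'viewStart-home'})<--(:EventNode)
<--(:EventType {eventName:'viewStart-payment'})<--(event2:EventNode)
<--(:EventType {eventName:'viewStart-screen1'})
RETURN DISTINCT event2.deviceId AS e2Users
LIMIT 200000;
```

Comparing the total db hits of the two candidate queries is usually a more reliable guide than wall-clock time on a warm cache.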

Create relationship with properties using a query in Cypher

I would like to know if this is possible. I have a query that produces a nice report showing a relationship between two entities through two other nodes. There can be more than one path. I now want to create a direct relationship between those two nodes, count the number of paths, and sum values based on data in the nodes in between. The report query is below.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
return bo.AgencyName, count(sol.Number) as awards, so.orgName, sum(prop.finalPrice) as awardVolume;
What I want to do is similar to the query below, which does not work.
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
create (bo)-[:HAS_AWARDED{awardCount: count(sol.Number), awardVolume: sum(prop.finalPrice)}]->(so);
If I remove the properties from the relationship, it works, but I want to add the properties without too much programming.
I am using Neo4j 3.2, the most recent version.
Thanks.
The problem here is you are trying to use count() and sum() functions in an invalid context. The below query should work:
match (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
where sol.currStatus='Awarded'
with bo, so, count(sol.Number) as count_sol, sum(prop.finalPrice) as sum_finalPrice
create (bo)-[:HAS_AWARDED{awardCount: count_sol, awardVolume: sum_finalPrice}]->(so);
This query uses WITH to pass bo, so, and the results of the aggregation functions count(sol.Number) and sum(prop.finalPrice) to the next part of the query. These values are then used to create the new relationship between bo and so.
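One caveat: running the CREATE query a second time adds a second HAS_AWARDED relationship between the same pair of nodes. If the totals are refreshed periodically, a MERGE-based variant is one option. This is a sketch and not something the original answer proposed; the variable r is introduced here for illustration.

```cypher
MATCH (bo:BuyerAgency)<-[:IS_FOR_BO]-(sol:Solicitation)-[:SELECTED]->(prop:Proposal)<-[:OWNS_BID]-(so:VendorOrg)
WHERE sol.currStatus = 'Awarded'
WITH bo, so, count(sol.Number) AS count_sol, sum(prop.finalPrice) AS sum_finalPrice
// MERGE reuses an existing HAS_AWARDED relationship or creates one if none exists.
MERGE (bo)-[r:HAS_AWARDED]->(so)
// SET overwrites the totals on every run, so they follow the current data.
SET r.awardCount = count_sol, r.awardVolume = sum_finalPrice;
```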

Clone nodes and relationships with Cypher

Is it possible to clone arbitrary nodes and relationships in a single Cypher query in Neo4j 2.0?
'Arbitrary' here means 'without specifying their labels and relationship types'. Something like:
MATCH (node1:NodeType)-[e]->(n)
CREATE (clone: labels(n)) set clone=n set clone.prop=1
CREATE (node1)-[e1:type(e)]->(clone) set e1=e set e1.prop=2
is not valid in Cypher, so one cannot simply get labels from one node or relationship and assign them to another, because labels are compiled into the query literally.
Sure, labels and relationship types are important in MATCH and WHERE for producing an effective query plan, but isn't CREATE a different case?
The easiest way to clone parts of a graph is to use the dump command in the Neo4j shell. dump generates Cypher CREATE statements from your RETURN clauses. The result of dump can be applied to the graph database to create clones.
Today, April 2022, I believe the best approach might be using an APOC procedure.
I had a similar requirement and this worked for me:
MATCH (rootA:Root{name:'A'}),
(rootB:Root{name:'B'})
MATCH path = (rootA)-[:LINK*]->(node)
WITH rootA, rootB, collect(path) as paths
CALL apoc.refactor.cloneSubgraphFromPaths(paths, {
standinNodes:[[rootA, rootB]]
})
YIELD input, output, error
RETURN input, output, error

Returning multiple nodes in cypher with Index lookup

I have the following cypher query being called multiple times.
start n=node:MyIndex(Name="ABC")
return n
Then somewhere else in the code:
start m=node:MyIndex(NAME="XYZ")
return m
My database is hosted in Azure, so I am having latency/performance issues. To speed up the process and reduce round trips, I thought about combining multiple Cypher queries into a single one.
Actually, I am looking up 10+ nodes, but for simplicity I show an example with just two nodes below.
start n=node:MyIndex(Name="ABC"), m=node:MyIndex(NAME="XYZ")
return n, m
My goal is to get what I can in one round trip instead of 10+. It works if the index lookup succeeds for all nodes. However, the Cypher query returns zero rows if even one index lookup fails. I was hoping to get a NULL equivalent in n or m for the missing node, but no luck.
Please suggest what I am doing wrong and any workarounds to reduce the round trips. Many thanks!
You can use a parameterized query with Lucene syntax, e.g.:
START n=node:MyIndex({query}) return n
and parametrize with
{'query':'Name:(ABC XYZ)'}
where the list of names is a single string containing the space-separated names you are looking for.
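The Lucene query returns only the nodes that exist, so a missing name produces no row at all. If a null placeholder per missing name is wanted, as the question hoped for, one possible sketch for Neo4j versions that support UNWIND, OPTIONAL MATCH, and $ parameters follows. The Item label and the names parameter are hypothetical and not part of the original post.

```cypher
// $names is a list parameter, e.g. ['ABC', 'XYZ']; the Item label is assumed.
UNWIND $names AS name
// OPTIONAL MATCH keeps the row and binds n to null when no node matches.
OPTIONAL MATCH (n:Item {Name: name})
RETURN name, n;
```

This still needs only one round trip, and the name column shows which lookups came back empty.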

efficiency of where clause in cypher vs match

I'm trying to find 10 posts that were not LIKED by user "mike" using Cypher. Will putting a WHERE clause with a NOT relationship be more efficient than matching with an optional relationship and then checking whether that relationship is null in the WHERE clause? Specifically, I want to make sure it won't do the equivalent of a full table scan, and that this is a scalable query.
Here's what I'm using:
START user=node:node_auto_index(uname:"mike"),
posts=node:node_auto_index("postId:*")
WHERE not (user-[:LIKES]->posts)
RETURN posts SKIP 20 LIMIT 10;
Or can I do something where I filter on a MATCH with an optional relationship:
START user=node:node_auto_index(uname="mike"),
posts=node:node_auto_index("postId:*")
MATCH user-[r?:LIKES]->posts
WHERE r IS NULL
RETURN posts SKIP 100 LIMIT 10;
Some quick tests on the console seem to show faster performance in the 2nd approach. Am I right to assume the 2nd query is faster? And, if so why?
I think in the first query the engine runs through all postId nodes and manually checks the condition not (user-[:LIKES]->posts) for each one,
whereas in the second example (assuming you use at least v1.9.02) the engine picks up only the post nodes that actually aren't connected to the user. This is just an optimization where the engine does not go through all postId nodes.
If possible, always use the MATCH clause in your queries instead of WHERE, and try to omit the asterisk in the declaration START n=node:index('name:*').
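The r? optional-relationship syntax was dropped in Neo4j 2.0, which introduced OPTIONAL MATCH in its place. A rough equivalent of the second query for later versions could look like the sketch below. The User and Post labels are assumptions that stand in for the auto-index lookups of the original.

```cypher
// Assumed labels :User and :Post; the original used auto-index lookups.
MATCH (user:User {uname: 'mike'})
MATCH (post:Post)
// OPTIONAL MATCH leaves r as null for posts that mike has not liked.
OPTIONAL MATCH (user)-[r:LIKES]->(post)
// The intermediate WITH keeps the null check from being read as part of the OPTIONAL MATCH.
WITH post, r
WHERE r IS NULL
RETURN post
SKIP 100 LIMIT 10;
```

Whether this beats a WHERE NOT (user)-[:LIKES]->(post) predicate depends on the version and the data, so profiling both is the safer way to decide.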
