Post processing on UNION result cypher query neo4j - neo4j

I have written below query using lucene indexing . I know UNION doesn't have pagination support in neo4j . I want to post process the
final result of this query to paginate the result .
START user=node:peopleSearch('username:abhay*') RETURN user , user.createdDate as time UNION START rel= Relationship:peopleSearch('skillName:Java23') RETURN StartNode(rel) as user , StartNode(rel).createdDate as time
I have followed this link . But it giving me error on UNION RESULT .
I need to post process the union final result to paginate it

The referenced github issue is not yet resolved, so post processing of UNION is not yet possible.
As a workaround in the meantime you can split up your query into two queries. The first one combines the index lookups and returns node ids:
START user=node:peopleSearch('username:abhay*') RETURN id(user) as id
UNION
START rel= Relationship:peopleSearch('skillName:Java23') RETURN id(StartNode(rel)) as id
On client side, collect the returned IDs into an array and submit the second query using a cypher parameter:
MATCH (n)
WHERE ID(n) IN {myIdArray}
WITH n
// placeholder for rest of this query
where the parameter value of myIdArray is the returned IDs from step1.
I had a similar question a few weeks ago, and it turns out Cypher currently does not support post-UNION processing.
This means that you should filter both inputs of the union using the same condition.
Alternatively (as mentioned in this answer), if you can use APOC, you can post-process the results of a query:
CALL apoc.cypher.run("... UNION ...", NULL) YIELD n, r, x
WHERE ...
RETURN n, r, x;
Update: this is now possible in Neo4j 4.0 using the CALL {subquery} construct.

Related

WHERE condition in neo4j | Filtering by relationship property

How does the where condition in neo4j works ?
I have simple data set with following relationship =>
Client -[CONTAINS {created:"yesterday or today"}]-> Transaction -[INCLUDES]-> Item
I would like to filter above to get the items for a transaction which were created yesterday, and I use the following query -
Match
(c:Client) -[r:CONTAINS]-> (t:Transaction),
(t) -[:INCLUDES]-> (i:Item)
where r.created="yesterday"
return c,t,i
But it still returns the dataset without filtering. What is wrong ? And how does the filtering works in neo4j for multiple MATCH statements say when I want to run my query on filetered dataset from previous steps?
Thank you very much in advance.
Your query seems fine to me. However, there are 2 things I would like to point out here:
In this case, the WHERE clause can be removed and use match by property instead.
The MATCH clause can be combined.
So, the query would be:
MATCH (c:Client) -[r:CONTAINS {created: "yesterday"}]-> (t:Transaction) -[:INCLUDES]-> (i:Item)
RETURN c, t, i
Regarding your second question, when you want to run another query on the filtered dataset from the previous step, use WITH command. Instead of returning the result, WITH will pipe your result to the next query.
For example, with your query, we can do something like this to order the result by client name and return only the client:
MATCH (c:Client) -[r:CONTAINS {created: "yesterday"}]-> (t:Transaction) -[:INCLUDES]-> (i:Item)
WITH c, t, i
ODERBY c.name DESC
RETURN c
There does not seem to be anything wrong with the cypher statement.
Applying subsequent MATCH statements can be done with the WITH clause, it's well documented here : https://neo4j.com/docs/cypher-manual/current/clauses/with/

Neo4j count Query

match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.

How do I provide a cypher query to neography ruby gem to search by relationships and date property?

I am looking at the following:
https://github.com/maxdemarzi/neography/wiki/Scripts-and-queries
I have tried to come up with the value for "query" which will return the following:
nodes that have a relationship workingOn
created Date is yesterday (I used an integer of epoch time since it didn't seem like there was a Date type?)
return the property value
I tried:
start n=node(id) # where id is the reference node
match n-[:workingOn]-()
where has(n.date < Date.now.to_i and n.date > Yesterday.to_i) # yesterday is a Date for yesterday
return n
Solved:
I got the insight from the question I marked as having solved it, but what I did was create a query string and used interpolation to populate it with the values needed. E.g. query = "Match (n) -[#{relationship}]-(n2)....etc
You are using antiquated Cypher syntax. If you are using a recent version of neo4j (3.1+) and have the appropriate APOC plugin installed, the following should work. (I assume that the id parameter is passed when making the query. If it isn't, replace $id with the actual ID value.)
WITH timestamp() AS now
MATCH (n)
WHERE ID(n) = $id AND
(n)-[:workingOn]-() AND
apoc.date.convert(apoc.date.convert(now, 'ms', 'd') - 1, 'd', 'ms') < n.date < now
RETURN n;
It uses the timestamp() function to get the current epoch time (in milliseconds), and uses the APOC function apoc.date.convert twice to get the epoch time for the start of yesterday.
It also moves the (n)-[:workingOn]-() pattern to the WHERE clause so that for each n only a single row is generated, even when that n has multiple workingOn relationships.
(The RETURN clause would actually be RETURN n.value if you wanted to return the value property of the n node.)
[UPDATE]
GrapheneDB supports plugins like APOC. See their documentation.
To search by a native neo4j ID, you need to know the ID value first. You may need to perform another query first to get it. Note, however, that it may be better for you to assign and use your own IDs instead of the native IDs, since the latter can be recycled and used for new nodes if the original is deleted.

Neo4J - How to do post processing UNION (pagination)

I'm writing a cypher query to load data from my Neo4J DB, this is my data model
So basically what I want is a query to return a Journal with all of its properties and everything related to it, Ive tried doing the simple query but it is not performant at all and my ec2 instance where the DB is hosted runs out of memory quickly
MATCH p=(j:Journal)-[*0..]-(n) RETURN p
I managed to write a query using UNIONS
`MATCH p=(j:Journal)<-[:BELONGS_TO]-(at:ArticleType) RETURN p
UNION
MATCH p=(j:Journal)<-[:OWNS]-(jo:JournalOwner) RETURN p
UNION
MATCH p=(j:Journal)<-[:BELONGS_TO]-(s:Section) RETURN p
UNION

MATCH p=(j:Journal)-[:ACCEPTS]->(fc:FileCategory) RETURN p
UNION
MATCH p=(j:Journal)-[:CHARGED_BY]->(a:APC) RETURN p
UNION
MATCH p=(j:Journal)-[:ACCEPTS]->(sft:SupportedFileType) RETURN p
UNION
MATCH p=(j:Journal)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification) RETURN p
SKIP 0 LIMIT 100`
The query works fine and its performance is not bad at all, the only problem I'm finding is in the limit, I've been googling around and I've seen that post-processing queries with UNIONS is not yet supported.
The referenced github issue is not yet resolved, so post processing of UNION is not yet possible github link
Logically the first thing I tried when I came across this issue was to put the pagination on each individual query, but this had some weird behaviour that didn't make much sense to myself.
So I tried to write the query without using UNIONS, I came up with this
`MATCH (j:Journal)
WITH j LIMIT 10
MATCH pa=(j)<-[:BELONGS_TO]-(a:ArticleType)
MATCH po=(j)<-[:OWNS]-(o:JournalOwner)
MATCH ps=(j)<-[:BELONGS_TO]-(s:Section)
MATCH pf=(j)-[:ACCEPTS]->(f:FileCategory)
MATCH pc=(j)-[:CHARGED_BY]->(apc:APC)
MATCH pt=(j)-[:ACCEPTS]->(sft:SupportedFileType)
MATCH pl=(j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification)
RETURN pa, po, ps, pf, pc, pt, pl`
This query however breaks my DB, I feel like I'm missing something essential for writing CQL queries...
I've also looked into COLLECT and UNWIND in this neo blog post but couldn't really make sense of it.
How can I paginate my query without removing the unions? Or is there any other way of writing the query so that pagination can be applied at the Journal level and the performance isn't affected?
--- EDIT ---
Here is the execution plan for my second query
You really don't need UNION for this, because when you approach this using UNION, you're getting all the related nodes for every :Journal node, and only AFTER you've made all those expansions from every :Journal node do you limit your result set. That is a ton of work that will only be excluded due to your LIMIT.
Your second query looks like the more correct approach, matching on :Journal nodes with a LIMIT, and only then matching on the related nodes to prepare the data for return.
You said that the second query breaks your DB. Can you run a PROFILE on the query (or an EXPLAIN, if the query never finishes execution), expand all elements of the plan, and add it to your description?
Also, if you leave out the final MATCH to :Classification, does the query behave correctly?
It would also help to know if you really need the paths returned, or if it's enough to just return the connected nodes.
EDIT
If you want each :Journal and all its connected data on a single row, you need to either be using COLLECT() after each match, or using pattern comprehension so the result is already in a collection.
This will also cut down on unnecessary queries. Your initial match (after the limit) generated 31k rows, so all subsequent matches executed 31k times. If you collect() or use pattern comprehension, you'll keep the cardinality down to your initial 10, and prevent redundant matches.
Something like this, if you only want collected paths returned:
MATCH (j:Journal)
WITH j LIMIT 10
WITH j,
[pa=(j)<-[:BELONGS_TO]-(a:ArticleType) | pa] as pa,
[po=(j)<-[:OWNS]-(o:JournalOwner) | po] as po,
[ps=(j)<-[:BELONGS_TO]-(s:Section) | ps] as ps,
[pf=(j)-[:ACCEPTS]->(f:FileCategory) | pf] as pf,
[pc=(j)-[:CHARGED_BY]->(apc:APC) | pc] as pc,
[pt=(j)-[:ACCEPTS]->(sft:SupportedFileType) | pt] as pt,
[pl=(j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification) | pl] as pl
RETURN pa, po, ps, pf, pc, pt, pl

Limit the results of a union cypher query

Let's say we have the example query from the documentation:
MATCH (n:Actor)
RETURN n.name AS name
UNION
MATCH (n:Movie)
RETURN n.title AS name
I know that if I do that:
MATCH (n:Actor)
RETURN n.name AS name
LIMIT 5
UNION
MATCH (n:Movie)
RETURN n.title AS name
LIMIT 5
I can reduce the returned results of each sub query to 5.How can I LIMIT the total results of the union query?
This is not yet possible, but there is already an open neo4j issue that requests the ability to do post-UNION processing, which includes what you are asking about. You can add a comment to that neo4j issue if you support having it resolved.
This can be done using UNION post processing by rewriting the query using the COLLECT function and the UNWIND clause.
First we turn the columns of a result into a map (struct, hash, dictionary), to retain its structure. For each partial query we use the COLLECT to aggregate these maps into a list, which also reduces our row count (cardinality) to one (1) for the following MATCH. Combining the lists is a simple list concatenation with the “+” operator.
Once we have the complete list, we use UNWIND to transform it back into rows of maps. After this, we use the WITH clause to deconstruct the maps into columns again and perform operations like sorting, pagination, filtering or any other aggregation or operation.
The rewritten query will be as below:
MATCH (n:Actor)
with collect ({name: n.title}) as row
MATCH (n:Movie)
with row + collect({name: n.title}) as rows
unwind rows as row
with row.name as name
return name LIMIT 5
This is possible in 4.0.0
CALL {
MATCH (p:Person) RETURN p
UNION
MATCH (p:Person) RETURN p
}
RETURN p.name, p.age ORDER BY p.name
Read more about Post-union processing here https://neo4j.com/docs/cypher-manual/4.0/clauses/call-subquery/

Resources