Querying multiple indexes not working if one condition fails in Neo4j - neo4j

I am trying to search for a key word on all the indexes. I have in my graph database.
Below is the query:
start n=node:Users(Name="Hello"),
m=node:Location(LocationName="Hello")
return n,m
I am getting the nodes and if keyword "Hello" is present in both the indexes (Users and Location), and I do not get any results if keyword Hello is not present in any one of index.
Could you please let me know how to modify this cypher query so that I get results if "Hello" is present in any of the index keys (Name or LocationName).

In 2.0 you can use UNION and have two separate queries like so:
start n=node:Users(Name="Hello")
return n
UNION
start n=node:Location(LocationName="Hello")
return n;
The problem with the way you have the query written is the way it calculates a cartesian product of pairs between n and m, so if n or m aren't found, no results are found. If one n is found, and two ms are found, then you get 2 results (with a repeating n). Similar to how the FROM clause works in SQL. If you have an empty table called empty, and you do select * from x, empty; then you'll get 0 results, unless you do an outer join of some sort.
Unfortunately, it's somewhat difficult to do this in 1.9. I've tried many iterations of things like WITH collect(n) as n, etc., but it boils down to the cartesian product thing at some point, no matter what.

Related

Neo4j count Query

match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi,I am trying to get total number of records for the above query.How I change the query using count function to get the record count directly.
Thanks in advance
The following query uses the aggregating funtion COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs here
In your case the first example where:
count(*)
Is used to return a count of each returned item should work.

(Neo4j) Query does not display a graph

I have a query that I am trying to execute. The query works, but there isn't an option to see this data in graph format. Instead the data is returned in table/text format.
When I simplify the query, the output is displayed in graph format - No idea why,
This is the query that is giving me the issue:
MATCH (p:Person)-[hi:hasIdentity]->(i:Identity)
MATCH (j:Person)-[hi2:hasIdentity]->(i2:Identity)
MATCH (i)-[bl:Linked]->(i2)
WHERE NOT p=j
return DISTINCT(p.id), COUNT(DISTINCT(j))
LIMIT 5
Does anyone have any idea why that might be the case?
You'll need to return variables associated with nodes and/or relationships for it to display as a graph. As it is now you're returning properties of nodes (p.id), probably integers or strings. Try this return instead:
...
RETURN p, COUNT(DISTINCT j)
LIMIT 5
By the way, DISTINCT isn't a function, no need for parenthesis, and when you have a RETURN or WITH that has an aggregation, you don't need to use DISTINCT for that line since the non-aggregation variables become distinct since they act as the grouping key for the aggregation.

Cypher - Neo4j Query Profiling

I have some questions regarding Neo4j's Query profiling.
Consider below simple Cypher query:
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
and output is:
So according to Neo4j's Documentation:
3.7.2.2. Expand Into
When both the start and end node have already been found, expand-into
is used to find all connecting relationships between the two nodes.
Query.
MATCH (p:Person { name: 'me' })-[:FRIENDS_WITH]->(fof)-->(p) RETURN
> fof
So here in the above query (in my case), first of all, it should find both the StartNode & the EndNode before finding any relationships. But unfortunately, it's just finding the StartNode, and then going to expand all connected :HAS_CONTACT relationships, which results in not using "Expand Into" operator. Why does this work this way? There is only one :HAS_CONTACT relationship between the two nodes. There is a Unique Index constraint on :Consumer{mobileNumber}. Why does the above query expand all 7 relationships?
Another question is about the Filter operator: why does it requires 12 db hits although all nodes/ relationships are already retrieved? Why does this operation require 12 db calls for just 6 rows?
Edited
This is the complete Graph I am querying:
Also I have tested different versions of same above query, but the same Query Profile result is returned:
1
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
MATCH (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
2
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"}), (m:Consumer{mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
3
PROFILE
MATCH (n:Consumer{mobileNumber: "yyyyyyyyy"})
WITH n
MATCH (n)-[r:HAS_CONTACT]->(m:Consumer{mobileNumber: "xxxxxxxxxxx"})
RETURN n,m,r;
The query you are executing and the example provided in the Neo4j documentation for Expand Into are not the same. The example query starts and ends at the same node.
If you want the planner to find both nodes first and see if there is a relationship then you could use shortestPath with a length of 1 to minimize the DB hits.
PROFILE
MATCH (n:Consumer {mobileNumber: "yyyyyyyyy"}),
(m:Consumer {mobileNumber: "xxxxxxxxxxx"})
WITH n,m
MATCH Path=shortestPath((n)-[r:HAS_CONTACT*1]->(m))
RETURN n,m,r;
Why does this do this?
It appears that this behaviour relates to how the query planner performs a database search in response to your cypher query. Cypher provides an interface to search and perform operations in the graph (alternatives include the Java API, etc.), queries are handled by the query planner and then turned into graph operations by neo4j's internals. It make sense that the query planner will find what is likely to be the most efficient way to search the graph (hence why we love neo), and so just because a cypher query is written one way, it won't necessarily search the graph in the way we imagine it will in our head.
The documentation on this seemed a little sparse (or, rather I couldn't find it properly), any links or further explanations would be much appreciated.
Examining your query, I think you're trying to say this:
"Find two nodes each with a :Consumer label, n and m, with contact numbers x and y respectively, using the mobileNumber index. If you find them, try and find a -[:HAS_CONTACT]-> relationship from n to m. If you find the relationship, return both nodes and the relationship, else return nothing."
Running this query in this way requires a cartesian product to be created (i.e., a little table of all combinations of n and m - in this case only one row - but for other queries potentially many more), and then relationships to be searched for between each of these rows.
Rather than doing that, since a MATCH clause must be met in order to continue with the query, neo knows that the two nodes n and m must be connected via the -[:HAS_CONTACT]-> relationship if the query is to return anything. Thus, the most efficient way to run the query (and avoid the cartesian product) is as below, which is what your query can be simplified to.
"Find a node n with the :Consumer label, and value x for the index mobileNumber, which is connected via a -[:HAS_CONTACT]-> relationshop to a node m with the :Consumer label, and value y for its proprerty mobileNumber. Return both nodes and the relationship, else return nothing."
So, rather than perform two index searches, a cartesian product and a set of expand into operations, neo performs only one index search, an expand all, and a filter.
You can see the result of this simplification by the query planner through the presence of AUTOSTRING parameters in your query profile.
How to Change Query to Implement Search as Desired
If you want to change the query so that it must use an expand into relationship, make the requirement for the relationship optional, or use explicitly iterative execution. Both these queries below will produce the initially expected query profiles.
Optional example:
PROFILE
MATCH (n:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
WITH n,m
OPTIONAL MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;
Iterative example:
PROFILE
MATCH (n1:Consumer{mobileNumber: "xxx"})
MATCH (m:Consumer{mobileNumber: "yyy"})
UNWIND COLLECT(n1) AS n
MATCH (n)-[r:HAS_CONTACT]->(m)
RETURN n,m,r;

Neo4J - How to do post processing UNION (pagination)

I'm writing a cypher query to load data from my Neo4J DB, this is my data model
So basically what I want is a query to return a Journal with all of its properties and everything related to it, Ive tried doing the simple query but it is not performant at all and my ec2 instance where the DB is hosted runs out of memory quickly
MATCH p=(j:Journal)-[*0..]-(n) RETURN p
I managed to write a query using UNIONS
`MATCH p=(j:Journal)<-[:BELONGS_TO]-(at:ArticleType) RETURN p
UNION
MATCH p=(j:Journal)<-[:OWNS]-(jo:JournalOwner) RETURN p
UNION
MATCH p=(j:Journal)<-[:BELONGS_TO]-(s:Section) RETURN p
UNION

MATCH p=(j:Journal)-[:ACCEPTS]->(fc:FileCategory) RETURN p
UNION
MATCH p=(j:Journal)-[:CHARGED_BY]->(a:APC) RETURN p
UNION
MATCH p=(j:Journal)-[:ACCEPTS]->(sft:SupportedFileType) RETURN p
UNION
MATCH p=(j:Journal)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification) RETURN p
SKIP 0 LIMIT 100`
The query works fine and its performance is not bad at all, the only problem I'm finding is in the limit, I've been googling around and I've seen that post-processing queries with UNIONS is not yet supported.
The referenced github issue is not yet resolved, so post processing of UNION is not yet possible github link
Logically the first thing I tried when I came across this issue was to put the pagination on each individual query, but this had some weird behaviour that didn't make much sense to myself.
So I tried to write the query without using UNIONS, I came up with this
`MATCH (j:Journal)
WITH j LIMIT 10
MATCH pa=(j)<-[:BELONGS_TO]-(a:ArticleType)
MATCH po=(j)<-[:OWNS]-(o:JournalOwner)
MATCH ps=(j)<-[:BELONGS_TO]-(s:Section)
MATCH pf=(j)-[:ACCEPTS]->(f:FileCategory)
MATCH pc=(j)-[:CHARGED_BY]->(apc:APC)
MATCH pt=(j)-[:ACCEPTS]->(sft:SupportedFileType)
MATCH pl=(j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification)
RETURN pa, po, ps, pf, pc, pt, pl`
This query however breaks my DB, I feel like I'm missing something essential for writing CQL queries...
I've also looked into COLLECT and UNWIND in this neo blog post but couldn't really make sense of it.
How can I paginate my query without removing the unions? Or is there any other way of writing the query so that pagination can be applied at the Journal level and the performance isn't affected?
--- EDIT ---
Here is the execution plan for my second query
You really don't need UNION for this, because when you approach this using UNION, you're getting all the related nodes for every :Journal node, and only AFTER you've made all those expansions from every :Journal node do you limit your result set. That is a ton of work that will only be excluded due to your LIMIT.
Your second query looks like the more correct approach, matching on :Journal nodes with a LIMIT, and only then matching on the related nodes to prepare the data for return.
You said that the second query breaks your DB. Can you run a PROFILE on the query (or an EXPLAIN, if the query never finishes execution), expand all elements of the plan, and add it to your description?
Also, if you leave out the final MATCH to :Classification, does the query behave correctly?
It would also help to know if you really need the paths returned, or if it's enough to just return the connected nodes.
EDIT
If you want each :Journal and all its connected data on a single row, you need to either be using COLLECT() after each match, or using pattern comprehension so the result is already in a collection.
This will also cut down on unnecessary queries. Your initial match (after the limit) generated 31k rows, so all subsequent matches executed 31k times. If you collect() or use pattern comprehension, you'll keep the cardinality down to your initial 10, and prevent redundant matches.
Something like this, if you only want collected paths returned:
MATCH (j:Journal)
WITH j LIMIT 10
WITH j,
[pa=(j)<-[:BELONGS_TO]-(a:ArticleType) | pa] as pa,
[po=(j)<-[:OWNS]-(o:JournalOwner) | po] as po,
[ps=(j)<-[:BELONGS_TO]-(s:Section) | ps] as ps,
[pf=(j)-[:ACCEPTS]->(f:FileCategory) | pf] as pf,
[pc=(j)-[:CHARGED_BY]->(apc:APC) | pc] as pc,
[pt=(j)-[:ACCEPTS]->(sft:SupportedFileType) | pt] as pt,
[pl=(j)<-[:BELONGS_TO|:CHILD_OF*..]-(c:Classification) | pl] as pl
RETURN pa, po, ps, pf, pc, pt, pl

Perform MATCH on collection / Break apart collection

I am trying to mimc the functionality of the neo4j browser to display my graph in my front end. The neo4j browser issues two calls for every query - the first call performs the query that the user types into the query box and the second call uses find the relationships between every node returned in the first user-entered query.
{
"statements":[{
"statement":"START a = node(1,2,3,4), b = node(1,2,3,4)
MATCH a -[r]-> b RETURN r;",
"resultDataContents":["row","graph"],
"includeStats":true}]
}
In my application I would like to be more efficient so I would like to be able to get all of my nodes and relationships in a single query. The query that I have at present is:
START person = node({personId})
MATCH person-[:RELATIONSHIP*]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts
This query runs well but it doesn't give me the "d" and "details" nodes which were linked to the original "person".
I have tried to join the "p" and "person" results in a collection:
collect(p) + collect(person) AS people
But this does not allow me to perform a MATCH on the resulting collection. As far as I can figure out there is no way of breaking apart a collection.
The only option I see at the moment is to split the query into two; return the "collect(p) + collect(person) AS people" collection and then use the node values in a second query. Is there a more efficient way of performing this query?
If you use the quantifier *0.. RELATIONSHIP is also match at a depth of 0 making person the same as p in this case. The * without specified limits defaults to 1..infinity
START person = node({personId})
MATCH person-[:RELATIONSHIP*0..]-(p:Person)
WITH distinct p
MATCH p-[r]-(d:Data), p-[:DETAILS]->(details), d-[:FACT]->(facts)
RETURN p, r, d, details, facts

Resources