In SQL there is an option (NOT EXISTS) that allows us to select a row only if another SELECT returns zero results.
For example:
SELECT
c.CustomerKey
FROM
Customer c, Sales s1, Date d
WHERE
s1.CustomerKey = c.CustomerKey AND
s1.OrderDateKey = d.DateKey AND
s1.ShipDate > s1.DueDate AND
NOT EXISTS (SELECT *
FROM Sales s2
WHERE s2.OrderDateKey = s1.OrderDateKey AND
s2.CustomerKey <> s1.CustomerKey)
GROUP BY
c.CustomerKey
I tried to do the following but the query never ends so I assume I'm doing it the wrong way. What am I missing?
MATCH (d1:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
WITH d1,s1,c1
MATCH (s2:Sales)-[:CUSTOMER]->(c2:Customer)
WHERE NOT(s2.OrderDateKey=s1.OrderDateKey AND c2.CustomerKey<>c1.CustomerKey)
RETURN c2.CustomerKey
The query below should do what you want.
First, you should create an index on :Sales(OrderDateKey) so that the OPTIONAL MATCH in the query below can quickly find the desired Sales nodes (instead of scanning all of them):
CREATE INDEX ON :Sales(OrderDateKey);
When the OPTIONAL MATCH clause fails to find a match, it sets its unbound identifiers to NULL. The following query takes advantage of that fact:
MATCH (:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
WITH s1.OrderDateKey AS odk, c1.CustomerKey AS customerKey
OPTIONAL MATCH (s2:Sales)-[:CUSTOMER]->(c2:Customer)
WHERE s2.OrderDateKey=odk AND c2.CustomerKey<>customerKey
WITH customerKey, c2
WHERE c2 IS NULL
RETURN DISTINCT customerKey;
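To make the intended semantics concrete, here is a minimal Python sketch of the anti-join that NOT EXISTS (and the OPTIONAL MATCH ... IS NULL pattern) expresses. The sample rows and the function name are invented for illustration; only the column names come from the question.

```python
# Hypothetical in-memory model of the Sales table; the rows are invented.
sales = [
    {"CustomerKey": 1, "OrderDateKey": 20200101, "ShipDate": 5, "DueDate": 3},
    {"CustomerKey": 2, "OrderDateKey": 20200102, "ShipDate": 9, "DueDate": 4},
    {"CustomerKey": 3, "OrderDateKey": 20200102, "ShipDate": 1, "DueDate": 4},
    {"CustomerKey": 4, "OrderDateKey": 20200103, "ShipDate": 2, "DueDate": 6},
]


def late_customers_alone_on_order_date(rows):
    """Customers with a late sale whose order date has no sale by another customer."""
    result = set()
    for s1 in rows:
        if s1["ShipDate"] <= s1["DueDate"]:
            continue  # not late, filtered out like the WHERE clause
        # NOT EXISTS: keep s1 only if no other customer's sale shares its order date
        other_exists = any(
            s2["OrderDateKey"] == s1["OrderDateKey"]
            and s2["CustomerKey"] != s1["CustomerKey"]
            for s2 in rows
        )
        if not other_exists:
            result.add(s1["CustomerKey"])
    return sorted(result)


print(late_customers_alone_on_order_date(sales))
```

Customer 2 is late but shares an order date with customer 3, so only customer 1 survives the filter.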
The tricky part of translating SQL to Cypher is figuring out when we should still do joins and predicates based on keys, vs when we should be translating those operations into usages of nodes and relationships.
Let's first translate what the SQL means, as best as I can tell:
We want to match a Sale with a Customer and an order Date, where the
sale's ship date is past the due date, and there isn't already a Sale
with the same order Date for a different Customer.
It looks like Sales.OrderDateKey is a foreign key to Date.DateKey's primary key, and that Sales.CustomerKey is a foreign key to Customer.CustomerKey's primary key.
If the above assumption is true, then we don't need to work with these keys at all. Where SQL uses foreign and primary keys for joining, Neo4j uses relationships between nodes instead, so we don't need these fields for anything in this query except the returned values.
MATCH (orderDate:Date)<-[:ORDERDATE]-(s1:Sales)-[:CUSTOMER]->(c1:Customer)
WHERE s1.ShipDate > s1.DueDate
WITH orderDate, c1
// match to exclude is a sale with the same orderDate but different customer
OPTIONAL MATCH (orderDate)<-[:ORDERDATE]-(:Sales)-[:CUSTOMER]->(c2:Customer)
WHERE c1 <> c2
WITH c1, c2
WHERE c2 IS NULL
RETURN DISTINCT c1.CustomerKey;
match(m:master_node:Application)-[r]-(k:master_node:Server)-[r1]-(n:master_node)
where (m.name contains '' and (n:master_node:DeploymentUnit or n:master_node:Schema))
return distinct m.name,n.name
Hi, I am trying to get the total number of records for the above query. How do I change the query to use the count function and get the record count directly?
Thanks in advance
The following query uses the aggregating function COUNT. Distinct pairs of m.name, n.name values are used as the "grouping keys".
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
RETURN m.name, n.name, COUNT(*) AS cnt
I assume that m.name contains '' in your query was an attempt to test for the existence of m.name. This query uses the EXISTS() function to test that more efficiently.
[UPDATE]
To determine the number of distinct n and m pairs in the DB (instead of the number of times each pair appears in the DB):
MATCH (m:master_node:Application)--(:master_node:Server)--(n:master_node)
WHERE EXISTS(m.name) AND (n:DeploymentUnit OR n:Schema)
WITH DISTINCT m.name AS n1, n.name AS n2
RETURN COUNT(*) AS cnt
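The difference between the two queries (count per pair vs. number of distinct pairs) can be sketched in plain Python. The rows below are invented and stand for one (m.name, n.name) pair per matched path.

```python
from collections import Counter

# Hypothetical (m.name, n.name) rows, one per matched path; values are invented.
rows = [("app1", "du1"), ("app1", "du1"), ("app1", "schema1"), ("app2", "du1")]

# Like RETURN m.name, n.name, COUNT(*): how often each pair occurs.
per_pair = Counter(rows)

# Like WITH DISTINCT ... RETURN COUNT(*): how many different pairs exist.
distinct_pairs = len(set(rows))

print(per_pair[("app1", "du1")])
print(distinct_pairs)
```

Here the pair ("app1", "du1") occurs twice, while the number of distinct pairs is three.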
Some things to consider for speeding up the query even further:
Remove unnecessary label tests from the MATCH pattern. For example, can we omit the master_node label test from any nodes? In fact, can we omit all label testing for any nodes without affecting the validity of the result? (You will likely need a label on at least one node, though, to avoid scanning all nodes when kicking off the query.)
Can you add a direction to each relationship (to avoid having to traverse relationships in both directions)?
Specify the relationship types in the MATCH pattern. This will filter out unwanted paths earlier. Once you do so, you may also be able to remove some node labels from the pattern as long as you can still get the same result.
Use the PROFILE clause to evaluate the number of DB hits needed by different Cypher queries.
You can find examples of how to use count in the Neo4j docs.
In your case, the first example there, where
count(*)
is used to return a count of each returned item, should work.
I have a neo4j database with several nodes each having many properties. I am trying to find a list of unique values available for each property.
Currently, I can search for nodes that have a certain value 'xxx' with a query like the one below. However, I want to find all the unique values ('xxx', 'yyy', etc.) that may exist across all the nodes in my database.
match (n:Db_Nodes) where n.stringProperty = "xxx" return n
How should I go about structuring the query desired?
You can use the DISTINCT Clause to return all the unique values for this property.
There are two ways to get all the values:
Get all the values in a list. Here the result will be a single record with all the unique values in the form of a list.
MATCH (n:Db_Nodes) RETURN COLLECT(DISTINCT n.stringProperty) as propertyList
Get one value per record. Here multiple records will be returned (one per unique property value).
MATCH (n:Db_Nodes) RETURN DISTINCT n.stringProperty
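One difference between the two forms is worth knowing: COLLECT ignores null values, while RETURN DISTINCT yields a null row if some nodes lack the property. A rough Python model of both, with invented values (None stands for a missing property, and the result order in Cypher is not guaranteed):

```python
# Hypothetical property values read from Db_Nodes nodes; None = property missing.
values = ["xxx", "yyy", "xxx", None, "zzz", "yyy"]


def collect_distinct(vals):
    """Model of COLLECT(DISTINCT ...): one list, nulls skipped."""
    seen = []
    for v in vals:
        if v is not None and v not in seen:
            seen.append(v)
    return seen


def return_distinct(vals):
    """Model of RETURN DISTINCT ...: one entry per unique value, null included."""
    seen = []
    for v in vals:
        if v not in seen:
            seen.append(v)
    return seen


print(collect_distinct(values))
print(return_distinct(values))
```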
I have a bunch of venues in my Neo4J DB. Each venue object has the property 'catIds', an array containing the Ids for the types of venue it is. I want to query the database so that I get all Venues, ordered by how well their catIds match or contain some of a list of Ids that I give the query. I hope that makes sense :)
Please, could someone point me in the direction of how to write this query?
Since you're working in a graph database you could think about modeling your data in the graph, not in a property where it's hard to get at it. For example, in this case you might create a bunch of (v:venue) nodes and a bunch of (t:type) nodes, then link them by an [:is] relation. Each venue is linked to one or more type nodes. Each type node has an 'id' property: {id:'t1'}, {id:'t2'}, etc.
Then you could do a query like this:
match (v:venue)-[r:is]->(:type) return v, count(r) as n order by n desc;
This finds all your venues, along with ALL their type relations and returns them ordered by how many type-relations they have.
If you only want to get nodes of particular venue types on your list:
match (v:venue)-[r:is]-(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
And if you want ALL venues but rank ordered according to how well they fit your list, as I think you were looking for:
match (v:venue) optional match (v)-[r:is]->(t:type) where t.id in ['t1','t2'] return v, count(r) as n order by n desc;
The match will get all your venues; the optional match will find relations on your list if the node has any. If a node has no links on your list, the optional match binds r to null, count(r) ignores nulls and returns 0, and the venue sorts to the bottom.
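As a rough picture of what the last query computes, here is a small Python sketch that ranks every venue by how many of its types are on the wanted list, with 0 for venues that have none. The venue names and type ids are invented, and breaking ties by name is my own choice (Cypher leaves the order of ties unspecified).

```python
# Hypothetical venues mapped to their type ids; names and ids are invented.
venue_types = {
    "cafe": ["t1", "t3"],
    "bar": ["t1", "t2"],
    "gym": ["t4"],
}
wanted = {"t1", "t2"}


def rank_venues(venues, wanted_ids):
    """Return (venue, match_count) pairs, best match first; ties sorted by name."""
    counts = [
        (name, sum(1 for t in ids if t in wanted_ids))
        for name, ids in venues.items()
    ]
    return sorted(counts, key=lambda pair: (-pair[1], pair[0]))


print(rank_venues(venue_types, wanted))
```

The gym has no matching types, yet it still appears in the result with a count of 0, which is the effect of using optional match instead of match.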
I have this query in SQL:
Select Id, CrawlerId,CrawlerName,
(SELECT Count(*) from CrawlerResult cr where cr.CrawlerId = cs.CrawlerId and IsNew=1) as LastRunResult ,
(SELECT Count(*) from CrawlerResult cr where cr.CrawlerId = cs.CrawlerId ) as TotalResult
FROM CrawlerScheduler cs
How to convert this query to neo4j cypher by combining CrawlerScheduler and CrawlerResult nodes?
I'm assuming you've replaced the foreign key relationships from SQL with actual relationships in Cypher, and that you're using actual booleans instead of 1 and 0? Something like:
(:CrawlerScheduler)-[:RESULT]->(:CrawlerResult)
If so then the equivalent Cypher query might look like this:
MATCH (cs:CrawlerScheduler)
WITH cs, SIZE((cs)-[:RESULT]->()) as TotalResult
OPTIONAL MATCH (cs)-[:RESULT]->(cr)
WHERE cr.IsNew
WITH cs, TotalResult, COUNT(cr) as LastRunResult
RETURN cs.Id, cs.CrawlerId, cs.CrawlerName, LastRunResult, TotalResult
EDIT
I changed the second match to an OPTIONAL MATCH, just in case the scheduler didn't have results, or didn't have new results.
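To illustrate why the OPTIONAL MATCH matters, here is a minimal Python sketch of the two counts per scheduler. The data is invented; the point is that a scheduler with no results should still appear, with both counts at zero.

```python
# Hypothetical data: scheduler ids and result rows; all values are invented.
schedulers = ["c1", "c2"]
results = [
    {"CrawlerId": "c1", "IsNew": True},
    {"CrawlerId": "c1", "IsNew": False},
    {"CrawlerId": "c1", "IsNew": True},
]


def summarize(scheduler_ids, result_rows):
    """Per scheduler: (LastRunResult, TotalResult), zeros when nothing matches."""
    summary = {}
    for cid in scheduler_ids:
        own = [r for r in result_rows if r["CrawlerId"] == cid]
        new_count = sum(1 for r in own if r["IsNew"])
        summary[cid] = (new_count, len(own))
    return summary


print(summarize(schedulers, results))
```

With a plain MATCH in place of the OPTIONAL MATCH, the row for "c2" would be dropped instead of showing zeros.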
I have city nodes connected to each other by HAS_BUS relationships.
eg.
CREATE (:City{id:123,name:'Mumbai'})-[:HAS_BUS{id:22323,arr_time:234,dept_time:250}]->(:City{id:124,name:'Goa'}).
Though I wanted each bus id to be unique, I mistakenly added buses with the same id more than once, i.e. there are non-unique HAS_BUS relationships.
Q1. How should I find out which ids are not unique?
Q2. How should I find out which ids are not unique and delete them?
I wrote this query but got an unknown error
MATCH ()-[r:HAS_BUS]->() with count(r.id) as t match ()-[s:HAS_BUS]->() where s.id=t with count(s.id) as times,s.id as id where count(times)>1 return id,times
The database contains only 80 nodes and 6500 relationships.
I am actually missing GROUP BY feature of mySQL
DATABASE Can be downloaded from here 6MB
I am assuming that you want the bus id to be unique within your entire graph i.e. bus ID 22323 corresponds to exactly one HAS_BUS relation in the entire graph and is not repeated with different cities.
MATCH (c:City)-[r:HAS_BUS]->()
WITH r.id as busId, count(*) as idCount
WHERE idCount>1
RETURN busId,idCount
This query (not tested) will give you all busIds repeated more than once.
Then you can figure out which places you want to delete the duplicates from, or delete all and re-create the correct one.
The group by you're looking for is documented at http://docs.neo4j.org/chunked/milestone/query-aggregation.html
Edit to remove all but one duplicate bus id. Make sure you have a backup of your database; this is NOT tested.
MATCH (c:City)-[r:HAS_BUS]->()
WITH r.id as busId, count(*) as idCount
WHERE idCount>1 //Find all duplicate ids
MATCH (c:City)-[r2:HAS_BUS]->()
WHERE r2.id=busId
with COLLECT(r2) as ids,busId //find all relations for those duplicates
with head(ids) as firstId,ids,busId
with filter(x in ids where x<>firstId) as idsToDelete //keep the first id and collect the rest
foreach (d in idsToDelete | DELETE d); //delete the rest
The code to remove all duplicates but one can be simplified a bit by using tail instead of head and filter:
MATCH (c:City)-[r:HAS_BUS]->()
WITH r.id as busId, count(*) as idCount
WHERE idCount>1 //Find all duplicate ids
MATCH (c:City)-[r2:HAS_BUS]->()
WHERE r2.id=busId
WITH tail(collect(r2)) as to_delete
FOREACH (d IN to_delete | DELETE d)
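The group-then-keep-the-first idea behind tail(collect(r2)) can be sketched in Python as follows. The relationship handles and bus ids are invented, and the function only reports what would be deleted rather than touching any database.

```python
from collections import defaultdict

# Hypothetical HAS_BUS relationships as (handle, bus_id) pairs; values are invented.
rels = [("r1", 22323), ("r2", 22323), ("r3", 500), ("r4", 22323), ("r5", 501)]


def duplicates_to_delete(relationships):
    """Group by bus id and return every relationship except the first in each group."""
    groups = defaultdict(list)
    for handle, bus_id in relationships:
        groups[bus_id].append(handle)
    to_delete = []
    for handles in groups.values():
        to_delete.extend(handles[1:])  # same idea as tail(collect(r2))
    return to_delete


print(duplicates_to_delete(rels))
```

Ids that occur only once contribute an empty tail, so nothing is deleted for them.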