I have city nodes connected to each other by HAS_BUS relationships.
eg.
CREATE (:City{id:123,name:'Mumbai'})-[:HAS_BUS{id:22323,arr_time:234,dept_time:250}]->(:City{id:124,name:'Goa'}).
Though I wanted a bus id to be unique I mistakenly put bus with same id more than once i.e there are non unique HAS_BUS relationships.
Q1.How should I find out which ids are not unique
Q2.How should I find out which ids are not unique and delete them.
I wrote this query but got an unknown error
MATCH ()-[r:HAS_BUS]->() with count(r.id) as t match ()-[s:HAS_BUS]->() where s.id=t with count(s.id) as times,s.id as id where count(times)>1 return id,times
The database contains only 80 nodes and 6500 relationships.
I am actually missing GROUP BY feature of mySQL
DATABASE Can be downloaded from here 6MB
I am assuming that you want the bus id to be unique within your entire graph i.e. bus ID 22323 corresponds to exactly one HAS_BUS relation in the entire graph and is not repeated with different cities.
MATCH (c:City)-[r:HAS_BUS]->()
WITH r.id as busId, count(*) as idCount
WHERE idCount>1
RETURN busId,idCount
note: not tested
will give you all busIds repeated more than once.
Then you can figure out which places you want to delete the duplicates from, or delete all and re-create the correct one.
The group by you're looking for is documented at http://docs.neo4j.org/chunked/milestone/query-aggregation.html
Edit to remove all but one duplicate bus id: Make sure you have a backup of your database- this is NOT tested
MATCH (c:City)-[r:HAS_BUS]->()
WITH r.id as busId, count(*) as idCount
WHERE idCount>1 //Find all duplicate ids
MATCH (c:City)-[r2:HAS_BUS]->()
WHERE r2.id=busId
with COLLECT(r2) as ids,busId //find all relations for those duplicates
with head(ids) as firstId,ids,busId
with filter(x in ids where x<>firstId) as idsToDelete //keep the first id and collect the rest
foreach (d in idsToDelete | DELETE d); //delete the rest
The code to remove all duplicates but one may be a bit simplified using tail instead of head-filter:
MATCH (c:City)-[r:HAS_BUS]->()
WITH r.id as busId, count(*) as idCount
WHERE idCount>1 //Find all duplicate ids
MATCH (c:City)-[r2:HAS_BUS]->()
WHERE r2.id=busId
WITH tail(collect(r2)) as to_delete
FOREACH (d IN to_delete | DELETE d)
Related
I am trying to return a set of a node from 2 sessions with a condition that returned node should not be present in another session (third session). I am using the following code but it is not working as intended.
MATCH (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
UNWIND ['abc1', 'abc2'] as session_id
MATCH (target:Session {session_id: session_id})-[r:HAS_PRODUCT]->(product:Product)
where p<>product
WITH distinct product.products_id as products_id, r
RETURN products_id, count(r) as score
ORDER BY score desc
This query was supposed to return all nodes present in abc1 & abc2 but not in abc3. This query is not excluding all products present in abc3. Is there any way I can get it working?
UPDATE 1:
I tried to simplify it without UNWIND as this
match (:Session {session_id: 'abc3'})-[:HAS_PRODUCT]->(p:Product)
MATCH (target:Session {session_id: 'abc1'})-[r:HAS_PRODUCT]->(product:Product)
where product <> p
WITH distinct product.products_id as products_id
RETURN products_id
Even this is also not working. It is returning all items present in abc1 without removing those which are already in abc3. Seems like where product <> p is not working correctly.
I would suggest it would be best to check if the nodes are in a list, and to prove out the approach, start with a very simple example.
Here is a simple cypher showing one way to do it. This approach can then be extended into the complex query,
// get first two product IDs as a list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
RETURN list
// now show two more product IDs which not in that list
MATCH (p:Product)
WITH p LIMIT 2
WITH COLLECT(ID(p)) as list
MATCH (p2:Product)
WHERE NOT ID(p2) in list
RETURN ID(p2) LIMIT 2
Note: I'm using the ID() of the nodes instead of the entire node, same dbhits but may be more performant...
Given that I'm very new to Neo4j. I have a schema which looks like the below image:
Here Has nodes are different for example Passport, Merchant, Driving License, etc. and also these nodes are describing the customer node (looking for future scope of filtering customers based on these nodes).
SIMILAR is a self-relation meaning there exists a customer with ID:1 is related to another customer with ID:2 with a score of 2800.
I have the following questions:
Is this a good schema given the condition of the future scope I mentioned above, or getting all the properties in a single customer node is viable? (Different nodes may have array of items as well, for example: ()-[:HAS]->(Phone) having {active: "+91-1231241", historic_phone_numbers: ["+91-121213", "+91-1231421"]})
I want to get the customer along with describing nodes in relation to other customers. For that, I tried the below query (w/o number of relation more than 1):
// With number_of_relation > 1
MATCH (searched:Customer)-[r:SIMILAR]->(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched.customer_id) AS MatchedList, count(r) as cnt
WHERE cnt > 1
UNWIND MatchedList AS matchedCustomer
MATCH (person:Customer {customer_id: matchedCustomer})-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(related)
RETURN searched, person, related
Result what I got is below, notice one customer node not having its describing nodes:
// without number_of_relation > 1
// second attempt - for a sample customer_id
MATCH (matched)<-[r:SIMILAR]-(c)-[:HAS|:LIVES_IN|:IS_EMPLOYED_BY]->(b)
WHERE size(keys(b)) > 0
AND c.customer_id = "1b093559-a39b-4f95-889b-a215cac698dc"
AND r.score > 2700
RETURN b AS props, c AS src_cust, r AS relation, matched
Result I got are below, notice related nodes are not having their describing nodes:
If I had two describing nodes with some property (some may have a list) upon which I wanted to query and build the expected graph specified in point 2 above, how can I do that?
I want the database to find a similar customer given the describing nodes. Example: A customer {name: "Dave"} has phone {active_number: "+91-12345"} is similar to customer {name: "Mike"} has phone {active_number: "+91-12345"}. How can get started with this?
If something is unclear, please ask. I can explain with examples.
[EDITED]
Yes, the schema seems fine, except that you should not use the same HAS relationship type between different node label pairs.
The main problem with your first query is that its top MATCH clause uses a directional relationship pattern, ()-->(), which does not allow all Customer nodes to have a chance to be the searched node (because some nodes may only be at the tail end of SIMILAR relationships). This tweaked query should work better:
MATCH (searched:Customer)-[r:SIMILAR]-(matched:Customer)
WHERE r.score > 2700
WITH searched, COLLECT(matched) AS matchedList
WHERE SIZE(matchedList) > 1
UNWIND matchedList AS person
MATCH (person)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(pDesc)
WITH searched, person, COLLECT(pDesc) AS personDescribers
MATCH (searched)-[:HAS|LIVES_IN|IS_EMPLOYED_BY]->(sDesc)
RETURN searched, person, personDescribers, COLLECT(sDesc) AS searchedDescribers
It's not clear what you want are trying to do.
To get all Customers who have the same phone number:
MATCH (c:Customer)-[:HAS_PHONE]-(p:Phone)
WHERE p.activeNumber = '+91-12345'
WITH p.activeNumber AS phoneNumber, COLLECT(c) AS customers
WHERE SIZE(customers) > 1
RETURN phoneNumber, customers
I'm wondering, when I have read the data of a node and I want to match it in another query, which way will have the best performance? Using id like this:
MATCH (n) WHERE ID(n) = 1234
or using indices of the node:
MATCH (n:Label {SomeIndexProperty: 3456})
Which one is better?
IDs are a technical ID for Neo4j, and those should not be used as a primary key for your application.
Every node (and relationship) has a technical ID, and it's stable over time.
But if you delete a node, for example the node 32, Neo4j will reuse this ID for a new node.
So you can use it in your queries inside the same transaction (there is no problem), otherwise you should know what you are doing.
The only way to retrieve the technical ID, is to use the function ID like you do on your first query : MATCH (n) WHERE ID(n) = 1234 RETURN n.
The ID is not exposed as a node's property, so you can't do MATCH (n {ID:1234}) RETURN n.
You have noticed that if you want to do a WHERE on a strict equality, you can do put the condition directly on the node.
For example :
MATCH (n:Node) WHERE n.name = 'logisima' RETURN n
MATCH (n:Node {name:'logisima'}) RETURN n
Those two queries are identicals, they generate the same query plan, it's just a syntactic sugar.
Is it faster to retrieve a node by its ID or by an indexed property ?
The easier way to know the answer to this question is to profile the two queries.
On the one based on the ID, you will see the box NodeByIdSeek that cost 1 db hit, and on the one with a unique constrainst you will see the box NodeUniqueIndexSeek with 2 db hits.
So searching a node by its ID is faster.
I have 2 Account Nodes, and several listing nodes, as seen below. The Match statement results in 2 Accounts being show with a relationship to each Listing thats associated to that account.
What im wanting to do is create a relationship between the two Accounts based on at least 1 of there listings each sharing the same phone number.
If possible im wanting to see the relationship between the two account nodes drawn and a relationship between the two listings so long as the listings are from different Account.
MERGE (:Account {account_id:11})
MERGE (:Listing {account_id:11, listing_id:1001, phone:99468320, author:'Paul'})
MERGE (:Account {account_id:12})
MERGE (:Listing {account_id:12, listing_id:1002, phone:97412521, author:'Sam'})
MERGE (:Listing {account_id:12, listing_id:1003, phone:97412521, author:'Sam'})
MERGE (:Listing {account_id:12, listing_id:1004, phone:99468320, author:'Sam'})
MERGE (:Listing {account_id:12, listing_id:1004, phone:0, author:'Same'})
MATCH (a:Account),(l:Listing)
WHERE a.account_id = l.account_id
CREATE (a)-[:LISTING]->(l)
RETURN a,l;
For the latter i did try the following but it went a bit crazy as it linked every listing to each other that had the same number appose to only doing so if the account_id was different.
match (p1:Listing)
with p1
match (p2:Listing)
where p2.phone = p1.phone and p1 <> p2
merge(p1)-[r:SHARED_PHONE]-(p2)
RETURN p1, p2
First of all, you should carefully consider if you really need the SHARED_PHONE relationships, as you will have to update the relationships every time a phone number is added, removed, or changed. That could complicate a lot of your queries and make your DB unnecessarily slower. Also, you could end up with a lot of SHARED_PHONE relationships (that you may not really need). Instead of creating the relationships, you could consider incorporating into the relevant queries the discovery of the nodes with the same phone numbers.
However, if you decide that you really need that relationship, here is one way to do what you want:
[UPDATED]
MATCH (n: Listing)
WITH n.phone AS phone, COLLECT(n) AS ns
FOREACH(i IN RANGE(0, SIZE(ns)-2) |
FOREACH(x IN [ns[i]] |
FOREACH(y IN [z IN ns[i+1..] WHERE x.account_id <> z.account_id] |
MERGE (x)-[:SHARED_PHONE]-(y)
)))
The WITH clause collects all the (unique) Listing nodes that share the same phone, and the nested FOREACH clauses execute the minimum number of MERGEs needed to ensure that all the appropriate nodes are connected by a SHARED_PHONE relationship (in either direction). The innermost FOREACH also ensures that the nodes to be connected do not have the same account_id.
The following query getting all groups of specific user then
unwind on each result(each group) and should remove all incoming relations only if count relations to that group is 1.
example: group1<-user1 (will delete the incoming relationship to the group)
group1-<user1
group1-<user2 (will remain all incoming relationships to the group)
can assist complete it?
MATCH (me:userId{{1})-[rel:relation_group]-(allGroups:GROUP)
unwind userGroups as group
//how to use CASE or WHERE in order to check if this group
has only 1 relationship just remove it
Thanks.
You can use size in WHERE, example :
MATCH (me:userId{{1})-[rel:relation_group]-(allGroups:GROUP)
WHERE size((allGroups)<-[:relation_group]-()) = 1
DELETE rel
You don't need to iterate, by default the subsequent clauses after a MATCH will be executed for each row found in the MATCH, so let's say the first MATCH returns the following :
me rel allGroups
1 rel3 node5
1 rel4 node6
Then the DELETE will be executed for the first row, then for the second row, etc...