Neo4j Cypher: exclude nodes where a specific relationship is missing

I am trying to implement a fraud detection system in Neo4j, where I have a bunch of nodes for persons, bank accounts, credit cards, telephone numbers and addresses.
A basic fraud indicator in banking systems is a person who has a bank account and a credit card, where the credit card is not linked to his own bank account.
I cannot figure out how to express this in Cypher. When I try to exclude these nodes with:
WHERE NOT (k)-[:VERKNUEPFT]-(b)
I still get the wrong nodes; the filter just hides the VERKNUEPFT node.
What is the correct way to negate the pattern so that every unwanted node is excluded?
Put simply, I need to get the following output:
First, I selected the nodes that are relevant at all:
MATCH (p:person)-[r:HAT_KONTO]->(b:bankkonto), (p)-[r2:NUTZT_KARTE]->(k:kreditkarte) return p,b,k,r,r2;
which gives me the following:
The nodes below this, Hermine and Ron, are correct, so I want to exclude everything that is linked to them.
But when I run:
MATCH (p:person)-[r:HAT_KONTO]->(b:bankkonto), (p)-[r2:NUTZT_KARTE]->(k:kreditkarte) WHERE NOT (k)-[:VERKNUEPFT]-(b) return p,b,k,r,r2;
I get the following:
Only the bank account (the brown node) is missing.
When I test the same code with WHERE instead of WHERE NOT:
MATCH (p:person)-[r:HAT_KONTO]->(b:bankkonto), (p)-[r2:NUTZT_KARTE]->(k:kreditkarte) WHERE (k)-[:VERKNUEPFT]-(b) return p,b,k,r,r2;
I get the opposite of what I want.

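I think you need to check whether all the credit cards held by a person are linked to at least one of his or her bank accounts. Currently, the WHERE NOT clause is evaluated separately for every (bank account, credit card) row, so you are only checking whether a card is linked to one specific bank account. Try something along these lines: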
MATCH (p:person)-[:HAT_KONTO]->(b:bankkonto)
WITH p, collect(b) AS banks
MATCH (p)-[r2:NUTZT_KARTE]->(k:kreditkarte)
WITH p, banks, collect(k) AS creditCards
WHERE ALL(card IN creditCards WHERE ANY(bank IN banks WHERE (card)-[:VERKNUEPFT]-(bank)))
UNWIND banks AS b
UNWIND creditCards AS k
MATCH (p)-[r:HAT_KONTO]->(b), (p)-[r2:NUTZT_KARTE]->(k)
RETURN p,r,b,r2,k
In the above query, we first collect a person's bank accounts and credit cards into two separate collections. Then we check whether every credit card is linked to at least one of those bank accounts, which keeps only the valid users. Finally, we unwind the collections and return their details.
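Here is a small plain-Python model of both filters. It shows why the per-row WHERE NOT misfires and what the collect/ALL/ANY version tests instead. It is not Neo4j client code. The dictionaries stand in for HAT_KONTO, NUTZT_KARTE and VERKNUEPFT, and the people, account and card IDs are hypothetical toy data.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy data standing in for the graph (not from the question).
HAT_KONTO: Dict[str, List[str]] = {"Anna": ["B1"], "Ben": ["B2", "B3"], "Cara": ["B4"]}
NUTZT_KARTE: Dict[str, List[str]] = {"Anna": ["K1"], "Ben": ["K2"], "Cara": ["K3"]}
# (card, bank) pairs joined by VERKNUEPFT; K3 is linked to an account Cara does not hold.
VERKNUEPFT: Set[Tuple[str, str]] = {("K1", "B1"), ("K2", "B2"), ("K3", "B9")}


def linked(card: str, bank: str) -> bool:
    # Undirected check, like (card)-[:VERKNUEPFT]-(bank).
    return (card, bank) in VERKNUEPFT or (bank, card) in VERKNUEPFT


def per_row_where_not() -> Set[str]:
    """Persons returned by the per-row WHERE NOT filter (one test per account/card row)."""
    return {
        p
        for p, banks in HAT_KONTO.items()
        for b in banks
        for k in NUTZT_KARTE.get(p, [])
        if not linked(k, b)
    }


def all_cards_linked() -> Set[str]:
    """Persons kept by WHERE ALL(card ... WHERE ANY(bank ... WHERE linked))."""
    return {
        p
        for p, banks in HAT_KONTO.items()
        if banks and NUTZT_KARTE.get(p)
        and all(any(linked(k, b) for b in banks) for k in NUTZT_KARTE[p])
    }


def fraud_candidates() -> Set[str]:
    """Negated grouped check: some card is linked to none of the person's accounts."""
    with_both = {p for p, banks in HAT_KONTO.items() if banks and NUTZT_KARTE.get(p)}
    return with_both - all_cards_linked()
```

Ben has two accounts and a card linked to only one of them. The per-row filter still returns him through the unlinked account row, while the grouped check accepts him. To list the suspicious persons instead, negating the grouped predicate (for example `WHERE NOT ALL(...)` in the Cypher above) should give them. `fraud_candidates` models that case.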

Neo4J: How can I repeat my query on all resulting nodes?

I've started working on a project for my Bachelor's thesis, and I'm looking for some help with Cypher since I haven't worked with it before.
I've got the BTC blockchain as my DB, and now I want to use the multi-input clustering heuristic to identify all addresses that belong to a person. This means that I want to identify all transactions from a BTC wallet that have more than one input address. Once I have identified the transactions, I look for the wallets. This is what the following query does:
MATCH(a:Address{address:"3QQdfAaPhP1YqLYMBS59BqWjcpXjXVP1wi"})-[:SENDS]-(tx)-[:SENDS]-(a2)
RETURN a2
Now that I have these wallets, I want to repeat the exact same process on them, then on their resulting wallets, and so on. So I need a recursive query that returns all wallets that were used as input wallets at some point.
Note:
Example of an Address that has 2 transactions
This is what the graph of a BTC wallet that received and sent BTC looks like. There are three types of nodes (Transactions, Addresses, Blocks).
I'm not sure I understand fully, but would the following work to return more than one input address?
MATCH (a:Address)-[:SENDS*2..]->(tx:Transaction)
WITH DISTINCT a
// Do something with these address
RETURN a.address // or whatever
For the record, an unbounded variable-length pattern such as *2.. is not recommended. You're better off setting a maximum, as in *2..10, which limits the matched paths to between 2 and 10 hops.
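The question's process is to take the co-inputs of an address, then the co-inputs of those, and so on until nothing new appears. That amounts to finding connected components among addresses that appear together as inputs of some transaction. Below is a minimal plain-Python sketch using union-find. It assumes the transaction-to-input-addresses map has already been extracted from the graph. The transaction IDs and addresses are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical extracted data: transaction id -> its input addresses.
EXAMPLE_TX_INPUTS: Dict[str, List[str]] = {
    "tx1": ["A1", "A2"],
    "tx2": ["A2", "A3"],
    "tx3": ["A4"],
    "tx4": ["A5", "A6"],
}


def cluster_addresses(tx_inputs: Dict[str, List[str]]) -> List[Set[str]]:
    """Multi-input heuristic: all inputs of one transaction are assumed to share an owner."""
    parent: Dict[str, str] = {}

    def find(a: str) -> str:
        parent.setdefault(a, a)
        # Path halving keeps the union-find trees shallow.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for inputs in tx_inputs.values():
        for addr in inputs:
            find(addr)  # register single-input addresses as their own cluster
        for addr in inputs[1:]:
            union(inputs[0], addr)

    clusters: Dict[str, Set[str]] = {}
    for addr in parent:
        clusters.setdefault(find(addr), set()).add(addr)
    return list(clusters.values())


def cluster_of(address: str, tx_inputs: Dict[str, List[str]]) -> Set[str]:
    """All addresses transitively linked to `address` through shared transactions."""
    for cluster in cluster_addresses(tx_inputs):
        if address in cluster:
            return cluster
    return {address}
```

A1 and A3 never appear in the same transaction, but they end up in one cluster via A2. This is exactly the "repeat on the resulting wallets" step, done without a hop limit.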

Cypher query for list pattern

I have a schema that looks like this:
A customer is linked to another customer via a SIMILAR relationship that has a similarity score.
Example: (c1:Customer)-[:SIMILAR]->(c2:Customer)
An Email node is connected to each customer with relationship MAIL_AT with the following node properties:
{
"active_email_address": "a@mail.com",
"cibil_email_addresses": [
"b@mail.com", "c@mail.com"
]
}
Example: (e1:Email)<-[:MAIL_AT]-(c1:Customer)-[:SIMILAR]->(c2:Customer)-[:MAIL_AT]->(e2:Email)
A Risk node with some risk-related properties (below) is related to a customer via the HAS_RISK relationship:
{
"f0_score": 870.0,
"pta_score": 430.0
}
A Fraud node with some fraud-related properties (below) is related to a customer via the IS_FRAUD relationship:
{
"has_commited_fraud": true
}
My Objectives:
Find the customers who share email addresses (whether active or secondary).
My tentative solution:
MATCH (email:Email)
WITH email.cibil_email_addresses + email.active_email_address AS emailAddress, email
UNWIND emailAddress AS eaddr
WITH DISTINCT eaddr AS deaddr, email
UNWIND deaddr AS eaddress
MATCH (customer:Customer)-[]->(someEmail:Email)
WHERE eaddress IN someEmail.cibil_email_addresses + someEmail.active_email_address
WITH eaddress, COLLECT(customer.customer_id) AS customers
RETURN eaddress, customers
Problem: this takes forever to execute. I understand that working with lists takes time; however, I'm flexible about changing the schema (if suggested). Should I break the email addresses out into separate nodes? If yes, how do I split cibil_email_addresses into separate nodes, given that their number varies? Should I create one node per CIBIL email address and connect each of them to the customer with a HAS_CIBIL_EMAIL relationship? (Is this a valid schema design?) Also, a customer's active_email_address may be present in another customer's cibil_email_addresses. I'm trying to detect a synthetic identity attack. PS: If some APOC procedure can help achieve this and the points below, please suggest it with an example.
In production, given a new customer with email addresses, risk values and similarity scores, and given other customers who may or may not be tagged with a fraud status, I want to check whether this new person falls into a fraud ring. PS: If I need to use GDS to solve this, please suggest it with examples.
If I were to do the same exercise with another node type such as Address, which may match only partially and will also hold a list of historical addresses, what would be the ideal approach?
I know I'm tagging someone in my question, but that person seems to be the only one actively answering Cypher questions on StackOverflow. @cybersam, any help?
Thanks.
This should work:
MATCH (e:Email)
UNWIND (e.cibil_email_addresses + e.active_email_address) AS address
WITH address, COLLECT(e) AS es
UNWIND es AS email
MATCH (email)<-[:MAIL_AT]-(cust)
RETURN address, COLLECT(cust) AS customers
The WITH clause takes advantage of the aggregating function COLLECT to automatically collect all the Email nodes containing the same address, using address as the grouping key.
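The query above is essentially an inverted index from each address to the customers whose Email node lists it. Below is a plain-Python model of that grouping (not Neo4j code), with hypothetical node and customer IDs. It also adds the step of keeping only addresses shared by more than one customer.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical Email node properties and MAIL_AT relationships.
EMAILS: Dict[str, dict] = {
    "e1": {"active_email_address": "a@mail.com", "cibil_email_addresses": ["b@mail.com", "c@mail.com"]},
    "e2": {"active_email_address": "b@mail.com", "cibil_email_addresses": ["d@mail.com"]},
}
MAIL_AT: Dict[str, str] = {"c1": "e1", "c2": "e2"}  # customer id -> Email node id


def customers_by_address(emails: Dict[str, dict], mail_at: Dict[str, str]) -> Dict[str, Set[str]]:
    """Group customers by every address on their Email node (active and CIBIL)."""
    by_node: Dict[str, Set[str]] = defaultdict(set)
    for customer, node_id in mail_at.items():
        by_node[node_id].add(customer)

    index: Dict[str, Set[str]] = defaultdict(set)
    for node_id, props in emails.items():
        # Same idea as UNWIND (e.cibil_email_addresses + e.active_email_address)
        addresses: List[str] = props["cibil_email_addresses"] + [props["active_email_address"]]
        for address in addresses:
            index[address] |= by_node[node_id]
    return dict(index)


def shared_addresses(index: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Keep only addresses that link two or more customers."""
    return {a: c for a, c in index.items() if len(c) > 1}
```

Sets remove duplicates here. In Cypher, COLLECT keeps duplicates, so `COLLECT(DISTINCT cust)` would be the closer analogue. Adding `WITH address, COLLECT(DISTINCT cust) AS customers WHERE size(customers) > 1` before the RETURN should mirror `shared_addresses`. Treat that as a sketch to check against your data.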
You should only ask one question at a time. You have a couple of other questions at the bottom. If you continue to need help with them, please create new questions.

Neo4J query to find the same data linked to different nodes

Following is what I created in Neo4j:
Nodes: Customer Names, Customer Address and Customer Contact
Linked these nodes based on common relationships between all three.
I can see all three nodes linked in Neo4j. Contact contains email and phone numbers, so in some cases a customer name node is connected to an email address, a phone number and an address.
As part of my learning, I have been asked to show how many identical contacts are used by different customer names, and how many identical addresses are used by different customer names. With my limited experience I tried a few queries but couldn't get the results.
I tried the following query:
start n=node(*)
match n-[:CONTACT_AT]-()
return distinct n
CONTACT_AT is the relationship between customer name and Contact (email, phone) node.
Your question does not provide enough information about your data model. To save time, I will assume that it looks something like this (without showing all the properties):
(a:Address)<-[:ADDRESS_AT]-(p:Person {name: '...'})-[:CONTACT_AT]->(c:Contact)
With this model, this is how you'd get all the names of the people who have the same Contact:
MATCH (person:Person)-[:CONTACT_AT]->(contact:Contact)
RETURN contact, COLLECT(person.name) AS names;
And this is how you'd get all the names of the people who have the same Address:
MATCH (person:Person)-[:ADDRESS_AT]->(address:Address)
RETURN address, COLLECT(person.name) AS names;
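Both queries group people by the node they point at. To get the "how many are shared" number the question asks for, you can count only the groups with more than one distinct name. Below is a plain-Python sketch of that counting step. The names and contact IDs are hypothetical stand-ins for the CONTACT_AT (or ADDRESS_AT) relationships.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical (person name, contact id) pairs, one per CONTACT_AT relationship.
CONTACT_AT: List[Tuple[str, str]] = [
    ("Ann", "phone:555"),
    ("Bob", "phone:555"),
    ("Ann", "mail:ann"),
    ("Cy", "mail:cy"),
    ("Dee", "mail:cy"),
    ("Dee", "mail:cy"),  # duplicate relationship; counted once per name
]


def names_per_target(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Like RETURN contact, COLLECT(DISTINCT person.name): target -> distinct names."""
    grouped: Dict[str, Set[str]] = defaultdict(set)
    for name, target in edges:
        grouped[target].add(name)
    return dict(grouped)


def count_shared(edges: List[Tuple[str, str]]) -> int:
    """How many contacts (or addresses) are used by more than one distinct name."""
    return sum(1 for names in names_per_target(edges).values() if len(names) > 1)
```

In Cypher, the same idea would be `WITH contact, COLLECT(DISTINCT person.name) AS names WHERE size(names) > 1` followed by `RETURN COUNT(*)`. That is a sketch based on the assumed model above, not a tested query.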

Is this concept applicable for a graph database?

I have been reading about Graph databases and want to know if this type of structure is applicable to it:
Company > Has user Accounts > Accounts send out Facebook posts (which are available to all users)
Up to here, I think this makes sense, and it would be a good use of a graph. A post has a relationship to any number of accounts, and you can traverse it in both directions: posts for a company, and which posts were sent by which users or companies.
However:
Users get added and deleted on a daily basis and I need a record store of how many there were at a given time
Accounts are getting results for each post (likes/friends) which I need to store on a daily basis
I need to find out how many likes a company received (on any given day)
I also need to find out how many likes a user received
I need to find out how many likes a user received per post
You would need to store Likes as a group and then date-value - can you even have "sub" properties?
I struggle at this point, unless you store lots of date-value property lists per node. Is that the way you would do it? If I wanted to find out the latter two points, for example, would it be as efficient as an RDBMS?
Here is a very simple example of a graph data model that seems to cover your stated use cases. (Since nodes can have multiple labels, all Company and User nodes are also Entity nodes, to simplify the model.)
(:Company:Entity {id:100})-[:HAS_USER]->(:User:Entity {id: 200})
(:Entity)-[:SENT]->(:Post {date: 123, msg: "I like cats!"})
(:Entity)-[:LIKES {date: 234}]->(:Post)
Your use cases:
Users get added and deleted on a daily basis and I need a record store of how many there were at a given time.
How to count all users:
MATCH (u:User)
RETURN COUNT(*);
How to count a company's users:
MATCH (c:Company {id:100})-[:HAS_USER]->(u:User)
RETURN COUNT(*);
I need to find out how many likes a company received (on any given day)
MATCH (c:Company {id: 100})-[:SENT]->(p:Post)<-[:LIKES {date:234}]-()
RETURN COUNT(*)
I also need to find out how many likes a user received
MATCH (u:User {id:200})-[:SENT]->(p:Post)<-[:LIKES]-()
RETURN COUNT(*);
I need to find out how many likes a user received per post
MATCH (u:User {id:200})-[:SENT]->(p:Post)<-[:LIKES]-()
RETURN p, COUNT(*)
You would need to store Likes as a group and then date-value - can you even have "sub" properties?
You do not need to explicitly group likes by date (if that is what you mean). Such "groupings" can easily be obtained with the appropriate query (e.g., the query above for the likes a company received on a given day).
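Each LIKES relationship carries its own date, so per-day, per-user and per-post totals are all just different groupings at query time. Below is a plain-Python model of that data, with hypothetical IDs and integer dates like the answer's {date: 234}.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical data mirroring (:Entity)-[:SENT]->(:Post) and (:Entity)-[:LIKES {date}]->(:Post).
SENT: Dict[str, str] = {"p1": "company100", "p2": "user200", "p3": "user200"}  # post -> sender
LIKES: List[Tuple[str, str, int]] = [  # (liker, post, date)
    ("u1", "p1", 234),
    ("u2", "p1", 234),
    ("u3", "p1", 235),
    ("u1", "p2", 234),
    ("u2", "p3", 236),
]


def likes_received(sender: str, date: Optional[int] = None) -> int:
    """Likes on posts sent by `sender`, optionally only those with a given date."""
    return sum(
        1 for _, post, d in LIKES
        if SENT[post] == sender and (date is None or d == date)
    )


def likes_per_post(sender: str) -> Counter:
    """Like RETURN p, COUNT(*): likes grouped by post."""
    return Counter(post for _, post, _ in LIKES if SENT[post] == sender)


def likes_per_day(sender: str) -> Counter:
    # Grouping by date happens at query time; nothing is pre-aggregated in storage.
    return Counter(d for _, post, d in LIKES if SENT[post] == sender)
```

In Cypher, binding the relationship (e.g. `<-[l:LIKES]-()`) and returning `l.date, COUNT(*)` should give the per-day breakdown in a single query.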

Neo4j first node to meet relationship in a movie model

I have read the Neo4j manual and seen the numerous short examples based on the movie graph. I have also installed Neo4j locally and played with Cypher.
Here is the setup:
I have the following nodes: Movies (with name, id and owned by friend), Actors (with name and id), Directors (with name and id), Genres (with id and name).
Relations are: Actors acted in Movies (1 movie, many actors), Directors directed Movies (1 director per movie, but a director can direct many movies), and Movies have several Genres (many to many).
1) Owned by friend: I don't know why, but the LOAD CSV example puts USA in a node rather than a property. Is there a logical reason why it's better to model it as a node rather than as a property, as I did?
2)
What I want to search is similar to the answer given to this question:
Nearest nodes to a give node, assigning dynamically weight to relationship types
However, I do not have a weight on the relationship; it's more of a "go find the first nodes connected to it" search.
Note that each movie can be owned by only one person ("owned by friend").
Given the movie title "Spider-Man" (which, for the sake of example, is owned by Frank), find the next movie that is owned by John.
After reading the Neo4j documentation, I believe I don't need to specify which relationship to traverse, but can just find the next movie that meets my criteria. Is that right?
So, following the above link:
MATCH (n:Start { title: 'Spider-Man' }),
(n)-[:CONNECTED*0..2]-(x)
RETURN x
So: go to the Spider-Man node and find me x as long as it is connected. But I got stumped by *0..2 because it is a range. What if I just want to say "go find me the first one that is owned by John"?
3) Following up on #2: how do I add the filter "owned by John"?
There are a number of things in your question that don't quite make sense. Here's a stab at an answer.
1) Making 'USA' a node rather than a property is useful if you want to search based on country. If 'USA' is a node, you are able to limit your search by starting at the 'USA' node. If you don't care to do this, then it doesn't really matter. It may also save a small amount of space for longer country names to store the name once and link to it via relationships.
2) Your example doesn't match your described graph. I can't really speak to it without a better example.
3) This is probably easy to answer once you improve your example.
OK. Based on the comments to the answer, here's what you need. To find one movie owned by John that is connected via common actors, directors, etc. to the movie Spider-Man owned by Frank (that is, sub-graphs like (movie)<--(actor)-->(movie)), you can write:
MATCH (n:Movie {title : 'Spider-Man', owned_by : 'Frank'})-[*2]-(m:Movie {owned_by : 'John'})
RETURN m LIMIT 1
If you want more results, alter or remove the LIMIT on the RETURN clause. If you want to allow paths that pass through chains like (movie)<--(actor)-->(movie)<--(director)-->(movie), you can increase the number of relationships matched (the *2) to 4, 6, 8, etc. You probably shouldn't write the relationship part of the MATCH clause as an unbounded -[*]-, because the number of matching paths can explode and the query may effectively never finish.
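The "first movie owned by John" idea maps onto a breadth-first search. It explores movie<-person->movie steps outward from Spider-Man and stops at the first movie whose owner matches. A hop limit plays the role of the upper bound on the variable-length pattern. This is a plain-Python sketch over hypothetical data, not the answer's Cypher query.

```python
from collections import deque
from typing import Deque, Dict, Optional, Set, Tuple

# Hypothetical data: movie -> owner, and movie -> connected people (actors/directors).
OWNER: Dict[str, str] = {"Spider-Man": "Frank", "Alien": "Mary", "Heat": "John", "Up": "John"}
PEOPLE: Dict[str, Set[str]] = {
    "Spider-Man": {"actor:A", "director:D"},
    "Alien": {"actor:A", "actor:B"},
    "Heat": {"actor:B"},
    "Up": {"director:E"},
}


def nearest_owned_by(start: str, owner: str, max_hops: int = 8) -> Optional[str]:
    """BFS over movie<-person->movie steps (2 relationships each); nearest match wins."""
    by_person: Dict[str, Set[str]] = {}
    for movie, people in PEOPLE.items():
        for person in people:
            by_person.setdefault(person, set()).add(movie)

    seen = {start}
    frontier: Deque[Tuple[str, int]] = deque([(start, 0)])
    while frontier:
        movie, hops = frontier.popleft()
        if movie != start and OWNER.get(movie) == owner:
            return movie
        if hops + 2 > max_hops:
            continue  # expanding further would exceed the hop bound
        # Sorting keeps the traversal order deterministic.
        for person in sorted(PEOPLE.get(movie, set())):
            for nxt in sorted(by_person[person]):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, hops + 2))
    return None
```

With this toy data, no John-owned movie is two relationships away, so a *2 pattern alone would find nothing. Heat is reached at four relationships, which corresponds to raising the bound as the answer suggests.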
