Neo4j query to find the same data linked to different nodes

This is what I created in Neo4j:
Nodes: Customer Names, Customer Address and Customer Contact
I linked these nodes based on the relationships common to all three.
I can see all three nodes linked in Neo4j. Contact contains email addresses and phone numbers, so in some cases a customer name node is connected to an email address, a phone number, and an address.
As a learning exercise, I have been asked to show how many identical contacts are used by different customer names, and how many identical addresses are used by different customer names. With my limited experience I tried a few queries but couldn't get the results.
I tried the following query:
start n=node(*)
match n-[:CONTACT_AT]-()
return distinct n
CONTACT_AT is the relationship between customer name and Contact (email, phone) node.

Your question does not provide enough information about your data model. To save time, I will assume that it looks something like this (without showing all the properties):
(a:Address)<-[:ADDRESS_AT]-(p:Person {name: '...'})-[:CONTACT_AT]->(c:Contact)
With this model, this is how you'd get all the names of the people who have the same Contact:
MATCH (person:Person)-[:CONTACT_AT]->(contact:Contact)
RETURN contact, COLLECT(person.name) AS names;
And this is how you'd get all the names of the people who have the same Address:
MATCH (person:Person)-[:ADDRESS_AT]->(address:Address)
RETURN address, COLLECT(person.name) AS names;
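The grouping performed by the two queries above, plus the "used by different customer names" filter from the question, can be sketched in plain Python. This is only an in-memory illustration, not Neo4j driver code: the sample pairs and the helper name shared_targets are hypothetical. In Cypher, a filter on the size of the collected list would play the same role as the len(...) > 1 check here.

```python
from collections import defaultdict

# Hypothetical (person name, contact value) pairs standing in for
# CONTACT_AT relationships; the data is made up for illustration.
contact_edges = [
    ("Alice", "alice@example.com"),
    ("Bob", "alice@example.com"),
    ("Bob", "555-0100"),
    ("Carol", "555-0199"),
]

def shared_targets(edges):
    """Group names by target node and keep targets used by more than one name."""
    names_by_target = defaultdict(set)
    for name, target in edges:
        names_by_target[target].add(name)
    return {t: sorted(n) for t, n in names_by_target.items() if len(n) > 1}

shared = shared_targets(contact_edges)
print(shared)       # the contacts that more than one person points at
print(len(shared))  # how many such contacts exist
```

The same helper works for addresses if it is given (person name, address) pairs instead.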

Related

Neo4j Cypher exclude nodes where a specific relationship is missing

I am trying to implement a fraud detection system in Neo4j, where I have a bunch of nodes for persons, bank accounts, credit cards, telephone numbers and addresses.
A basic idea for detecting fraud in bank systems is to find someone who has a bank account and a credit card, where the credit card is not linked to their own bank account.
I cannot figure out what to do, because when I try to exclude these nodes with:
WHERE NOT (k)-[:VERKNUEPFT]-(b)
I still get the wrong nodes; it just hides the VERKNUEPFT node.
Can someone show me the correct way to negate, so that every unneeded node is excluded?
Simply put, I need to get the following output:
First I filtered out which nodes are needed at all:
MATCH (p:person)-[r:HAT_KONTO]->(b:bankkonto), (p)-[r2:NUTZT_KARTE]->(k:kreditkarte) return p,b,k,r,r2;
which gives me the following:
The nodes below Hermine and Ron are correct, so I want to exclude everything linked to them.
But when I try to do MATCH (p:person)-[r:HAT_KONTO]->(b:bankkonto), (p)-[r2:NUTZT_KARTE]->(k:kreditkarte) WHERE NOT (k)-[:VERKNUEPFT]-(b) return p,b,k,r,r2;
I get the following:
Only the bank account (the brown one) is missing.
When I test the same code with WHERE instead of WHERE NOT:
MATCH (p:person)-[r:HAT_KONTO]->(b:bankkonto), (p)-[r2:NUTZT_KARTE]->(k:kreditkarte) WHERE (k)-[:VERKNUEPFT]-(b) return p,b,k,r,r2;
I achieve the opposite of what I want.
I think you need to check whether all the credit cards held by a person are linked to any one of their bank accounts. Currently, you are checking whether each card is linked to one specific bank account. Try something along these lines:
MATCH (p:person)-[:HAT_KONTO]->(b:bankkonto)
WITH p, collect(b) AS banks
MATCH (p)-[r2:NUTZT_KARTE]->(k:kreditkarte)
WITH p, banks, collect(k) AS creditCards
WHERE ALL(card IN creditCards WHERE ANY(bank IN banks WHERE (card)-[:VERKNUEPFT]-(bank)))
UNWIND banks AS b
UNWIND creditCards AS k
MATCH (p)-[r:HAT_KONTO]->(b), (p)-[r2:NUTZT_KARTE]->(k)
RETURN p,r,b,r2,k
In the above query, we first collect a person's bank accounts and credit cards into two separate collections. Then we check whether every credit card is linked to at least one of those bank accounts, which keeps only the valid users, and return their details. To list the suspicious persons instead, negate the condition with WHERE NOT ALL(...).
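The ALL/ANY check can be illustrated with a small in-memory Python sketch. It is not Neo4j code: the dictionaries, the account and card ids, the person "Draco", and the helper name all_cards_linked are assumptions made up for the example (only Hermine and Ron appear in the question).

```python
# Hypothetical sample data: each person's bank accounts (HAT_KONTO)
# and credit cards (NUTZT_KARTE).
accounts = {"Hermine": {"acc1"}, "Ron": {"acc2"}, "Draco": {"acc3"}}
cards = {"Hermine": {"card1"}, "Ron": {"card2"}, "Draco": {"card3"}}

# VERKNUEPFT links stored as (card, account) pairs.
# card3 is linked to acc1, which does not belong to its holder.
linked = {("card1", "acc1"), ("card2", "acc2"), ("card3", "acc1")}

def all_cards_linked(person):
    """True if every card of the person is linked to one of their own accounts."""
    return all(
        any((card, acc) in linked for acc in accounts[person])
        for card in cards[person]
    )

valid = sorted(p for p in cards if all_cards_linked(p))
suspicious = sorted(p for p in cards if not all_cards_linked(p))
print(valid)
print(suspicious)
```

Comparing each card against the whole set of the holder's accounts, rather than against one account at a time, is what prevents a person with two accounts from being flagged just because a card is linked to only one of them.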

Combining related information within a cypher query

I have created a knowledge graph with the nodes and relationships pictured. Each person has any number of jobs and skills connected to them, and each Job and Skill can have any number of People connected to it. I would like to be able to search for a particular job (e.g. Security Architect) and return a list of all the people who have been employed_as that job and all of the skills that each person is skilled_in. I have created a query which retrieves these results; however, a new row is created in the result for each skill, duplicating the person details each time. This is the query I have which retrieves those results.
MATCH (j:Job {job_title: "Security Architect"})<-[p_rel:employed_as]-(p:Person)-[skilled_in]->(s:Skill) return p,s,p_rel
Is it possible to create a query that returns all of the skill nodes connected to a person as a single list with the details of that person?
Since you need all the skills in a single row, you can collect the skills per person.
MATCH (j:Job {job_title: "Security Architect"})<-[p_rel:employed_as]-(p:Person)
-[:skilled_in]->(s:Skill)
RETURN p,p_rel, collect(s) as skills_per_person

Cypher query for list pattern

I have a schema which looks like below:
A customer is linked to another customer with a relationship SIMILAR having similarity score.
Example: (c1:Customer)-->(c2:Customer)
An Email node is connected to each customer with relationship MAIL_AT with the following node properties:
{
"active_email_address": "a@mail.com",
"cibil_email_addresses": [
"b@mail.com", "c@mail.com"
]
}
Example: (e1:Email)<-[:MAIL_AT]-(c1:Customer)-[:SIMILAR]->(c2:Customer)-[:MAIL_AT]->(e2:Email)
A Risk node with some risk-related properties (below) is related to the customer by the relationship HAS_RISK:
{
"f0_score": 870.0,
"pta_score": 430.0
}
A Fraud node with some fraud-related properties (below) is related to the customer by the relationship IS_FRAUD:
{
"has_commited_fraud": true
}
My Objectives:
To find the customers with common email addresses (irrespective of active and secondary)?
My tentative solution:
MATCH (email:Email)
WITH email.cibil_email_addresses + email.active_email_address AS emailAddress, email
UNWIND emailAddress AS eaddr
WITH DISTINCT eaddr AS deaddr, email
UNWIND deaddr AS eaddress
MATCH (customer:Customer)-[]->(someEmail:Email)
WHERE eaddress IN someEmail.cibil_email_addresses + someEmail.active_email_address
WITH eaddress, COLLECT(customer.customer_id) AS customers
RETURN eaddress, customers
Problem: This takes forever to execute. I understand that working with lists takes time; however, I'm flexible about changing the schema if that is suggested. Should I break the email addresses into separate nodes? If yes, how do I break cibil_email_addresses into different nodes, given that they can vary? Should I create two nodes with different cibil email addresses and connect both of them to the customer with the relationship HAS_CIBIL_EMAIL? Is this a valid schema design? Also, it is possible that one customer's active_email_address is present in another customer's cibil_email_addresses. I'm trying to detect a synthetic identity attack. PS: If there is an APOC procedure that can help achieve this and the objectives below, please suggest it with an example.
In production, given a customer with email addresses, risk values and a similarity score, and given that other customers may or may not be tagged with fraud_status, I want to check whether this new person falls into a fraud ring. PS: If I need to use GDS to solve this, please suggest it with examples.
If I were to do the same exercise with another node type such as Address, which may match only partially and which holds historical addresses in a list, what would be the ideal approach?
I know I'm tagging someone in my question, but that person seems to be the only one active on Cypher questions on StackOverflow. @cybersam, any help?
Thanks.
This should work:
MATCH (e:Email)
UNWIND (e.cibil_email_addresses + e.active_email_address) AS address
WITH address, COLLECT(e) AS es
UNWIND es AS email
MATCH (email)<-[:MAIL_AT]-(cust)
RETURN address, COLLECT(cust) AS customers
The WITH clause takes advantage of the aggregating function COLLECT to automatically collect all the Email nodes containing the same address, using address as the grouping key.
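The effect of unwinding the combined address list and grouping on address can be sketched as an inverted index in plain Python. This is an illustration only, not Neo4j code: the dictionary layout (one Email record per customer_id), the sample addresses, and the helper name customers_by_address are assumptions.

```python
from collections import defaultdict

# Hypothetical Email nodes, keyed by the customer_id of the customer
# connected to them via MAIL_AT.
emails = {
    "c1": {"active_email_address": "a@mail.com",
           "cibil_email_addresses": ["b@mail.com", "c@mail.com"]},
    "c2": {"active_email_address": "b@mail.com",
           "cibil_email_addresses": []},
    "c3": {"active_email_address": "d@mail.com",
           "cibil_email_addresses": ["c@mail.com"]},
}

def customers_by_address(emails):
    """Map every address (active or cibil) to the customers that use it."""
    index = defaultdict(set)
    for customer_id, email in emails.items():
        combined = email["cibil_email_addresses"] + [email["active_email_address"]]
        for address in combined:  # plays the role of UNWIND
            index[address].add(customer_id)
    return {a: sorted(c) for a, c in index.items()}

index = customers_by_address(emails)
# Keep only addresses used by more than one customer.
shared = {a: c for a, c in index.items() if len(c) > 1}
print(shared)
```

Each Email record is visited once, in contrast to the tentative query in the question, which rescans every Email node for every address.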
You should only ask one question at a time. You have a couple of other questions at the bottom. If you continue to need help with them, please create new questions.

Searching nodes and properties of nodes

I am trying to create a search function for my meetup app, which uses Neo4j as the database. Is there a way to search both nodes (Topic, Department, and Title, getting the people that are attached to them) and properties of nodes (first name, last name, username, bio)?
The Person node has a relationship to a Title node (via IS_TITLED), a relationship to a Department node (via EMPLOYED_BY), and relationships to Topic nodes (via INTEREST_OF or SKILL_OF).
I would also like to make sure that the results are distinct for each person, so if the user enters a title and a department and a person matches both, that person is returned only once.
Your question is very broad, but here is an example query that:
Finds all people employed by the Finance department who have the title "Clerk".
Ensures they are distinct people.
Returns their first name, last name, username, and bio.
MATCH (d:Department)<-[:EMPLOYED_BY]-(p:Person)-[:IS_TITLED]->(t:Title)
WHERE d.name = "Finance" AND t.name = "Clerk"
WITH DISTINCT p
RETURN p.fname AS firstname, p.lname AS lastname, p.username AS username, p.bio AS bio;
Actually, I wasn't looking for an entire application. My final solution was to add, update, and remove documents in ElasticSearch whenever my nodes were added, updated, and removed. I then use ElasticSearch to find results and return a list of node ids, and my Cypher query pulls the information using IN on the returned ids to produce the results. It seems to work perfectly. Since I couldn't find an integrated solution for syncing Neo4j and ElasticSearch, I use both libraries in my application and perform the appropriate action on ElasticSearch whenever nodes are affected.

Is this concept applicable for a graph database?

I have been reading about Graph databases and want to know if this type of structure is applicable to it:
Company > Has user Accounts > Accounts send out facebook posts (which are available to all users)
Up to here - I think this makes sense - yes it would be a good use of Graph. A post has a relationship to any accounts and you can find out the direction both ways - posts for a company and which posts were sent by which users or companies.
However
Users get added and deleted on a daily basis and I need a record store of how many there were at a given time
Accounts are getting results for each post (likes/friends) which I need to store on a daily basis
I need to find out how many likes a company received (on any given day)
I also need to find out how many likes a user received
I need to find out how many likes a user received per post
You would need to store Likes as a group and then date-value - can you even have "sub" properties?
I struggle at this point, unless you store lots of date-value property lists per node. Is that the way you would do it? If I wanted to find out the latter two points, for example, would it be as efficient as an RDBMS?
Here is a very simple example of a Graph data model that seems to cover your stated use cases. (Since nodes can have multiple labels, all Company and User nodes are also Entity nodes -- to simplify the model.)
(:Company:Entity {id:100})-[:HAS_USER]->(:User:Entity {id: 200})
(:Entity)-[:SENT]->(:Post {date: 123, msg: "I like cats!"})
(:Entity)-[:LIKES {date: 234}]->(:Post)
Your use cases:
Users get added and deleted on a daily basis and I need a record store of how many there were at a given time.
How to count all users:
MATCH (u:User)
RETURN COUNT(*);
How to count a company's users:
MATCH (c:Company {id:100})-[:HAS_USER]->(u:User)
RETURN COUNT(*);
I need to find out how many likes a company received (on any given day)
MATCH (c:Company {id: 100})-[:SENT]->(p:Post)<-[:LIKES {date:234}]-()
RETURN COUNT(*)
I also need to find out how many likes a user received
MATCH (u:User {id:200})-[:SENT]->(p:Post)<-[:LIKES]-()
RETURN COUNT(*);
I need to find out how many likes a user received per post
MATCH (u:User {id:200})-[:SENT]->(p:Post)<-[:LIKES]-()
RETURN p, COUNT(*)
You would need to store Likes as a group and then date-value - can you even have "sub" properties?
You do not need to explicitly group likes by date (if that is what you mean). Such "groupings" can be easily obtained by the appropriate query (e.g., in #2 above).
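To show why no explicit grouping needs to be stored, here is a small in-memory Python sketch of the like counts computed by the queries above. It is not Neo4j code: the post names, liker ids, and the helper name likes_per_post are hypothetical, while the ids 100 and 200 and the date 234 reuse the values from the example model.

```python
from collections import Counter

# Hypothetical data: SENT as post -> sender id, and LIKES as
# (liker id, post, date) tuples carrying the relationship's date property.
sent_by = {"post1": 200, "post2": 200, "post3": 100}
likes = [
    (300, "post1", 234),
    (301, "post1", 234),
    (302, "post2", 235),
    (303, "post3", 234),
]

def likes_per_post(sender_id, date=None):
    """Count likes on each post sent by sender_id, optionally for one date."""
    counts = Counter()
    for _liker, post, like_date in likes:
        if sent_by[post] == sender_id and (date is None or like_date == date):
            counts[post] += 1
    return dict(counts)

print(likes_per_post(200))                          # likes per post for user 200
print(sum(likes_per_post(200).values()))            # all likes for user 200
print(sum(likes_per_post(100, date=234).values()))  # likes for company 100 on date 234
```

The per-day and per-post figures come from filtering and counting individual likes at query time, which is the point made in the answer.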
