How to create random relationships between nodes? - neo4j

I'm trying to create random transaction between bank accounts. I have created the following query:
//Create transactions
CALL apoc.periodic.iterate("
match (a:BANK_ACCOUNT)
WITH apoc.coll.randomItem(collect(a)) as sender
return sender", "
MATCH (b:BANK_ACCOUNT)
WHERE NOT sender = b
WITH apoc.coll.randomItem(collect(b)) as receiver
MERGE (sender)-[r:HAS_TRANSFERED {time: datetime()}]->(receiver)
set r.amount = rand()*1000",
{batchSize:100, parallel:false});
I would assume that it would create 100 random transactions between random bank accounts. Instead it creates 1 new bank account and 1 new relationship. What am I doing wrong and what should I do?
Thanks for your help !

The following query uses apoc.coll.randomItems to get 200 different random accounts at one time (which is much faster than getting one random account 200 times):
MATCH (ba:BankAccount)
WITH apoc.coll.randomItems(COLLECT(ba), 200) AS accts
WHERE SIZE(accts) > 1
UNWIND RANGE(0, SIZE(accts)/2*2-1, 2) AS i
WITH accts[i] AS sender, accts[i+1] AS receiver
CREATE (sender)-[:TRANSFERED_TO {time: datetime()}]->(receiver)
Notes:
This query uses CREATE instead of MERGE because it is unlikely that a TRANSFERED_TO relationship already exists with the current time as the time value. (You can choose to use MERGE anyway, if duplication is still possible.)
The WHERE SIZE(accts) > 1 test avoids errors when there are not at least 2 accounts.
SIZE(accts)/2*2-1 calculation prevents the RANGE function from generating a list index (i) that exceeds the last valid index for a sender account.

Related

Auto increment id Neo4j to retrieve elements in insert order

Recently, I am experimenting Neo4j. I like the idea but I am facing a problem that I have never faced with relational databases.
I want to perform these inserts and then return them exactly in the insertion order.
Insert elements:
create(p1:Person {name:"Marc"})
create(p2:Person {name:"John"})
create(p3:Person {name:"Paul"})
create(p4:Person {name:"Steve"})
create(p5:Person {name:"Andrew"})
create(p6:Person {name:"Alice"})
create(p7:Person {name:"Bob"})
While to return them:
match(p:Person) return p order by id(p)
I receive the elements in the following order:
Paul
Andrew
Marc
John
Steve
Alice
Bob
I note that these elements are not returned respecting the query insertion order (through the id function).
In fact the id of my elements are the following:
Marc: 18221
John: 18222
Paul: 18208
Steve: 18223
Andrew: 18209
Alice: 18224
Bob: 18225
How does the Neo4j id function work? I read that it generates an auto incremental id but it seems a little strange his mechanism. How do I return items respecting the query insertion order? I thought about creating a timestamp attribute for each node but I don't think it's the best choice
If you're looking to generate sequence numbers in Neo4j then you need to manage this yourself using a strategy that works best in your application.
In ours we maintain sequence numbers in key/value pair nodes where Scope is the application name given to the sequence number range, and Value is the last sequence number used. When we generate a node of a given type, such as Product, then we increment the sequence number and assign it to our new node.
MERGE (n:Sequence {Scope: 'Product'})
SET n.Value = COALESCE(n.Value, 0) + 1
WITH n.Value AS seq
CREATE (product:Product)
SET product.UniqueId = seq
With this you can create as many sequence numbers you need just by creating sequence nodes with unique scope names.
For more examples and tests see the AutoInc.Neo4j project https://github.com/neildobson-au/AutoInc/blob/master/src/AutoInc.Neo4j/Neo4jUniqueIdGenerator.cs
The id of Neo4j is maintained internally, which your business code should not depend on.
Generally it's auto incrementally, but if there is delete operation, you may reuse the deleted id according to the Reuse Policy of Neo4j Server.

Neo4j: Maintaining node counts

I'm investigating the use of Neo4j to detect potentially fraudulent card transactions in near real time. I receive details of a customer and a card they've just used from our on-line systems. What I'm trying to do here is create new nodes for the customer and card if they don't exist, then establish the relationship between them.
Whenever the customer uses the card I want to set the time the card was last used, in addition, if this is the first time this customer-->card relationship has been seen, update totals of the number of cards the customer is associated with and the number of customers associated with the card.
The Cypher below seems to work, however I think it will re-evaluate the counts every time the relationship is seen, not just on the create. Is it possible to use the ON MATCH and ON CREATE in this statement to limit the unnecessary processing?
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
I'm running this from Python (using py2neo), I also want to return something back that will allow me to kick off a bespoke dijkstra based search of the neighborhood. Any ideas how I'd return some variable based on whether this was a new or existing relationship?
There is no need you to even have the card_ct or card_count properties.
Since neo4j 2.1, getting a count of the number of relationships of a specific type from a node is very efficient. So, every time you need a count, just use SIZE(()-[:has_card]->(node)) or SIZE((node)-[:has_card]->()).
How about something like this. Create a counter on a MATCH and if the counter is greater than zero then it is an existing relationship. Otherwise it is a new relationship.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"45uIic..."})
MERGE (c)-[r:has_card]->(a)
ON MATCH SET r.num = coalesce(r.num, 0) + 1
set r.last_transaction = "30-NOV-2016 07:58:42"
set a.card_ct = size(()-[:has_card]->(a))
set c.card_count = size((c)-[:has_card]->())
RETURN
CASE
WHEN r.num > 0 THEN false
ELSE true
END as new_relationship
Here's the Cypher I've ended up with, thanks to Dave Bennett for his suggestion. I also realised that I don't need to initiate any further analysis if only 1 customer is associated with 1 card so I've excluded this as well.
MERGE (c:customers {customer_id:"12345678"})
MERGE (a:cards {card_hash:"BFgn..."})
MERGE (c)-[r:has_card]->(a)
ON CREATE SET a.card_scheme = "VISA DEBIT"
, a.card_ct = size(()-[:has_card]->(a))
, c.card_count = size((c)-[:has_card]->())
ON MATCH SET r.ind = 1
SET r.last_transaction = "06-Dec-2016 11:19:13"
RETURN CASE WHEN exists(r.ind)
AND a.card_ct + c.card_count > 2
THEN false
ELSE true END as new_relationship

In- and excluding nodes in a cypher query

Good morning,
I want to build a structure in Neo4J where I can handle my users and groups (kind of ACL). The idea is to have for each user and for each group a node with all the details. The groups shall become a graph where a root group will have sub-groups that can have also sub-groups without limit. The relation will be -[:IS_SUBGROUP_OF]- - so far nothing exciting. Every user will be related to a group with -[:IS_MEMBER_OF]- to have a clear assignment. Of course a user can be a member of 1 or more groups. Some users will have a different relation like -[:IS_LEADER_OF]- to identify teamlead of the groups.
My tasks:
Assignment: I can query each member of a group with a simple query, I can even query members of the subgroups using the current logged in and asking user:
MATCH (d1:Group:Local) -- (c:User)
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1)
WHERE c.login = userLogin
RETURN DISTINCT d.lastname, d.firstname
I get every related user to every group of the current user and below (subgroups). Maybe you have a hint how I cna improve the query or the model.
Approval
Here I am stucked as I want to have all users of the current group from the querying user and all members of all subgroups - except the leader of the current group. The reason behind is that a teamlead shall not be able to approve actions for himself but though for every other member of his group and all members of subgroups including their teamleads.
I tried to use the relations -[:IS_LEADER_OF]- to exclude them but than I loose also the teamleads of the subgroups. Does anyone has an idea how I would either change the model or how I can query the graph to get all users except the teamlead of the current group?
Thanks for your time,
Balael
* EDIT *
I think I am getting close, I just need to understand the results of those both queries:
MATCH (d:User) -- (g:Group) WHERE g.uuid = "xx"
RETURN d.lastname, d.firstname
Returns all user in this group no matter what relationship (leader / member)
MATCH (d:User) -- (g:Group), (g)--(c:User{uuid:"yy"})
RETURN d.lastname, d.firstname
Returns all user of that group except the user c. I would have expected to get c as well in the list with d-users as c is part of that group and should be found with (d:User).
I do not understand the difference between both queries, maybe someone has a hint for me?
You can simplify your query slightly (however this should not have an impact on performance):
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1:Group:Local)--(c:User{login:"userlogin"})
RETURN DISTINCT d.lastname, d.firstname
Don't completely understand your question, but I assume you want to make sure that d1 and c are not connected by a IS_LEADER_OF relationship. If so, try:
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1:Group:Local)-[r]-(c:User{login:"userlogin"})
WHERE type(r)<>'IS_LEADER_OF'
RETURN DISTINCT d.lastname, d.firstname
following up on * EDIT * in the question
In a MATCH you specify a path. By definition a path does not use the same relationship twice. Otherwise there is a danger to run into infinite recursion. Looking at the second query in the "EDIT" section above: the right part matches yy's relationship to the group whereas the left part matches all user related to this group. To prevent multiple usage of the same relationship the left part does not hit use yy

Return all users given user chats with and the latest message in conversation

my relationships look like this
A-[:CHATS_WITH]->B - denotes that the user have sent at least 1 mesg to the other user
then messages
A-[:FROM]->message-[:SENT_TO]->B
and vice versa
B-[:FROM]->message-[:SENT_TO]->A
and so on
now i would like to select all users a given user chats with together with the latest message between the two.
for now i have managed to get all messages between two users with this query
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
RETURN other,m ORDER BY m.timestamp DESC
how can I return just the latest message for each conversation?
Taking what you already have do you just want to tag LIMIT 1 to the end of the query?
The preferential way in a graph store is to manually manage a linked list to model the interaction stream in which case you'd just select the head or tail of the list. This is because you are playing to the graphs strengths (traversal) rather than reading data out of every Message node.
EDIT - Last message to each distinct contact.
I think you'll have to collect all the messages into an ordered collection and then return the head, but this sounds like it get get very slow if you have many friends/messages.
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
WITH other, m
ORDER BY m.timestamp DESC
RETURN other, HEAD(COLLECT(m))
See: Neo Linked Lists and Neo Modelling a Newsfeed.

Trying to find the most connected node in cypher

I'm using NEO4J 2.0 M6 and I'm trying to find the most connected nodes and list order them by their connections descending. I have tried many snippets from lots of other posts but without success.
The data I have is simple:
create (Account1 { id:123}),
(Account2 { id:456}),
(Account3 { id:789}),
(Account4 { id:101}),
(PERMISSION1 { name: 'ChangeOneThing'}),
(PERMISSION2 { name: 'ChangeAnotherThing'}),
(PERMISSION3 { name: 'ConsumeThePlanet'}),
(Account1)-[:ADDED]->(PERMISSION1),
(Account1)-[:ADDED]->(PERMISSION2),
(Account2)-[:ADDED]->(PERMISSION2),
(Account4)-[:ADDED]->(PERMISSION2),
(Account3)-[:REMOVED]->(PERMISSION3)
What I need as the result is something like the following as I am trying to determine which are the most added permissions in order to creating groupings in a new system.
PermissionName Count
==========================
ChangeAnotherThing 3
ChangeOneThing 1
This will allow me to determine the most popular groups of permissions that have been assigned to accounts which will help me to simplify the current infinitely custom allocations into small groups.
I'm very new to cypher and here is my attempt at getting it to work:
match (account)-[:ADDED]->(permission)<-[:ADDED]-(other_account)
return count(permission) asscore, collect(permission.name) as permissions
order by score desc
But that just gives me:
6 ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing
If I understand you right you want something like: take each permission, find all accounts that have added it and count them to see how many times the permission is used. The easiest way to do this is probably to label your accounts and permissions when you create the graph, for instance
CREATE (acct1:Account {id:123}), (acct2:Account {id:456}),
(acct3:Account {id:789}), (acct4:Account {id:101}),
(perm1:Permission {name:'ChangeOneThing'}),
(perm2:Permission {name:'ChangeAnotherThing'}),
(perm3:Permission {name:'ConsumeThePlanet'}),
(acct1)-[:ADDED]->(perm1), (acct1)-[:ADDED]->(perm2),
(acct2)-[:ADDED]->(perm2), (acct4)-[:ADDED]->(perm2),
(acct3)-[:REMOVED]->(perm3)
and then query it like this
MATCH (permission:Permission)<-[:ADDED]-(account:Account)
RETURN permission.name, COUNT(account) AS score
ORDER BY score DESC
You don't have to count or group the permissions, when you return a, count(b) a becomes a grouping key–you get one row for each a and the aggregate value of b.

Resources