I'm using NEO4J 2.0 M6 and I'm trying to find the most connected nodes and list order them by their connections descending. I have tried many snippets from lots of other posts but without success.
The data I have is simple:
create (Account1 { id:123}),
(Account2 { id:456}),
(Account3 { id:789}),
(Account4 { id:101}),
(PERMISSION1 { name: 'ChangeOneThing'}),
(PERMISSION2 { name: 'ChangeAnotherThing'}),
(PERMISSION3 { name: 'ConsumeThePlanet'}),
(Account1)-[:ADDED]->(PERMISSION1),
(Account1)-[:ADDED]->(PERMISSION2),
(Account2)-[:ADDED]->(PERMISSION2),
(Account4)-[:ADDED]->(PERMISSION2),
(Account3)-[:REMOVED]->(PERMISSION3)
What I need as the result is something like the following as I am trying to determine which are the most added permissions in order to creating groupings in a new system.
PermissionName Count
==========================
ChangeAnotherThing 3
ChangeOneThing 1
This will allow me to determine the most popular groups of permissions that have been assigned to accounts which will help me to simplify the current infinitely custom allocations into small groups.
I'm very new to cypher and here is my attempt at getting it to work:
match (account)-[:ADDED]->(permission)<-[:ADDED]-(other_account)
return count(permission) asscore, collect(permission.name) as permissions
order by score desc
But that just gives me:
6 ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing, ChangeAnotherThing
If I understand you right you want something like: take each permission, find all accounts that have added it and count them to see how many times the permission is used. The easiest way to do this is probably to label your accounts and permissions when you create the graph, for instance
CREATE (acct1:Account {id:123}), (acct2:Account {id:456}),
(acct3:Account {id:789}), (acct4:Account {id:101}),
(perm1:Permission {name:'ChangeOneThing'}),
(perm2:Permission {name:'ChangeAnotherThing'}),
(perm3:Permission {name:'ConsumeThePlanet'}),
(acct1)-[:ADDED]->(perm1), (acct1)-[:ADDED]->(perm2),
(acct2)-[:ADDED]->(perm2), (acct4)-[:ADDED]->(perm2),
(acct3)-[:REMOVED]->(perm3)
and then query it like this
MATCH (permission:Permission)<-[:ADDED]-(account:Account)
RETURN permission.name, COUNT(account) AS score
ORDER BY score DESC
You don't have to count or group the permissions, when you return a, count(b) a becomes a grouping key–you get one row for each a and the aggregate value of b.
Related
I'm going to preface this that I am a total database pleb. I have 0 experience with any form of databases so I know that I'm in way over my head.
Background: I do Active Directory consulting for my company so I routinely look at client's group membership of their active directory accounts. Currently, I have a PowerShell script that will run my analytics, however, I'm finding that it takes way too long in larger organizations. I'm thinking "There has to be a better way" so I have jumped into looking at databases. NEO4J seems to be a good possible solution as I should be able to to link a user account or group as a member of another group. However, after browsing documentation and forums, I have no idea how to create those links.
I have two CSVs that I have successfully imported with the following information:
Users = DistinguishedName, SAMACCOUNTNAME, MemberOf
Groups = DistinguishedName, SAMACCOUNTNAME, MemberOf, Members
What I want to do is match a string from all users and groups (DistinguishedName) to a string in the group node's property of members. Members is a concatenated string of all DistinguishedName's (whether user or group). So if a node with a DistinguishedName matches part of a string in a group's "members" property, I want to build a one way relationship like so:
user -[memberof] - > group
The best I could rack my brain on this is the following code but I have no idea if I'm even close:
Match(n)
Match(u:user) WHERE n.Members CONTAINS u.DN
Create (u)-[MS:Memberof]->((match)})
In PowerShell, I know how I would accomplish this (loosely translated to relate to the NEO4J world):
$groups = (all-groups)
$AllUsersAndGroups = (all-objs)
foreach ($line in $groups) {
$line.relationship = $line | where {$_.members -contains $AllUsersAndGrups.DistinguishedName}
}
So at last, I'm stuck right now. I will continue to look into it but I figure I would ask the community as you guys have the experience and stuff.
Here is an example of how you should have imported your data (notice that the redundant Members column is not actually needed):
Import (in batches of 5000, to avoid resource issues) each user, and create a unique relationship to its group:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:///users.csv" AS u
MERGE (u:User {DistinguishedName: u.DistinguishedName, SAMACCOUNTNAME: u.SAMACCOUNTNAME})
MERGE (g:Group {DistinguishedName: u.MemberOf})
MERGE (u)-[:Memberof]->(g);
Import each group, and create a unique relationship to its parent group, if any:
USING PERIODIC COMMIT 5000
LOAD CSV WITH HEADERS FROM "file:///groups.csv" AS g1
MERGE (:Group {DistinguishedName: g1.DistinguishedName, SAMACCOUNTNAME: g1.SAMACCOUNTNAME})
MERGE (g2:Group {DistinguishedName: g1.MemberOf})
MERGE (g1)-[:Memberof]->(g2);
Good morning,
I want to build a structure in Neo4J where I can handle my users and groups (kind of ACL). The idea is to have for each user and for each group a node with all the details. The groups shall become a graph where a root group will have sub-groups that can have also sub-groups without limit. The relation will be -[:IS_SUBGROUP_OF]- - so far nothing exciting. Every user will be related to a group with -[:IS_MEMBER_OF]- to have a clear assignment. Of course a user can be a member of 1 or more groups. Some users will have a different relation like -[:IS_LEADER_OF]- to identify teamlead of the groups.
My tasks:
Assignment: I can query each member of a group with a simple query, I can even query members of the subgroups using the current logged in and asking user:
MATCH (d1:Group:Local) -- (c:User)
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1)
WHERE c.login = userLogin
RETURN DISTINCT d.lastname, d.firstname
I get every related user to every group of the current user and below (subgroups). Maybe you have a hint how I cna improve the query or the model.
Approval
Here I am stucked as I want to have all users of the current group from the querying user and all members of all subgroups - except the leader of the current group. The reason behind is that a teamlead shall not be able to approve actions for himself but though for every other member of his group and all members of subgroups including their teamleads.
I tried to use the relations -[:IS_LEADER_OF]- to exclude them but than I loose also the teamleads of the subgroups. Does anyone has an idea how I would either change the model or how I can query the graph to get all users except the teamlead of the current group?
Thanks for your time,
Balael
* EDIT *
I think I am getting close, I just need to understand the results of those both queries:
MATCH (d:User) -- (g:Group) WHERE g.uuid = "xx"
RETURN d.lastname, d.firstname
Returns all user in this group no matter what relationship (leader / member)
MATCH (d:User) -- (g:Group), (g)--(c:User{uuid:"yy"})
RETURN d.lastname, d.firstname
Returns all user of that group except the user c. I would have expected to get c as well in the list with d-users as c is part of that group and should be found with (d:User).
I do not understand the difference between both queries, maybe someone has a hint for me?
You can simplify your query slightly (however this should not have an impact on performance):
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1:Group:Local)--(c:User{login:"userlogin"})
RETURN DISTINCT d.lastname, d.firstname
Don't completely understand your question, but I assume you want to make sure that d1 and c are not connected by a IS_LEADER_OF relationship. If so, try:
MATCH (d:User) -[:IS_MEMBER_OF|IS_LEADER_OF]- (g:Group:Local)-[:IS_SUBGROUP_OF*0..]->(d1:Group:Local)-[r]-(c:User{login:"userlogin"})
WHERE type(r)<>'IS_LEADER_OF'
RETURN DISTINCT d.lastname, d.firstname
following up on * EDIT * in the question
In a MATCH you specify a path. By definition a path does not use the same relationship twice. Otherwise there is a danger to run into infinite recursion. Looking at the second query in the "EDIT" section above: the right part matches yy's relationship to the group whereas the left part matches all user related to this group. To prevent multiple usage of the same relationship the left part does not hit use yy
my relationships look like this
A-[:CHATS_WITH]->B - denotes that the user have sent at least 1 mesg to the other user
then messages
A-[:FROM]->message-[:SENT_TO]->B
and vice versa
B-[:FROM]->message-[:SENT_TO]->A
and so on
now i would like to select all users a given user chats with together with the latest message between the two.
for now i have managed to get all messages between two users with this query
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
RETURN other,m ORDER BY m.timestamp DESC
how can I return just the latest message for each conversation?
Taking what you already have do you just want to tag LIMIT 1 to the end of the query?
The preferential way in a graph store is to manually manage a linked list to model the interaction stream in which case you'd just select the head or tail of the list. This is because you are playing to the graphs strengths (traversal) rather than reading data out of every Message node.
EDIT - Last message to each distinct contact.
I think you'll have to collect all the messages into an ordered collection and then return the head, but this sounds like it get get very slow if you have many friends/messages.
MATCH (me:user)-[:CHATS_WITH]->(other:user) WHERE me.nick = 'bazo'
WITH me, other
MATCH me-[:FROM|:SENT_TO]-(m:message)-[:FROM|:SENT_TO]-other
WITH other, m
ORDER BY m.timestamp DESC
RETURN other, HEAD(COLLECT(m))
See: Neo Linked Lists and Neo Modelling a Newsfeed.
Does it possible to have an order by "property" with a where clause and now the "index/position" of the result?
I mean, when using order for sorting we need to be able to know the position of the result in the sort.
Imagine a scoreboard with 1 million user node, i do an order by on user node.score with a where "name = user_name" and i wan't to know the current rank of the user. I do not find how to do this using order by ...
start game=node(1)
match game-[:has_child_user]->user
with user
order by user.score
with user
where user.name = "my_user"
return user , "the position in the sort";
the expected result would be :
node_user | rank
(i don't want to fetch one million entries at client side to know the current rank/position of a node in the ORDER BY!)
This functionality does not exist today in Cypher. Do you have an example of what this would look like in SQL? Would the below be something that fits the bill? (just a sketch, not working!)
(your code)
start game=node(1)
match game-[:has_child_user]->user
with user
order by user.score
(+ this code)
with user, index() as rank
return user.name, rank;
If you have more thoughts or want to start hacking on this please open an issue at https://github.com/neo4j/neo4j/issues
For the time being there is a work around that you can do:
start n=node(0),rank_node=node(1)
match n-[r:rank]->rn
where rn.score <= rank_node.score
return rank_node,count(*) as pos;
For live example see: http://console.neo4j.org/?id=bela20
I realise this may not be ideal usage, but apart from all the graphy goodness of Neo4j, I'd like to show a collection of nodes, say, People, in a tabular format that has indexed properties for sorting and filtering
I'm guessing the Type of a node can be stored as a Link, say Bob -> type -> Person, which would allow us to retrieve all People
Are the following possible to do efficiently (indexed?) and in a scalable manner?
Retrieve all People nodes and display all of their names, ages, cities of birth, etc (NOTE: some of this data will be properties, some Links to other nodes (which could be denormalised as properties for table display's and simplicity's sake)
Show me all People sorted by Age
Show me all People with Age < 30
Also a quick how to do the above (or a link to some place in the docs describing how) would be lovely
Thanks very much!
Oh and if the above isn't a good idea, please suggest a storage solution which allows both graph-like retrieval and relational-like retrieval
if you want to operate on these person nodes, you can put them into an index (default is Lucene) and then retrieve and sort the nodes using Lucene (see for instance How do I sort Lucene results by field value using a HitCollector? on how to do a custom sort in java). This will get you for instance People sorted by Age etc. The code in Neo4j could look like
Transaction tx = neo4j.beginTx();
idxManager = neo4j.index()
personIndex = idxManager.forNodes('persons')
personIndex.add(meNode,'name',meNode.getProperty('name'))
personIndex.add(youNode,'name',youNode.getProperty('name'))
tx.success()
tx.finish()
'*** Prepare a custom Lucene query context with Neo4j API ***'
query = new QueryContext( 'name:*' ).sort( new Sort(new SortField( 'name',SortField.STRING, true ) ) )
results = personIndex.query( query )
For combining index lookups and graph traversals, Cypher is a good choice, e.g.
START people = node:people_index(name="E*") MATCH people-[r]->() return people.name, r.age order by r.age asc
in order to return data on both the node and the relationships.
Sure, that's easily possible with the Neo4j query language Cypher.
For example:
start cat=node:Types(name='Person')
match cat<-[:IS_A]-person-[born:BORN]->city
where person.age > 30
return person.name, person.age, born.date, city.name
order by person.age asc
limit 10
You can experiment with it in our cypher console.