Is it possible in graph sql to require all inbound relationships to be satisfied when selecting a node? - neo4j

So I'm playing around with graph sql as a possible solution to my problem. Essentially it involves creating groups of users, as described below.
There are three employees: Jerry, Kerry, and Larry
There are two positions: Stocker, and Cashier
There are two locations: Wichita, and Topeka
And there are four groups: Topeka Stockers, Topeka Employees, All Cashiers, and Wichita Cashiers.
The Topeka Stockers group doesn't have any members, as no employee is both in the Topeka office and a stocker. The group requires an inbound connection from both the position and office types.
The Topeka Employees group has Jerry and Kerry as members, as it only requires an inbound connection from office types.
The All Cashiers group has Kerry and Larry as members, as it only requires an inbound connection from position types.
The Wichita Cashiers group has Larry as a member, as it requires inbound connections from both the position and office types.
Is it possible to issue queries such as "Which groups is Larry a part of?" that dynamically determine the number of inbound connections each group requires and check whether there is a path from Larry to that group through every one of them?
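To make the membership rule concrete, here is a minimal in-memory sketch in Python rather than a graph query. The dictionaries and function names are hypothetical, and Jerry's position is not stated above, so he is assumed to have only an office. An employee belongs to a group only if they are connected to every node that has an inbound relationship to that group.

```python
# Nodes each employee is connected to (offices and positions).
# Assumption: Jerry's position is not given in the question, so he has none here.
employee_links = {
    "Jerry": {"Topeka"},
    "Kerry": {"Topeka", "Cashier"},
    "Larry": {"Wichita", "Cashier"},
}

# Nodes with an inbound relationship to each group; all of them must be satisfied.
group_inbound = {
    "Topeka Stockers": {"Topeka", "Stocker"},
    "Topeka Employees": {"Topeka"},
    "All Cashiers": {"Cashier"},
    "Wichita Cashiers": {"Wichita", "Cashier"},
}


def groups_for(employee):
    """Return the groups whose every inbound node is linked to the employee."""
    links = employee_links[employee]
    return sorted(g for g, required in group_inbound.items() if required <= links)


def members_of(group):
    """Return the employees linked to every inbound node of the group."""
    required = group_inbound[group]
    return sorted(e for e, links in employee_links.items() if required <= links)
```

The subset test (`required <= links`) is what "all inbound relationships satisfied" amounts to; the number of inbound connections is never hard-coded per group.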

Related

How to list users who are counted towards the Crowd License using Rest API

Requirement: I need to derive the number of users who are counted towards the license for Crowd.
App structure in Crowd:
Currently we have 2 applications defined in Crowd, namely App1 and App2.
We also have 2 directories, dir1 and dir2.
Both directories are mapped to both applications in the same order.
I have created two groups, "grp1" and "grp2", one in each directory respectively, and added some users to each group.
In the "Who can authenticate" section of App1, I have mapped "grp1" under "dir1" and "grp2" under "dir2".
The same goes for App2: in its "Who can authenticate" section, I have mapped "grp1" under "dir1" and "grp2" under "dir2".
Now I need to fetch the number of users who are counted towards the license from the above setup using the REST API.
Can anyone list any REST endpoint available in Crowd for achieving this, or point out an approach using the existing Crowd REST APIs?
Any help would be appreciated.
Since the number of users counted towards your Crowd license should be the number of distinct users (by the field mapped as username in your directories) across grp1 and grp2, the REST endpoints you'd likely need are:
/rest/usermanagement/1/group/user/direct?groupname=grp1
/rest/usermanagement/1/group/user/direct?groupname=grp2
If group nesting is enabled/used, then you'll also want:
/rest/usermanagement/1/group/user/nested?groupname=grp1
/rest/usermanagement/1/group/user/nested?groupname=grp2
Get users from both, merge and de-dupe.
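The merge-and-de-dupe step could look roughly like the Python sketch below. It takes the username lists already fetched for each group and makes no REST calls itself. The function name is hypothetical, and comparing usernames case-insensitively is an assumption; drop the `casefold()` call if your directories treat case as significant.

```python
def distinct_usernames(*user_lists):
    """Merge several username lists, keeping the first spelling of each user."""
    seen = set()
    merged = []
    for users in user_lists:
        for name in users:
            key = name.casefold()  # assumption: usernames compare case-insensitively
            if key not in seen:
                seen.add(key)
                merged.append(name)
    return merged


# Made-up sample data standing in for the users returned for grp1 and grp2.
grp1_users = ["alice", "bob"]
grp2_users = ["Bob", "carol"]
licensed = distinct_usernames(grp1_users, grp2_users)
license_count = len(licensed)
```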
Refs:
- https://www.atlassian.com/licensing/crowd#serverlicenses-5
- https://docs.atlassian.com/atlassian-crowd/3.3.0/REST/#usermanagement/1/group

Twilio as a proxy for many-to-many SMS conversations

What is the best way to proxy marketplace messaging using SMS?
User Model:
Each conversation has an owner_id and a renter_id. If a message is received from one, it should be proxied to the other.
If the owner is connected to many conversations, what is the best way to make sure the messages are directed to the proper recipient?
Update:
It looks like Twilio recommends purchasing a phone number for each conversation.
This would require owning N phone numbers, where N is at least the largest number of conversations any single user has with distinct recipients.
For example, with the Airbnb data model, I would need to know the owner with the largest number of unique renters... This seems like a lot of potential overhead. Please correct me if I'm wrong.
This concept will definitely require multiple Twilio numbers if you want to give a frictionless experience (no PINs to enter), but you will only ever need as many numbers as the largest number of people any single user can contact.
This is explained in more detail here. You only need to work out a starting number; the rest can be dynamic.
Say the maximum number of properties any owner has is N, and each property is rented out on all 365 days to a different renter. That owner then has N*365 renters in their "address book", so you would only ever need 365N numbers, even if you had 100,000 users. If historical data lets you work out the maximum N and the maximum number of rental days (say M), the required pool is N*M phone numbers. This can be the starting point and doesn't have to be a fixed constant.
As a fail-safe, add a handler for when you cross a threshold, say 90% of your pool of N*M numbers, and then use the Twilio REST API to add numbers to the pool dynamically.
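A rough Python sketch of this idea, with no Twilio calls: a proxy number is reused across conversations as long as neither participant already uses it, so an incoming message is routed by the pair (sender, proxy number). The class, method names and placeholder number labels are hypothetical, and measuring pool usage by the busiest user is an assumption about how the 90% threshold would be tracked.

```python
class ProxyNumberPool:
    def __init__(self, numbers, threshold=0.9):
        self.numbers = list(numbers)
        self.threshold = threshold
        self.routes = {}  # (user_id, proxy_number) -> the other participant

    def numbers_used_by(self, user_id):
        """Count how many proxy numbers this user currently occupies."""
        return sum(1 for (user, _number) in self.routes if user == user_id)

    def assign(self, owner_id, renter_id):
        """Pick the first pool number that neither participant already uses."""
        for number in self.numbers:
            if (owner_id, number) not in self.routes and (renter_id, number) not in self.routes:
                self.routes[(owner_id, number)] = renter_id
                self.routes[(renter_id, number)] = owner_id
                return number
        raise LookupError("no free proxy number for this pair")

    def route(self, sender_id, proxy_number):
        """Return who should receive a message the sender sent to proxy_number."""
        return self.routes[(sender_id, proxy_number)]

    def needs_more_numbers(self):
        """True once the busiest user occupies at least threshold of the pool."""
        busiest = max((self.numbers_used_by(u) for u, _ in self.routes), default=0)
        return busiest >= self.threshold * len(self.numbers)
```

When `needs_more_numbers()` returns true, that is the point at which additional numbers would be purchased and appended to `numbers`.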

Address dimension or not?

My team is debating internally whether or not we should be creating a separate dimension of address information. The use case is a warehouse for a mail marketing agency, so address is quite important for a multitude of reasons.
We have several kinds of address information flowing in: Bank Address, Customer Address (our client's customers), Mailing List Address (or Manifests), and Client Address. We might also receive partial information from other sources that we need to tie to a specific customer based on address comparisons.
We also do geocoding on our addresses to augment, standardize and validate our addresses that come in.
In total, we are storing the following fields for any given address:
DeliveryLine1
DeliveryLine2
LastLine
DeliveryPointBarcode
StreetNumber
ApartmentNumber
ApartmentUnitType
StreetName
StreetSuffix
Locality
Region
ZipCode
ZipCodePlusFour
DeliveryPoint
DeliveryPointCheckpointDigit
Latitude
Longitude
RecordType
ZipType
CountyFIPS
CarrierRoute
ResidentialDeliveryIndicator
Precision
DPV
Vacant
Active
EWS
That's 27 fields in total.
My colleague is of the opinion that address should go into each dimension (Customer, Bank, Client, Manifest). I agree that this would make sense in simple cases where we store Address1, Address2, City, State, Zip, but we store a significant amount of added information about an address, with more bits and pieces potentially being added later on. I contend that something like this would be better suited to a separate dimension. Any thoughts?
From a dimensional modelling point of view, your fact tables should answer this question. If your mail marketing facts relate to addresses, then go ahead and make Address a separate dimension. That is, if you do mail marketing to Banks, Customers, Mailing List Addresses and Clients and want to analyze the results by geographic information (that is, by address), then Address should be created as a separate dimension. However, if you usually mail market only to your Clients and use addresses for other purposes, e.g. to find nearby Customers, Banks, etc., then I don't see much value in making Address a dimension. In essence, if your facts relate to addresses at the same level as they relate to the targets (Banks, Customers, Mailing List Addresses, Clients), then Address should be a dimension. If an address is nothing more than an attribute of a Bank, Customer, Mailing List Address or Client, there is no need for a separate dimension.
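As an illustration of the first case, here is a small sketch using Python's built-in sqlite3 module, where a fact table references a separate address dimension directly and can therefore be analyzed by region. The table names, column names and sample rows are all hypothetical, and only a few of the 27 address fields are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_address (
    address_key    INTEGER PRIMARY KEY,
    delivery_line1 TEXT,
    locality       TEXT,
    region         TEXT
);
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,
    name         TEXT
);
-- The fact row points at the address itself, not only at the customer.
CREATE TABLE fact_mailing (
    customer_key  INTEGER REFERENCES dim_customer(customer_key),
    address_key   INTEGER REFERENCES dim_address(address_key),
    pieces_mailed INTEGER
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO dim_address VALUES (?, ?, ?, ?)", [
    (1, "1 Main St", "Topeka", "KS"),
    (2, "2 Oak Ave", "Wichita", "KS"),
    (3, "3 Elm Rd", "Tulsa", "OK"),
])
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany("INSERT INTO fact_mailing VALUES (?, ?, ?)", [
    (1, 1, 10), (2, 2, 5), (1, 3, 7),
])

# Geographic analysis goes through the address dimension alone.
pieces_by_region = conn.execute("""
    SELECT a.region, SUM(f.pieces_mailed)
    FROM fact_mailing AS f
    JOIN dim_address AS a ON a.address_key = f.address_key
    GROUP BY a.region
    ORDER BY a.region
""").fetchall()
```

If addresses were only ever looked up through the customer, the same columns could instead live on `dim_customer` and `dim_address` would add little.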

graph modeling approach for node/edge user access control

Are there sets of best practices to approach how to model data in a graph database (I am considering arangodb right now but the question would apply to other platforms)? Here is a practical case to illustrate my question:
Assuming we are creating a centralised contact list for users. Each user has contacts but some contacts could be common to users e.g. John knows Mary, and Marc knows Mary. I would thus have 3 nodes (John, Mary and Marc) but John should only see his relationship to Mary, not Marc's relationship to Mary
So how should a full graph be designed in order to support user access to their information?
Option 1: Create 1 graph per user. That way, I know exactly who can see what (I could for example prefix all my collections with the user id). That would be simple but would duplicate a lot of data (e.g. if I put all my family in the db, my brother will do too, creating twice the same data, in different graphs)
Option 2: Create 1 general graph with Contact nodes, plus User nodes. I would have the contact John, Mary and Marc connected, but the User node representing John, would be linked to the Contact nodes John and Mary only. That way I would know to get only the contact nodes that are connected to the User node I am focusing on.
The problem is that edges cannot be linked to the User node (I cannot have an edge going from a node to an edge...can I?). So I would have to add an attribute of user_id to all the edges in order to only fetch the ones relevant to the current user.
This is slightly nicer as I do not have to duplicate nodes, but I would still have to duplicate edges as they would be user specific
Option 3: Do it SQL like with a Rights table, maintaining a list of Contact ids along with what user can see what Node and what Edge (heavy on joins)
Option 4: ???
As in everything, there are many ways to reach a solution but I was wondering what was considered best practice to balance cleanliness of approach and performance for insertion/deletion...knowing that performance might be platform dependent
I would suggest an Option 4:
First, I would not distinguish between User and Contact nodes; all of them should be Contact nodes.
If you create a new User, you basically create a new Contact for them (or use an existing one) and connect your application's authentication to this specific Contact.
Then you can use directed edges to create the contact list for a user.
Say you have two users, John and Mary. John can add Mary to his contact list, but this would not show up on Mary's side. If she wants to add John, you add a second edge.
If you want symmetrical contacts only (if John adds Mary to his list, he should automatically appear in hers), you simply ignore the direction in your queries.
To get the contacts for John, select the neighbors of John.
In ArangoDB this can be realized with two collections, say Contact and Knows, where Knows holds the edges.
The following code, pasted into arangosh, creates the situation described above:
// Document collection for the contacts, edge collection for the relationships
db._create("Contact");
db._createEdgeCollection("Knows");
// One Contact document per person
db.Contact.save({_key: "John", mail: "john@example.com"});
db.Contact.save({_key: "Mary", mail: "mary@somewhere.com"});
db.Contact.save({_key: "Marc", mail: "marc@somewhereelse.com"});
// Directed edges: John knows Mary, Marc knows Mary
db.Knows.save("Contact/John", "Contact/Mary", {});
db.Knows.save("Contact/Marc", "Contact/Mary", {});
To query the contact list for user John:
db._query('RETURN NEIGHBORS(Contact, Knows, "John", "outbound")').toArray()
This should return Mary, with no information about Marc.
If you do not want to merge Contacts and User accounts as I suggested, you could also keep them in separate collections. In that case you have to slightly modify the edges and the query:
db.Knows.save("User/John", "Contact/Mary", {});
db.Knows.save("User/Marc", "Contact/Mary", {});
db._query('RETURN NEIGHBORS(User, Knows, "John", "outbound")').toArray()
should give the same result.
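The difference between directed and symmetrical contact lists can also be sketched outside the database. The following Python snippet is only an in-memory illustration with hypothetical names, not ArangoDB code: following edges outbound gives a user's own list, while ignoring direction gives the symmetrical variant.

```python
# Directed "knows" edges as (from, to) pairs, mirroring the example above.
knows = [("John", "Mary"), ("Marc", "Mary")]


def contacts(person, direction="outbound"):
    """Return the neighbors of person, following or ignoring edge direction."""
    found = set()
    for source, target in knows:
        if source == person:
            found.add(target)
        if direction == "any" and target == person:
            found.add(source)
    return sorted(found)
```

With outbound direction, John sees only Mary and Mary sees nobody; with direction ignored, Mary sees both John and Marc.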
Edit:
Regarding your question in Option 2:
In ArangoDB it is actually possible to point edges at other edges; however, the built-in graph functionality will then treat the edges pointed to as if they were nodes. This means it does not follow their direction automatically. But you can use the resulting edges in further AQL statements and continue the search with AQL features.

neo4j data modelling of an organization

I am thinking about modelling an organization in neo4j.
The organization has a core team and different divisions.
There are internal groups within the organization.
There are also groups in the organization through which external people interact.
I think there is no concept of a sub-node or a node within a node (which we could have used to represent an org).
What is usually the best approach to represent this scenario in neo4j?
Thanks
I would start out modelling it in the way you describe the domain above, so you would have a domain like:
(org:Organization), (team:Team), (div:Division), (group:Group)
And then interconnect them in the way you describe their relationships:
(org)-[:CORE_TEAM]->(team),
(org)-[:DIVISION]->(div),
(org)-[:INTERNAL_GROUP]->(group),
(org)-[:EXTERNAL_GROUP]->(group)
Depending on the use case for internal/external groups, you may want to add more general relationships, and have the rels above denote specific connections, so you could have:
(team)-[:BELONGS_TO]->(org),
(div)-[:BELONGS_TO]->(org),
(group)-[:BELONGS_TO]->(org)
It all depends on what your domain case is, like what questions you'd like to ask the data.
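To show how the specific and the general relationships answer different questions, here is a small in-memory sketch in Python (not Cypher). The node identifiers and helper names are hypothetical, and the internal and external groups are given separate identifiers so they stay distinct.

```python
# (start, relationship_type, end) triples mirroring the patterns above.
relationships = [
    ("org", "CORE_TEAM", "team"),
    ("org", "DIVISION", "div"),
    ("org", "INTERNAL_GROUP", "internal_group"),
    ("org", "EXTERNAL_GROUP", "external_group"),
    ("team", "BELONGS_TO", "org"),
    ("div", "BELONGS_TO", "org"),
    ("internal_group", "BELONGS_TO", "org"),
    ("external_group", "BELONGS_TO", "org"),
]


def targets(start, rel_type):
    """Nodes reached from start over relationships of the given type."""
    return [e for s, r, e in relationships if s == start and r == rel_type]


def sources(end, rel_type):
    """Nodes that point at end over relationships of the given type."""
    return [s for s, r, e in relationships if e == end and r == rel_type]
```

The specific type answers "which groups face external people?" (`targets("org", "EXTERNAL_GROUP")`), while the general type answers "what belongs to this organization?" (`sources("org", "BELONGS_TO")`).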

Resources