Neo4j data modeling query - neo4j

I am working on a project (platform) where I am using Neo4j to model connections between users. A user can be connected to other users on the platform in the following ways:
1. When a user logs in with Facebook, I get their mutual friends who are already using our platform. I then create a new node for the current user in my graph database and connect it to the existing nodes of everyone they know through Facebook.
2. A user is also connected to another user if both live in the same society/community. The use case: once a user updates their residential address (society name, city), I query the graph DB for all nodes that also live in the same society and connect the new user to those nodes with a relationship named "Same society".
3. Similarly, a user may be connected to another user if both studied at the same college or school. I connect the two nodes with a relationship named "Same college/school".
What is the best way to model the above problem in Neo4j? If I query the DB for all the relationship types and the shortest path between two given nodes, which model is best optimized for this type of query?

1. Simply create a relationship between users:
(user)-[:FRIEND_OF]-(user)
Relationships in Neo4j are always directed, but you can ignore the direction when querying because, in this case, it doesn't carry much meaning. There is no performance penalty for traversing a relationship "backwards".
2. Create a node representing your society/community and link all users living there to that node. This has several benefits:
- When you add a new user to a community you create only one relationship (compared to linking the user directly to all other members, which is N relationships). The same goes for removing a user from a community.
- You can create different relationships between the user and the community:
(user)-[:LIVES]-(community1)
(user)-[:WORKS]-(community2)
3. This is essentially the same as no. 2. You can differentiate between types of communities (schools, etc.) by giving them different labels/properties.
Finding the shortest path between two nodes in this model will give you their closest connection, the kind of "know via something". You can restrict the path by relationship type, community type, and so on.
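To make the "know via something" idea concrete, here is a minimal in-memory sketch in Python rather than Cypher. The sample users, communities, and the STUDIED_AT relationship name are made up for illustration. In Neo4j itself you would more likely reach for Cypher's built-in shortestPath; this only shows what such a query computes when users hang off shared community nodes.

```python
from collections import deque

# Hypothetical sample data: (start node, relationship type, end node).
EDGES = [
    ("alice", "FRIEND_OF", "bob"),
    ("bob", "LIVES", "green_park"),
    ("carol", "LIVES", "green_park"),
    ("carol", "STUDIED_AT", "city_college"),
    ("dave", "STUDIED_AT", "city_college"),
]


def build_adjacency(edges):
    """Index every edge in both directions, since direction is ignored here."""
    adjacency = {}
    for start, rel_type, end in edges:
        adjacency.setdefault(start, []).append((rel_type, end))
        adjacency.setdefault(end, []).append((rel_type, start))
    return adjacency


def shortest_path(edges, source, target):
    """Breadth-first search; returns a list of (rel_type, node) hops or None."""
    adjacency = build_adjacency(edges)
    queue = deque([source])
    came_from = {source: None}  # node -> (previous node, rel_type)
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for rel_type, neighbour in adjacency.get(node, []):
            if neighbour not in came_from:
                came_from[neighbour] = (node, rel_type)
                queue.append(neighbour)
    if target not in came_from:
        return None
    # Walk back from the target to the source, then reverse the hops.
    hops = []
    node = target
    while came_from[node] is not None:
        previous, rel_type = came_from[node]
        hops.append((rel_type, node))
        node = previous
    hops.reverse()
    return hops
```

Calling shortest_path(EDGES, "alice", "carol") walks from alice to bob via FRIEND_OF and then through the shared green_park node, so the relationship types along the path tell you how the two users are connected.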

Related

Firebase - Can sharded data be shared between multiple Realtime Databases?

Using the Realtime Database it says here that if you want to scale beyond 200,000 simultaneous connections that you can create/shard another database. It also says:
Each query only runs against a single database instance. Realtime Database doesn't support queries across database instances.
No sharing or duplication of data across database instances (or minimal sharing or duplication).
Each app instance only connects to one database at any given moment.
Let's say in first database I have a Posts ref, a Users ref, and a Search Posts ref with 100K user objects, 200K post objects, and 200K search objects. I now decide to create/shard another database with the same exact refs.
When the next x users sign up, if their User, Post, and Search Posts refs are in the new shard database, does that mean they won't be able to access users, or search those users' posts, from the first database? And vice versa, will the users from the first database have no access to the users or their posts in the second database?
The point of sharding is to load-balance your connections; it is not related to the quantity of data.
The RTDB makes no decisions about where data is stored; you do. For example, one server would contain users and another would contain posts.
You would run user queries against the server with users, and post queries against the server with posts.
All you're doing is pointing your app toward the server you want to query before running the query.
In other words, there would be no reason to add users to server 1 and then add more users to server 2, as the quantity of data doesn't matter.
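As a rough illustration of "pointing your app toward the server you want", the sketch below picks a database instance from the top-level ref of a path. The instance URLs and the ref names are placeholders, not real endpoints, and no Firebase SDK call is shown; the only point is that the split is by kind of data, not by which users signed up first.

```python
# Hypothetical instance URLs; real ones come from your Firebase project settings.
SHARDS = {
    "users": "https://example-users.firebaseio.com",
    "posts": "https://example-posts.firebaseio.com",
    "search_posts": "https://example-search.firebaseio.com",
}


def database_url_for(ref_path):
    """Return the instance that owns the top-level ref of the given path."""
    top_level = ref_path.strip("/").split("/")[0]
    try:
        return SHARDS[top_level]
    except KeyError:
        raise ValueError(f"no database instance configured for {top_level!r}")
```

With this layout every user record lives on the same instance, so any signed-in user can still look up any other user or post.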

Neo4J Multi Tenancy and Role Based Access to Nodes

I am trying to define a user management and permissions model for Neo4j. I have a web application (Angular 2) that connects to Neo4j via an API (KOANEO4J). Neo4j is the only database or persistent storage that the application uses. Through the application a user can add/edit/delete content which uses the API to carry out these instructions in Neo4j by running Cypher Statements. Up to now I have not worried about supporting multiple users but as a next step I am starting to think about this.
The product will be used by multiple different companies and each company will have multiple users so I need some way to support this. The model I am considering in Neo4J is as follows:
An "Organization" is represented by a node and it can have one or more "Organization Catalogs". All of the nodes belonging to a catalog will be children of one of the "Organization Catalogs".
Each user will also be represented by a node in the database. They will belong to an Organization. They will have certain access permissions on an Organization Catalog, identified by an edge.
I am looking for some advice on whether or not this is an appropriate model to follow or if there are any examples or documents that describe how to achieve this in Neo4j.
If I do implement this model, would it be better to model the permissions as separate nodes, so that a user is connected to a permission node (e.g. Read Only Access) that is then connected to the Organization Catalog?
Any suggestions on how I would actually get the API to work with this type of model? I'm sure I can pass the User Id to Neo4j as part of each query and then filter the results to show only nodes the user has access to, but this doesn't seem like a very elegant solution. It also means that all of the security would depend on carefully written Cypher queries that don't leak data that a user isn't supposed to access.
Thanks a lot
I am looking for some advice on whether or not this is an appropriate model to follow or if there are any examples or documents that describe how to achieve this in Neo4j.
The answer to this question is: it depends. Remember that when modelling a graph database you should consider the queries that will be run against it. If this model fits those queries then it is appropriate; otherwise it is not. Take a look at Chapter 5 (Graphs in the Real World) of the book Graph Databases (by Ian Robinson, Jim Webber and Emil Eifrem. Available for download here). This chapter walks through the modelling process of an Authorization and Access Control system in Neo4j. It may be enlightening and helpful to you.
If I do implement this model then would it be better to model the permissions as separate nodes so a user is connected to a permission node (e.g. Read Only Access) that is then connected to the Organization Catalog.
Again, it depends. Do it if the Permission entity is connected to other entities of your application besides a User and an Organization Catalog. Otherwise, I believe your permission can be modeled as a relationship between a user and an organization catalog.
Any suggestions on how I would actually get the API to work with this type of model. I'm sure I can pass the User Id to Neo4j as part of each query and then filter the results to show only nodes the user has access to but this doesn't seem like a very elegant solution - it also means that all of the security would be dependent on carefully written Cypher queries that don't leak data that a user isn't supposed to access.
It may be a good idea to add another layer of software between your Angular client app and the Neo4j database. In this new layer (a Node.js application, for example) you can implement an access control system that verifies whether the authenticated user may access the resource being requested.
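A minimal sketch of such a layer, written in Python here only for brevity (the answer's Node.js suggestion would look much the same). Everything in it is hypothetical: the dictionaries stand in for data that would really be read from the graph, and run_query stands in for whatever actually executes the Cypher statement. The point is that the permission check happens once, before any query runs, instead of being repeated inside every Cypher statement.

```python
# Hypothetical in-memory stand-ins for data that would live in the graph.
USER_ORGANIZATION = {"u1": "acme", "u2": "globex"}
CATALOG_ORGANIZATION = {"cat-a": "acme", "cat-b": "globex"}
# (user id, catalog id) -> access level granted by the permission edge
PERMISSIONS = {("u1", "cat-a"): "write", ("u2", "cat-b"): "read"}
LEVELS = {"read": 1, "write": 2}


class AccessDenied(Exception):
    """Raised when a user may not perform the requested action."""


def authorize(user_id, catalog_id, required):
    """Check organization membership and permission level, or raise."""
    same_org = (
        user_id in USER_ORGANIZATION
        and USER_ORGANIZATION[user_id] == CATALOG_ORGANIZATION.get(catalog_id)
    )
    granted = PERMISSIONS.get((user_id, catalog_id))
    if not same_org or granted is None or LEVELS[granted] < LEVELS[required]:
        raise AccessDenied(f"{user_id} may not {required} {catalog_id}")


def handle_request(user_id, catalog_id, action, run_query):
    """Authorize first; only then hand the request to the database callable."""
    authorize(user_id, catalog_id, action)
    return run_query(catalog_id)
```

Because the database callable is only reached after authorize succeeds, a forgotten filter in one query cannot leak another organization's catalog.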

Neo4j: Convenient Backend for Network Navigation, Editing and Search

I'm a sociology PhD student and I'm trying to use Neo4j to manually build the social network of the political elite in a given country. I say "manually" because I will need to input all the data manually as I come across it in my readings.
For this I would need an interface that allows me to navigate the network, as well as search and edit my database conveniently. Crucially, that would include the capacity to search for node and relationship attributes, and edit the database in real time when in graph visualization mode.
It looks like the default Neo4j admin dashboard only allows you to search by node and relationship numbers, and doesn't allow the editing of the database when in graph visualization mode. Am I right? If so, is there an alternative interface that will allow me to do this? I looked into Neoeclipse but I'm not sure it's the right tool for the job. (I also haven't been able to properly load my database on it)
Thanks in advance for the help!
JB
Which version are you looking at? Neo4j 2.0 has a really convenient UI for what you want to do.
You can easily find by property:
MATCH (user:User {name: "Peter"}) RETURN user
or
MATCH (user:User {name: "Peter"})-[:KNOWS]->(other) RETURN user, other
And you can store these queries as favorites so you don't have to type them more than once.
And you can explore the graph by double clicking nodes.
For the visualization, if you click once on each node you can see its properties and also configure which property is shown in the graph.

Rails: Multiple databases, same schema

I'm in the middle of a fictional scenario project where I have allowed multiple users for a company to log in, create records, and so on, all connecting to the one database. They can all record absence records, attendance records, and so on.
What I want to do, however, is use this same schema but expand it to allow several companies to have their own databases using the same schema. So each company will have its own data, but all companies use the same data model. In other words, all companies can create absence records, but each only has access to the absence records it created itself.
How can I achieve this?
All I need is two or three files for this, I'm not going commercial with it in case you guys think I'm cutting corners at someone else's expense!
Something as simple as an if-else that decides which file to use would be very useful to me, so if such a line of code exists please let me know.
I think you are doing it wrong (unless you have a really good reason to have a database for each company), because it seems like you are repeating your data model over and over while introducing unnecessary complexity to your code.
Try to keep all the companies in one DB, in the same tables, with rows separated by company_id.
Ex: the data structure would be as follows
companies table
  id
  name
users table
  id
  user_name
  company_id
However if you really want to connect to multiple databases, check this SO question.
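The single-database layout can be sketched as follows. This uses Python's sqlite3 rather than Rails, purely to keep the example self-contained; the absences table, its columns, and the sample rows are assumptions added for illustration, since the answer above only lists companies and users. The key point is that every read is scoped by company_id.

```python
import sqlite3


def create_schema(conn):
    """Create the shared tables; every tenant-owned row carries company_id."""
    conn.executescript(
        """
        CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE users (
            id INTEGER PRIMARY KEY,
            user_name TEXT NOT NULL,
            company_id INTEGER NOT NULL REFERENCES companies(id)
        );
        -- Hypothetical table for the absence records mentioned in the question.
        CREATE TABLE absences (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            company_id INTEGER NOT NULL REFERENCES companies(id),
            reason TEXT NOT NULL
        );
        """
    )


def absences_for_company(conn, company_id):
    """Return only the absence reasons that belong to the given company."""
    rows = conn.execute(
        "SELECT reason FROM absences WHERE company_id = ? ORDER BY id",
        (company_id,),
    )
    return [reason for (reason,) in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO companies (id, name) VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
conn.executemany(
    "INSERT INTO users (id, user_name, company_id) VALUES (?, ?, ?)",
    [(1, "ann", 1), (2, "raj", 2)],
)
conn.executemany(
    "INSERT INTO absences (user_id, company_id, reason) VALUES (?, ?, ?)",
    [(1, 1, "sick"), (2, 2, "holiday"), (1, 1, "training")],
)
```

Here absences_for_company(conn, 1) sees only Acme's rows, even though both companies share the same tables.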

How do social networking websites compute friend updates?

Social networking websites probably maintain tables for users, friends and events...
How do they use these tables to compute friends events in an efficient and scalable manner?
Many social networking sites like Twitter don't use an RDBMS for this at all, but a message queue application. A lot of them start out with an existing application like RabbitMQ. Some of them get big enough that they have to heavily customize it or build their own. Twitter is in the process of doing this for the second time.
A message queue application works by holding messages from one service for one or more other services. For instance, say service Frank is publishing messages to a queue foo. Joe and Jill are subscribed to Frank's foo queue. The application keeps track of whether or not Joe and Jill have received each message, and once every subscriber to the queue has received a message it discards it. Frank fires off messages and forgets about them. Joe and Jill ask for messages from foo and get whatever messages they haven't gotten yet. Joe and Jill do whatever they need to do with each message, perhaps keeping it around, perhaps not.
The message queue application guarantees that everyone who is supposed to get a message can and will get it when they request it. The publisher can send messages confident that subscribers will get them eventually. This has the benefit of being completely asynchronous and not requiring costly joins.
EDIT: I should also mention that the storage for this kind of thing at high scale is usually heavily denormalized. So Joe and Jill may each be storing a copy of the exact same message. This is considered OK because it helps the application scale to billions of users.
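The Frank/Joe/Jill behaviour described above can be sketched in a few lines. This is a toy in-memory model with made-up class and method names, not how RabbitMQ or any real broker is implemented; it only shows the bookkeeping: each message remembers which subscribers have not fetched it yet and is discarded once that set is empty.

```python
class MessageQueue:
    """Toy queue: holds each message until every subscriber has fetched it."""

    def __init__(self):
        self._subscribers = set()
        self._messages = {}  # message id -> (payload, subscribers still waiting)
        self._next_id = 0

    def subscribe(self, name):
        self._subscribers.add(name)

    def publish(self, payload):
        """Fire and forget; a message with no subscribers is simply dropped."""
        if self._subscribers:
            self._messages[self._next_id] = (payload, set(self._subscribers))
            self._next_id += 1

    def fetch(self, name):
        """Return, in publish order, the messages this subscriber has not seen."""
        delivered = []
        for message_id in sorted(self._messages):
            payload, pending = self._messages[message_id]
            if name in pending:
                delivered.append(payload)
                pending.discard(name)
                if not pending:
                    # Every subscriber has received it, so discard the message.
                    del self._messages[message_id]
        return delivered

    def pending_count(self):
        return len(self._messages)
```

If Joe fetches first, the messages stay in the queue for Jill; once Jill has fetched them too, pending_count() drops back to zero.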
Other reading:
http://www.rabbitmq.com/
http://qpid.apache.org/
The mainstay data structure of social networking sites is the graph. On Facebook the graph is undirected (when you're someone's friend, they're your friend). On Twitter the graph is directed (you follow someone, but they don't necessarily follow you).
The two popular ways to represent graphs are adjacency lists and adjacency matrices.
An adjacency list is simply a list of edges on the graph. Consider a user with an integer userid.
User1  User2
1      2
1      3
2      3
The undirected interpretation of these records is that user 1 is friends with users 2 and 3 and user 2 is also friends with user 3.
Representing this in a database table is trivial. It is the many to many relationship join table that we are familiar with. SQL queries to find friends of a particular user are quite easy to write.
Now that you know a particular user's friends, you just need to join those results to the updates table. This table contains all users' updates, indexed by user id.
As long as all these tables are properly indexed, you'd have a pretty easy time designing efficient queries to answer the questions you're interested in.
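Here is a small sketch of that join using Python's sqlite3 and the three sample edges from above. The table and column names (friendships, updates, body) and the sample updates are assumptions made for illustration. Because the graph is undirected but each edge is stored once, the friend lookup has to check both columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE friendships (user1 INTEGER NOT NULL, user2 INTEGER NOT NULL);
    CREATE TABLE updates (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL,
        body TEXT NOT NULL
    );
    CREATE INDEX idx_friendships_user1 ON friendships(user1);
    CREATE INDEX idx_friendships_user2 ON friendships(user2);
    CREATE INDEX idx_updates_user_id ON updates(user_id);
    """
)
conn.executemany("INSERT INTO friendships VALUES (?, ?)", [(1, 2), (1, 3), (2, 3)])
conn.executemany(
    "INSERT INTO updates (user_id, body) VALUES (?, ?)",
    [(1, "one"), (2, "two"), (3, "three"), (4, "four")],
)

# Each edge is stored once, so look for the user in either column.
FRIENDS_SQL = """
    SELECT user2 AS friend_id FROM friendships WHERE user1 = :uid
    UNION
    SELECT user1 AS friend_id FROM friendships WHERE user2 = :uid
"""


def friends_of(conn, user_id):
    rows = conn.execute(FRIENDS_SQL + " ORDER BY friend_id", {"uid": user_id})
    return [friend_id for (friend_id,) in rows]


def friend_updates(conn, user_id):
    """Join the user's friends to the updates table, oldest update first."""
    rows = conn.execute(
        f"""
        SELECT u.user_id, u.body
        FROM updates AS u
        JOIN ({FRIENDS_SQL}) AS f ON f.friend_id = u.user_id
        ORDER BY u.id
        """,
        {"uid": user_id},
    )
    return rows.fetchall()
```

User 4 has updates but no friendships, so their rows never show up in anyone's feed.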
Travis wrote a great post on this: Activity Logs and Friend Feeds on Rails & pfeed
At small scale, doing a join on users.friends and users.events with query caching is probably fine, but it slows down pretty quickly as friends and events grow. You could also try an event-based model in which, every time a user creates an event, an entry is created in a join table (perhaps called "friends_events"). Then whenever a user wants to see what events their friends have created, they can simply look up their own id in the friends_events table. In this way you avoid grabbing all of a user's friends and then joining those friends with the events table.
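One reading of the friends_events idea is "fan-out on write": when a user creates an event, one entry per friend is written, so reading a feed is a single lookup by the reader's own id. The sketch below models that in memory; the class and method names are invented for illustration and the dictionaries stand in for database tables.

```python
from collections import defaultdict


class FriendFeed:
    """Toy fan-out-on-write feed keyed by the recipient's user id."""

    def __init__(self):
        self._friends = defaultdict(set)
        # Stand-in for the friends_events table: recipient -> [(author, event)].
        self._friends_events = defaultdict(list)

    def add_friendship(self, a, b):
        self._friends[a].add(b)
        self._friends[b].add(a)

    def create_event(self, author, event):
        """Write one entry per friend at creation time."""
        for friend in self._friends.get(author, ()):
            self._friends_events[friend].append((author, event))

    def feed(self, user):
        """Reading needs no join: just look up the user's own id."""
        return list(self._friends_events.get(user, []))
```

The trade-off is more writes and duplicated storage in exchange for cheap reads, which matches the denormalization point made in the earlier answer.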

Resources