Node state tracking / logging using Neo4j - neo4j

I'm exploring potential use cases for neo4j, and I find that the relationship model is great, but I'm curious if the database can support something along the lines of a business transaction log.
For instance, a video rental store:
Customer A rents Video A on 01/01/2014
Customer A returns Video A on 01/20/2014
Customer B rents Video A on 01/25/2014
Customer B returns video A on 02/15/2014
Customer C rents Video A on 03/10/2014
etc...
The business requirement would be to track all rental transaction relationships relating to the Video A node.
This seems to be technically possible. Would one create a new relationship for every time that a new rental occurs? Are there better ways to approach this? Is this a misuse of the technology?

Nice! This is the exact use case that led me to develop FlockData (github link). FD uses Neo4j to track event-type activity against a domain document (Rental in your example). You then use Tags to create nodes that represent metadata associated with the domain doc (Movie/Person). You have an event node for each change in state of the Rental. There are a couple of graphs on LinkedIn showing "User Created", "User Approved" and "User Audited".
FD uses 3 databases to achieve its goals: Neo4j for the network of relationships, a KV store for the bulky data (Redis or Riak) and Elasticsearch to let users find their Business Context Document (the Rental) via free text.
In terms of your specific question, exercise caution with nodes that have a lot of relationships. Check out this article on modelling dates. Peter Neubauer has a similar article somewhere in the Neo4j docs.

I'd look at it depending on what you're trying to get out of it. If you're looking to develop a recommendation engine, or to see the relationships between users and/or movies, a graph DB is a pretty natural solution. If you're looking at tracking the state changes of Video A over time, a temporal database is modeled for that (http://en.wikipedia.org/wiki/Temporal_database). For a straight-up transactional system, a traditional relational database will work easily. Personally, I think you'll have better options with a graph DB. In your example, you would have 3 customer nodes, 1 video node, 3 relationships of type :RENTS and 2 of type :RETURNS. You'd want to make sure that your property model supports the same user re-renting the same movie (store the dates in an array, not a single value). Just some thoughts...
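As a rough sketch of the "one relationship per rental event" idea from the question, the following Cypher assumes hypothetical labels `Customer` and `Video`, hypothetical `name`/`title` properties, and an ISO date string stored on each relationship. Note that it creates a separate relationship for every event instead of storing dates in an array, so a repeat rental by the same customer simply adds another `RENTS` relationship.

```cypher
// Hypothetical labels and property names; ISO-8601 date strings sort correctly as text.
MERGE (a:Customer {name: 'Customer A'})
MERGE (v:Video {title: 'Video A'})
// CREATE (not MERGE) so every rental and return is its own relationship.
CREATE (a)-[:RENTS {date: '2014-01-01'}]->(v)
CREATE (a)-[:RETURNS {date: '2014-01-20'}]->(v);

// Full rental history of Video A, oldest event first.
MATCH (c:Customer)-[r:RENTS|RETURNS]->(v:Video {title: 'Video A'})
RETURN c.name AS customer, type(r) AS event, r.date AS date
ORDER BY date;
```

If a single video accumulates a very large number of these relationships, the dense-node caution in the first answer applies.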

Related

data model for notification in social network?

I am building a social network with Neo4j. It includes:
Node labels: User, Post, Comment, Page, Group
Relationships: LIKE, WRITE, HAS, JOIN, FOLLOW, ...
It is like Facebook.
Example: user A follows user B. When B performs an action (likes a post, comments, follows another user, follows a page, joins a group, etc.), that action should be sent to A. Similarly, users C, D and E who follow B should receive the same notification.
I don't know how to design the data model for this problem. I have considered some solutions:
1. Create Notification nodes for every user. When an action is executed, create n notifications for n followers. Benefit: we can check whether each user has seen the notification, right? But the number of nodes increases quickly, by n for every action.
2. Run a query on every notification API call (from the client application) that fetches only the actions of followed users within a recent window (24 hours, or 2 to 3 days). But this cannot track whether a follower has seen a notification, and the query may slow the server down.
3. Create a limited number of nodes per user, such as 20 or 30.
4. Create unlimited nodes (including the time of the action) and delete those whose action time is more than 24 hours old (the expiry time could be 2 or 3 days).
Who can help me solve this problem? Should I choose one of these solutions or a new approach?
I believe that the best approach is the first option (a Notification node per follower). As you said, you will be able to know whether the follower has read the notification. As for the number of notification nodes per follower: this problem is known as "supernodes" or "dense nodes", meaning nodes that have too many connections.
The book Learning Neo4j (by Rik Van Bruggen, available for download on the Neo4j website) talks about the "dense node" or "supernode" and says:
"[supernodes] becomes a real problem for graph traversals because the graph
database management system will have to evaluate all of the connected
relationships to that node in order to determine what the next step
will be in the graph traversal."
The book proposes a solution that consists of adding meta nodes between the follower and the notifications (in your case). Each meta node should have at most a hundred or so connections. When the current meta node reaches 100 connections, a new meta node must be created and added to the hierarchy. The book's figure illustrates this with popular artists and their fans.
I think you should not worry about this right now. If your follower nodes become a problem in the future, you will be able to refactor your database schema. For now, keep things simple!
In the series of posts called "Building a Twitter clone with Neo4j", Max de Marzi describes the process of building the model. Maybe it can help you make better decisions about your model!
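The per-follower notification approach recommended above might look roughly like this in Cypher. This is only a sketch: the `Notification` label, the `FOR` relationship type, and the `action`, `actor`, `createdAt` and `seen` properties are assumptions for illustration; only `User` and `FOLLOW` come from the question.

```cypher
// Fan-out on write: the CREATE clauses run once per matched follower,
// so every follower of B gets their own Notification node.
MATCH (follower:User)-[:FOLLOW]->(actor:User {name: 'B'})
CREATE (n:Notification {action: 'LIKE', actor: actor.name, createdAt: timestamp(), seen: false})
CREATE (n)-[:FOR]->(follower);

// Unseen notifications for user A, newest first.
MATCH (n:Notification {seen: false})-[:FOR]->(:User {name: 'A'})
RETURN n.actor, n.action, n.createdAt
ORDER BY n.createdAt DESC
LIMIT 20;
```

Because each notification is a separate node, marking one as read is a single property update on that node, which is the benefit the question identified for this option.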

Neo4j data modeling for branching/merging graphs

We are working on a system where users can define their own nodes and connections, and can query them with arbitrary queries. A user can create a "branch" much like in SCM systems and later can merge back changes into the main graph.
Is it possible to create an efficient data model for that in Neo4j? What would be the best approach? Of course we don't want to duplicate all the graph data for every branch as we have several million nodes in the DB.
I have read Ian Robinson's excellent article on Time-Based Versioned Graphs and Tom Zeppenfeldt's alternative approach with Network versioning using relationnodes but unfortunately they are solving a different problem.
I would love to know what you guys think; any thoughts are appreciated.
I'm not sure what your experience level is. Any insight into that would be helpful.
My guess is that this system would rely heavily on tags (labels) on the nodes. Maybe come up with 5-20 very broad node types, including the names and a few key properties. Then you could allow the users to select from those base categories and create their own spin-offs by adding tags.
Say you had your basic categories of (:Thing{Name:"",Place:""}) and (:Object{Category:"",Count:4})
Your users would have a drop-down or something with "Thing" and "Object". They'd select "Thing" for instance, and type a new label (Say "Cool"), values for "Name" and "Place", and add any custom properties (IsAwesome:True).
So now you've got a new node (:Thing:Cool{Name:"Rock",Place:"Here",IsAwesome:True}), which allows you to query by broad categories or by a user's created categories. Hopefully this would keep each broad category to a proportional fraction of your overall node count.
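To make the multi-label idea concrete, here is a small Cypher sketch using the labels and properties from the example above. It shows one way to create such a node and query it by either the broad or the user-defined label.

```cypher
// Create the node with both the base label and the user-defined label.
CREATE (:Thing:Cool {Name: 'Rock', Place: 'Here', IsAwesome: true});

// Broad category: every Thing, whatever extra labels users added.
MATCH (t:Thing)
RETURN t.Name, labels(t);

// User-defined category only.
MATCH (c:Cool)
RETURN c.Name, c.IsAwesome;
```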
Not sure if this is exactly what you're asking for. Good luck!
Hmm. While this isn't insane, think first about the type of system you're replacing: SQL. In SQL databases you wouldn't use branches, because it's data storage. If you're trying to get data from multiple sources into one DB, I'd suggest exporting them all to CSV files and using a MERGE statement in Cypher to bring them all into your DB at once.
This could behave similarly to branching: at merge time, each person runs a script on their own copy of the DB that takes all the nodes and edges in their copy and writes them to a CSV, e.g.
// Bind the relationship to the variable e; [:e] would instead match only relationships of type "e"
MATCH (n)-[e]->(n2)
RETURN n, e, n2
Then compare these CSVs as you pull them into your final DB to see what's already there from the other copies.
LOAD CSV WITH HEADERS FROM "file:///YourFile.CSV" AS file
// Property1..Property4 are placeholder column names; N2 must read different columns than N,
// otherwise both MERGEs resolve to the same node
MERGE (N:Node{Property1:file.Property1, Property2:file.Property2})
MERGE (N2:Node{Property1:file.Property3, Property2:file.Property4})
MERGE (N)-[E:Edge]->(N2)
This will work as long as you're using node types that you already know about, and each person isn't creating new data structures that you don't learn about until the merge.

Does Neo4j support constraints based on a domain-model?

Short question
Does Neo4j support constraints based on a domain-model?
Explanation
In the basic tutorial, it says "Please keep this picture at hand all the time. It details the domain-model for this tutorial." (https://stack.versal.com/api2/assets/fdc05cea-e18b-44ea-8ba9-e119d7a8f872).
But is there any way to check that data stored in the graph respects this domain-model?
For relational databases, we have "create" instructions to build the domain-model and "insert" instructions to store data in compliance with this domain-model.
For graph database in Neo4j, I only found "create" instructions where we can specify a type (that would be part of the domain-model).
What I need to do
I need to create a domain-model that prevents the creation of nodes which are not compliant with the domain-model, for example:
the node type must be in the domain-model
a type of association can only link nodes with specific types
Example
With the movie domain-model coming from the tutorial (https://stack.versal.com/api2/assets/fdc05cea-e18b-44ea-8ba9-e119d7a8f872) :
A node can only be of type Person or Movie
A Movie can't have outgoing edges
A DIRECTED or ACTED_IN relationship can't link two Persons
...
Is this possible in Neo4j?
Or do I have to create checkers on the model?
You will have to create checkers of the model or an API that guarantees that only nodes matching the model are added.
Some things that you describe will be added in Neo4j in the future but it has not been decided when.
But I saw a presentation of the http://structr.org application framework today that allows you to model a schema with types, properties and relationships with cardinalities.
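One possible form for the "checkers" mentioned above is a set of audit queries run after writes. The sketch below uses the labels and relationship types from the movie example in the question; each query returns the offending data, so an empty result means that rule holds. This only detects violations after the fact; it does not prevent them.

```cypher
// Rule 1: a node can only be of type Person or Movie.
MATCH (n)
WHERE NOT n:Person AND NOT n:Movie
RETURN n;

// Rule 2: a Movie can't have outgoing edges.
MATCH (m:Movie)-[r]->()
RETURN m, type(r);

// Rule 3: a DIRECTED or ACTED_IN relationship can't link two Persons.
MATCH (a:Person)-[r:DIRECTED|ACTED_IN]->(b:Person)
RETURN a, type(r), b;
```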

How can I port a relational database to Neo4j?

I am playing around with Neo4j and trying to get my head around the graph concepts. As a learning exercise I want to port a small Postgres relational database schema to Neo4j. Is there any way I can port it and issue "equivalent" relational queries to Neo4j?
Yes, you can port your existing schema to a graph database. Keep in mind that this is not necessarily the best model for your data, but it is a starting point.
How easy it is depends a lot on the quality of your existing schema.
The tables corresponding to entities in an entity-relationship diagram define your types of nodes. In the upcoming neo4j 2.0, you can label them with the name of the entity to make lookups easier. In older versions you can use an index or a manual label property.
Assuming the best case, where all the relationships in your data are modelled using foreign keys, any 1:1 relationship between nodes can be identified and ported next.
For tables modelling n:m relationships, identify the corresponding nodes and add a direct relationship between them.
So as an example assume tables Author[id, name, publisher foreign key], Publisher[id, name] and Book[id, title] and written_by[author foreign key, book foreign key].
Every row in Author, Publisher and Book becomes a node.
Every Author node gets a relationship to the publisher identified by the foreign key relationship.
For every row in written_by you add a relationship between the referenced Author node and Book node.
For queries in neo4j I recommend cypher due to its expressiveness. A (2.0) query looking for books by some author would look like:
MATCH (author:Author)-[:written_by]-(book:Book)
WHERE author.name='Hugh Laurie'
RETURN book.title
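For illustration, the porting steps above could produce data like the following. This is a sketch with made-up sample values; the `PUBLISHED_BY` relationship type is a hypothetical name for the Author-to-Publisher foreign key, while `written_by` matches the query above.

```cypher
// One node per row of Publisher, Author and Book (sample values are made up).
CREATE (p:Publisher {id: 1, name: 'Example Press'})
CREATE (a:Author {id: 1, name: 'Hugh Laurie'})
CREATE (b:Book {id: 1, title: 'Example Title'})
// The publisher foreign key on Author becomes a relationship.
CREATE (a)-[:PUBLISHED_BY]->(p)
// Each row of the written_by join table becomes a relationship.
CREATE (b)-[:written_by]->(a);
```

Since the query above matches `written_by` without a direction, it finds the book regardless of which way the relationship was created.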
You actually have several options at hand:
use the Talend connector for Neo4j
export your schema+data in CSV files consumable by the batch importer
or you can do it programmatically
I'm afraid not. The relational data model and the graph data model are two different ways of modelling a real-world domain. It requires a human brain (at least as of 2013) to understand the domain in order to model it.
I suggest that you take a piece of paper and capture, using circles and arrows, what your entities are (nodes) and how they relate to each other (relationships). Then, have a look at that piece of paper. Voila, your new Neo4j data model.
Then, take a query that you want to be answered and try to figure out how you would do that without a computer, just by tracing your nodes and relationships with a finger on that piece of paper. Once you've figured that out, translate what you've done to a Cypher query.
Have a look at neo4j.org, there are plenty of examples.
Check this out:
The musicbrainz -> neo4j
https://github.com/redapple/sql2graph/tree/master/examples/musicbrainz
Neo4j Sql-importer
https://github.com/peterneubauer/sql-import
Good Luck!
This tool does exactly that.
Import any relational db into neo4j
https://github.com/jexp/neo4j-rdbms-import

Neo4j, Which is better: multiple relationships or one with a property?

I'm new to neo4j, and I'm building a social network. For the sake of this question, my graph consists of user and event nodes with relationship(s) between them.
A user may be invited to, join, attend or host an event, and each is a subset of the one before it.
Is there any benefit to / should I create multiple relationships for each status/state, or one relationship with a property to store the current state?
Graph-type queries are more easily/efficiently done on relationship types than properties, from what I understand.
How about one relationship, but a different relationship type?
You can query on several types of relationships with pipes using Cypher (in case you have other relationships to the event that you don't want to pick up in queries).
Update--adding console example: http://console.neo4j.org/?id=woe684
Alternatively, you can just leave the old relationships there and not have to build the slightly more complicated queries, but that feels a bit wasteful for this use case.
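A sketch of the single-relationship, multiple-types approach described above, assuming hypothetical relationship types `INVITED`, `JOINED`, `ATTENDED` and `HOSTED`, hypothetical `User`/`Event` labels with a `name` property, and made-up sample names:

```cypher
// Current status of every user for one event; the pipe matches any of the listed types.
MATCH (u:User)-[r:INVITED|JOINED|ATTENDED|HOSTED]->(e:Event {name: 'Launch Party'})
RETURN u.name, type(r) AS status;

// Promote a user from INVITED to JOINED by swapping the relationship.
MATCH (u:User {name: 'alice'})-[old:INVITED]->(e:Event {name: 'Launch Party'})
CREATE (u)-[:JOINED]->(e)
DELETE old;
```

Listing the types explicitly keeps other relationships to the event (comments, likes and so on) out of the result.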
When possible, choosing different relationship types over a single type qualified by properties can have a significant positive performance impact when querying the graph. The former approach is always at least 2x faster than the latter. When data is in the high-level cache and the graph is queried using the native Java API, the first approach is more than 8x faster for single-hop traversals.
Source: http://graphaware.com/neo4j/2013/10/24/neo4j-qualifying-relationships.html
