I am currently working on a project in which I have to implement a notification system. I am a real beginner with graph databases, so I don't know what the best approach would be. I have been considering two options:
1 - Create notification nodes and relate them to the user with a relationship of type pending or read. I insert them as pending, and when the user reads them, I change the relationship type to read.
2 - Create notification nodes, relate them to the user, and add a property to the relationship, "status" (pending, read). Then add an index on that property.
I don't know if I am on the right track; I would appreciate it if you could point me in the right direction.
Thanks in advance. Rodrigo
Rodrigo,
It depends a bit on how much data you expect and how you are going to query it. I think something like
USER---READ--->TODAY---->Status1
|
----UNREAD--->TODAY---->Status2
would let you group notifications both by READ/UNREAD status and by some other property such as time, which helps once a lot of read notifications start to accumulate, while still letting you run traversals over them ...
Would that work?
/peter
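To make the READ/UNREAD plus day-bucket idea a bit more concrete, here is a minimal in-memory sketch in plain Python. It is not Neo4j code; the class name, method names and sample ids are all made up for illustration. It only shows why the pending lookup never has to touch the accumulated read items.

```python
from collections import defaultdict
from datetime import date


class NotificationInbox:
    """In-memory stand-in for USER -[READ|UNREAD]-> day bucket -> notification."""

    def __init__(self):
        # status -> day -> list of notification ids (hypothetical layout)
        self.buckets = {"UNREAD": defaultdict(list), "READ": defaultdict(list)}

    def add(self, notification_id, day):
        # New notifications always start under the UNREAD branch.
        self.buckets["UNREAD"][day].append(notification_id)

    def mark_read(self, notification_id, day):
        # Comparable to re-pointing the notification from the UNREAD
        # day bucket to the READ day bucket.
        self.buckets["UNREAD"][day].remove(notification_id)
        if not self.buckets["UNREAD"][day]:
            del self.buckets["UNREAD"][day]
        self.buckets["READ"][day].append(notification_id)

    def pending(self):
        # Only the UNREAD branch is walked, oldest day first.
        unread = self.buckets["UNREAD"]
        return [n for day in sorted(unread) for n in unread[day]]


inbox = NotificationInbox()
today = date(2024, 1, 2)
inbox.add("n1", date(2024, 1, 1))
inbox.add("n2", today)
inbox.add("n3", today)
inbox.mark_read("n2", today)
print(inbox.pending())
```

Whether a day is the right bucket size depends on how many notifications you expect per user, so treat the granularity as something to tune.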
So I've just worked through the tutorial and I'm unclear about a few things. The main one, however, is how do you decide when something is a relationship and when it should be a Node?
For example, in the Movies Database, there is a relationship showing who acted in which film. A property of that relationship is the Role. But what if it's a series of films? The role may well be constant between films (say, Jack Ryan in The Hunt for Red October, Patriot Games etc.)
We may also want to have some kind of Character bio, which would obviously remain constant between movies. Worse, the actor may change from one movie to another (Alec Baldwin then Harrison Ford.) There are many others like this (James Bond, for example).
Even if the actor doesn't change (main roles in Harry Potter), the character is constant. So, at what point would the Role become a node in its own right? When it does, can I have a 3-way relationship (Actor-Role-Movie)? Say I start off with it being a relationship and then, down the line, decide it should've been a node: is there a simple way to go through the database and convert it?
No, there is no simple way to convert your data model. When you start your own database, first take the time to find a fitting schema. There is no ideal way to create a schema, and many different models can fit the same situation without being totally wrong.
My strategy is to put little information on the relationship itself. I only add properties that directly concern the relationship and store all the other data in the nodes. Also think about properties you could use for traversing the graph. For example, you might need some flags or even different relationship types for relationships that are more or less the same. apoc.algo.aStar only follows the relationship types you specify (so you could exclude certain nodes by connecting them with a special relationship type). So take a look at the procedures you might use later and keep them in mind.
Try to keep the schema as simple as possible and stay consistent about which things are nodes and which deserve a relationship. Don't mix them up.
Choose a design that makes sense for you! Compare (device 1)-[cable]-(device 2) with (device 1)-[has cable]-(cable)-[has cable]-(device 2). In this case I'd prefer the first, because [has cable] wouldn't add any information. Contrary to what I wrote above, I would keep a lot of information on this [cable] relationship, but that makes total sense to me because I wouldn't want to search a device node for cable information.
For your example, giving the role its own node is also a valid approach. For instance, if you especially want to query which actors played the same role, I'd definitely give the role its own node.
Summary:
Think of what you want to do with the data and choose the easiest model.
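As a rough illustration of what "giving the role its own node" buys you, here is a small plain-Python sketch (not Cypher or APOC; the function names and the PLAYED / APPEARS_IN edge names are invented for the example). It starts from relationship-style data where the role is a property and derives role "nodes" from it, which makes the "which actors shared a role" question a direct lookup.

```python
from collections import defaultdict

# Relationship-style data: (actor, movie, role), where role is a property
# on an ACTED_IN relationship.
acted_in = [
    ("Alec Baldwin", "The Hunt for Red October", "Jack Ryan"),
    ("Harrison Ford", "Patriot Games", "Jack Ryan"),
    ("Sean Connery", "The Hunt for Red October", "Marko Ramius"),
]


def reify_roles(edges):
    """Promote each role property to its own node with two kinds of edges."""
    played = defaultdict(set)      # role -> actors (hypothetical PLAYED edges)
    appears_in = defaultdict(set)  # role -> movies (hypothetical APPEARS_IN edges)
    for actor, movie, role in edges:
        played[role].add(actor)
        appears_in[role].add(movie)
    return played, appears_in


def shared_roles(played):
    """Roles that more than one actor has played."""
    return {role: sorted(actors) for role, actors in played.items() if len(actors) > 1}


played, appears_in = reify_roles(acted_in)
print(shared_roles(played))
```

Note that the role node alone no longer tells you which actor played the role in which movie. If you need that pairing, you would keep the original relationship as well, or add an intermediate node per appearance.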
In our graph database, I'm looking to store miscellaneous actions that a user performs but that aren't really related to anything else, such as changing a password or updating a username. (We have about 20 other use cases like this.)
I attached two possibilities below, but I don't know which one is better if I eventually want to run queries such as how many people changed their password yesterday, or who changed their password yesterday.
There are several more options and, as always, it depends. What are your queries, and what is your most likely entry point into the graph? For example:
Those are just three possibilities, and the model really depends on how you want to query your data. What else could I have done?
I could have done away with the Event nodes altogether and put the properties on a relationship between User and EventType. It is hard to use that relationship as an entry point into the graph, though.
I could have added a Date node, which could be an entry point into the graph (or maybe an index on eventDate is sufficient).
I could have ...
There is no single right (or wrong) answer. The better choice is often the one that reflects your reality/business the best.
Hope this helps.
Regards,
Tom
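To show how the choice of entry point shapes the query, here is a small plain-Python sketch. It is not a graph database, and the user names, event type names and the `who` helper are all made up for illustration. Keying events by (event type, date) plays the role of entering the graph through an EventType node plus a Date node, or through an index on eventDate.

```python
from collections import defaultdict
from datetime import date

# (user, event_type, event_date) tuples standing in for Event nodes.
events = [
    ("alice", "PASSWORD_CHANGED", date(2024, 1, 1)),
    ("bob", "PASSWORD_CHANGED", date(2024, 1, 1)),
    ("alice", "USERNAME_UPDATED", date(2024, 1, 1)),
    ("carol", "PASSWORD_CHANGED", date(2024, 1, 2)),
]

# Entry point: (event type, date) -> users who triggered it that day.
by_type_and_date = defaultdict(set)
for user, event_type, event_date in events:
    by_type_and_date[(event_type, event_date)].add(user)


def who(event_type, day):
    """Users with at least one event of this type on this day."""
    return sorted(by_type_and_date.get((event_type, day), set()))


changed = who("PASSWORD_CHANGED", date(2024, 1, 1))
print(len(changed), changed)
```

If your most common question starts from a user instead ("what did Alice do last week?"), the same data would be keyed by user first, which is the point about the model following the queries.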
We have items in our app that form a tree-like structure. You might have a pattern like the following:
(c:card)-[:child]->(subcard:card)-[:child]->(subsubcard:card) ... etc
Every time an operation is performed on a card (at any level), we'd like to record it. Here are some possible events:
The title of a card was updated by Bob
A comment was added by Kate mentioning Joe
The status of a card changed from pending to approved
The linked list approach seems popular but given the sorts of queries we'd like to perform, I'm not sure if it works the best for us.
Here are the main queries we will be running:
All of the activity associated with a particular card AND child cards, sorted by time of the event (basically we'd like to merge all of these activity feeds together)
All of the activity associated with a particular person sorted by time
On top of that we'd like to add filters like the following:
Filter by person involved
Filter by time period
It is also important to note that cards may be re-arranged very frequently. In other words, the parents may change.
Any ideas on how to best model something like this? Thanks!
I have a couple of suggestions, but I would suggest benchmarking them.
The linked list approach might be good if you could use the Java APIs (perhaps via an unmanaged extension for Neo4j). If the newest event in the list were the one attached to the card (and essentially the list was ordered by the date the events happened down the line), then if you're filtering by time you could terminate early when you've found an event which is earlier than the specified time.
Attaching the events directly to the card has the potential to lead you down into problems with supernodes/dense nodes. It would be the simplest to query for in Cypher, though. The problem is that Cypher will look at all of them before filtering. You could perhaps improve the performance of queries by, in addition to placing the date/time of the event on the event node, placing it on the relationships to the node ((:Card)-[:HAS_EVENT]->(:Event) or (:Event)-[:PERFORMED_BY]->(:Person)). Then when you query you can filter by the relationships so that it doesn't need to traverse to the nodes.
Regardless, it would probably be helpful to break up the query like so:
MATCH (c:Card {uuid: 'id_here'})-[:child*0..]->(child:Card)
WITH child
MATCH (child)-[:HAS_EVENT]->(event:Event)
RETURN event
I think that would mean that the MATCH is going to have fewer permutations of paths that it will need to evaluate.
Others are welcome to supplement my dubious advice as I've never really dealt with supernodes personally, just read about them ;)
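For what it's worth, the "newest event first, stop early" idea can be sketched outside the database. The following is plain Python, not the Neo4j Java API; the card ids, integer timestamps and the `feed` function are hypothetical. Each card keeps its own newest-first event list, the subtree is resolved at query time (so re-parenting a card does not require rewriting any event list), and the per-card lists are merged lazily.

```python
import heapq
from itertools import takewhile

# Hypothetical data: card -> child cards, and card -> events stored newest
# first as (timestamp, person, description).
children = {"root": ["a", "b"], "a": ["a1"], "b": [], "a1": []}
events = {
    "root": [(50, "Bob", "title updated")],
    "a": [(40, "Kate", "comment added"), (10, "Bob", "status changed")],
    "a1": [(30, "Joe", "title updated")],
    "b": [(20, "Kate", "status changed")],
}


def subtree(card):
    """Yield the card and all of its descendants (the [:child*0..] part)."""
    stack = [card]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(children.get(current, []))


def feed(card, since=None, person=None):
    """Merge newest-first lists; stop at the first event older than `since`."""
    lists = [events.get(c, []) for c in subtree(card)]
    merged = heapq.merge(*lists, key=lambda e: e[0], reverse=True)
    if since is not None:
        # Early termination: everything after this point is older still.
        merged = takewhile(lambda e: e[0] >= since, merged)
    return [e for e in merged if person is None or e[1] == person]


print(feed("root", since=25))
```

As with the advice above, this is only a sketch of the access pattern; whether it beats a straightforward Cypher query on your data is something to benchmark.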
First of all I request you to please bear with me and I apologise if this is a silly question.
I have a table like this.
create table users (
cname text,
--anything else like counter or timestamp
primary key (cname)
);
All I need to do is implement a stack like structure with that table.
A number of insert and delete operations will be there.
Problems faced:
1) I tried using timestamps. I successfully inserted using dateof(now()), but I just wanted to delete the last record and could not see how. Also, by making (cname, t) the primary key (where t is of type timestamp), I end up with redundant cnames, which I don't want.
2) I tried using a counter, but I felt it was complicated. Also, I may have multiple threads or clients performing the insert/delete operations, so I dropped that idea.
3) Also, I will not know the value of cname, so queries which require the key in the where clause are impossible. So I think I need to change the primary key to some other variable.
Please help me move forward. I'm finding it a bit difficult as there are no good books available to learn CQL from.
Stacks and queues are antipatterns in Cassandra ( http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets ) - you can implement them, but you need to understand the internals quite well before you'll be able to do it without making a horrible mistake. It's fairly clear that you're probably not at that level yet.
Is there a reason you think you need to use Cassandra for a stack?
Yes, the first part of your primary key (known as your partition key) needs to be something you know, so if you don't know cname, it's probably not a great fit for your primary key. What DO you know about the data/usage patterns? Maybe time buckets?
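To illustrate what "time buckets" could mean here, the sketch below derives a partition-key candidate from something the client always knows, the time of the write. This is plain Python with a dict standing in for partitions, not CQL or a Cassandra driver; the hour granularity and every name in it are assumptions for illustration, and it does not make a stack on Cassandra any less of an antipattern.

```python
from datetime import datetime, timezone


def hour_bucket(ts):
    """Hypothetical partition key: the UTC hour a row falls in, e.g. '2024-01-01T13'."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H")


# bucket -> list of (timestamp, cname); a plain dict standing in for partitions.
partitions = {}


def insert(ts, cname):
    # The bucket is computed from the write time, so no cname is needed to find it.
    partitions.setdefault(hour_bucket(ts), []).append((ts, cname))


def latest_in_bucket(bucket):
    """Newest cname in one known bucket, or None if the bucket is empty."""
    rows = partitions.get(bucket, [])
    return max(rows, key=lambda row: row[0])[1] if rows else None


insert(datetime(2024, 1, 1, 13, 5, tzinfo=timezone.utc), "a")
insert(datetime(2024, 1, 1, 13, 45, tzinfo=timezone.utc), "b")
insert(datetime(2024, 1, 1, 14, 1, tzinfo=timezone.utc), "c")
print(latest_in_bucket("2024-01-01T13"))
```

The point is only that the lookup starts from a key you can compute (the current or a recent hour) rather than from a cname you do not know.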
I've been researching the Tinkerpop stack for quite a while. I think I have a good idea of what it can do and what databases it works well with. I've got a couple of different databases I'm thinking about right now, but haven't settled on one. So I've decided to write my code purely to the interfaces, and not take any implementation into account right now. The databases I'm looking at all implement TransactionalGraph and KeyIndexableGraph. I think that's good enough for what I need, but I have just one question.
I have different 'classes' of vertices. Using Blueprints, I believe that's best representable by having a field in each vertex containing the class name. Doing that, I can do something like graph.getVertices("classname", "User") and it would give me all of the user vertices. And since the getVertices function specifies that an implementation should make use of indexes, I'm guaranteed to get a fast lookup (if I index that field).
But let's say that I wanted to retrieve a vertex based on two properties. The vertex must have className=Users and username=admin. What's the best way to go about finding that single vertex? And is it possible to index over both of those properties, even though not all vertices will have a username field?
FYI - The databases I'm currently thinking of are OrientDB, Neo4j and Titan, but I haven't decided for sure yet. I'm also currently planning to use Gremlin if that helps at all.
Using a "class" or a "type" for vertices is a good way to segment them. Doing:
graph.createKeyIndex("classname",Vertex.class);
graph.getVertices("classname", "User");
is a pretty common pattern and should generally yield a fast lookup, though iterating an index of tens of millions of users might not be so great (if you intend to grow a particular classname to very big size). I think that leads to the second part of your question, in regards to doing a two property lookup.
Taking your example on the surface, the two element lookup would be something like (using Gremlin):
g.V('classname',"User").has('username','admin')
So, you narrow the vertices to just "User" vertices with a key index and then filter those for "admin". But, I'd model this differently. It would be even less expensive to simply do:
graph.createKeyIndex("username",Vertex.class);
graph.getVertices("username", "admin");
or in Gremlin:
g.V('username','admin')
If you know the username you want, there's no better/faster way to model this. You really only need the classname if you want to iterate over all "User" vertices. If you just want to find one (or a set of vertices with that username) then key indexing on that property is the better way.
Even if I don't create a key index on it, I still include a type or classname property on all vertices. I find it helpful in global operations where I may or may not care about speed, but just need an answer.
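The difference between the two lookups can be shown with a toy key index. This is plain Python, not the Blueprints API; the class and method names are invented and only loosely mirror createKeyIndex / getVertices. It also shows that vertices without a username simply do not appear in that index.

```python
from collections import defaultdict


class KeyIndexedVertices:
    """Toy key index: (property key, value) -> set of vertex ids."""

    def __init__(self):
        self.vertices = {}
        self.indexed_keys = set()
        self.index = defaultdict(set)

    def create_key_index(self, key):
        self.indexed_keys.add(key)
        # Index any vertices that already carry this key.
        for vid, props in self.vertices.items():
            if key in props:
                self.index[(key, props[key])].add(vid)

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props
        for key, value in props.items():
            if key in self.indexed_keys:
                self.index[(key, value)].add(vid)

    def get_vertices(self, key, value):
        if key in self.indexed_keys:
            return set(self.index.get((key, value), set()))
        # No index on this key: fall back to scanning every vertex.
        return {vid for vid, p in self.vertices.items() if p.get(key) == value}


g = KeyIndexedVertices()
g.create_key_index("classname")
g.create_key_index("username")
g.add_vertex(1, classname="User", username="admin")
g.add_vertex(2, classname="User", username="guest")
g.add_vertex(3, classname="Group")  # no username: absent from that index

# Two-step: narrow by classname, then filter by username.
two_step = {v for v in g.get_vertices("classname", "User")
            if g.vertices[v].get("username") == "admin"}
# Direct: one lookup on the more selective key.
direct = g.get_vertices("username", "admin")
print(two_step, direct)
```

Both routes return the same vertex; the direct one just never has to walk the whole "User" bucket.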
graph.getVertices() will iterate through all vertices and look for ones with that property if you do not have the auto-index turned on in your graph implementation. If you already have data and cannot just turn on the auto-indexer, you should use index = indexableGraph.getIndex() and then index.get('classname', 'User')
It's possible to perform a query over multiple properties, but without specifics, it's hard to say. Neo4j uses Lucene, which means that query() will take a Lucene query such as className:Users AND username:admin, but I cannot speak for the others.
Yes, any of those DBs is good for playing with. I personally found Neo4j to be the easiest, and as long as you understand their licensing structure, you shouldn't have any problems using it.