My CouchDB database has 3 types of data: A, B, C.
A has a 'b' attribute being an ID to a B, and a name
B has a 'c' attribute being an ID to a C, and a name
C has a name
for instance:
{ _id:"a1", type:"A", name:"aaaaa", b:"b1" }
{ _id:"b1", type:"B", name:"bbbbb", c:"c1" }
{ _id:"c1", type:"C", name:"ccccc" }
I would like to get all the As in one view query, retrieving the name of each A's B and of that B's C (and, for instance, I would like to restrict the result to only the As whose C's name is "cc").
How can I achieve this?
(to get only A and B, the answer is:
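map: function (doc) {
  if (doc.type == "A") {
    // row for the A document itself
    emit([doc._id, 0]);
    // row pointing at the linked B (fetched with include_docs=true)
    emit([doc._id, 1], { _id: doc.b });
  }
}
but I have no clue how to extend this to the second relationship)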
I am also interested in the answer for the case where we have a 'D' class, an 'E' class, etc., with more nested relationships.
Many thanks!
In general, CouchDB can only traverse a graph one level deep. If you need more levels, a specialized graph database might be the better approach.
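To make the "one level deep" limit concrete: the second hop (from B to C) has to happen somewhere other than the view, for example in client code. The following is only a minimal sketch of that idea, with a plain dict standing in for the database; `docs` and `resolve_a` are hypothetical names and not part of any CouchDB API.

```python
# Hypothetical in-memory stand-in for the three documents in the question.
docs = {
    "a1": {"_id": "a1", "type": "A", "name": "aaaaa", "b": "b1"},
    "b1": {"_id": "b1", "type": "B", "name": "bbbbb", "c": "c1"},
    "c1": {"_id": "c1", "type": "C", "name": "ccccc"},
}


def resolve_a(docs, c_name_contains=None):
    """Follow A -> B -> C by id, one lookup per level, optionally filtering on C's name."""
    rows = []
    for doc in docs.values():
        if doc["type"] != "A":
            continue
        b = docs.get(doc.get("b"))
        c = docs.get(b.get("c")) if b else None
        if c_name_contains is not None:
            # Drop As whose C is missing or does not match the filter.
            if c is None or c_name_contains not in c["name"]:
                continue
        rows.append({
            "a": doc["name"],
            "b": b["name"] if b else None,
            "c": c["name"] if c else None,
        })
    return rows


print(resolve_a(docs, c_name_contains="cc"))
```

Each extra level (D, E, ...) adds another lookup per row, which is why the remodelling options discussed next are usually preferable.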
There are several ways to achieve what you want in CouchDB, but you must model your documents according to the use case.
If your "C" type is mostly static, you can embed the name in the document itself. Whenever you modify a C document, just batch-update all documents referring to this C.
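As a rough illustration of the batch update, here is a sketch that renames a C and refreshes the copy of its name cached in every B that refers to it. It works on a plain dict; the `c_name` field and the `rename_c` helper are assumptions made for this example. Against a real database the changed documents would typically be written back in a single bulk request.

```python
def rename_c(docs, c_id, new_name):
    """Rename a C document and refresh the cached name in every B referring to it.

    Returns the ids of the B documents that were touched.
    """
    docs[c_id]["name"] = new_name
    updated = []
    for doc in docs.values():
        if doc.get("type") == "B" and doc.get("c") == c_id:
            doc["c_name"] = new_name  # hypothetical cached field
            updated.append(doc["_id"])
    return sorted(updated)


# Hypothetical sample data: two Bs share c1, one B points elsewhere.
docs = {
    "b1": {"_id": "b1", "type": "B", "name": "bbbbb", "c": "c1", "c_name": "ccccc"},
    "b2": {"_id": "b2", "type": "B", "name": "other", "c": "c1", "c_name": "ccccc"},
    "b3": {"_id": "b3", "type": "B", "name": "third", "c": "c2", "c_name": "zzzzz"},
    "c1": {"_id": "c1", "type": "C", "name": "ccccc"},
    "c2": {"_id": "c2", "type": "C", "name": "zzzzz"},
}

print(rename_c(docs, "c1", "renamed"))
```

The trade-off is that a rename costs one write per referring document, which is acceptable only if C changes rarely.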
In many cases it's not even necessary to have a C type document or a reference from B to C. If C is a tags document, for example, you could just store an array of strings in the B document.
If you need C from A, you can also store a reference to C in A, best accompanied with the name of C cached in A, so you can use the cached value if C has been deleted.
If there are only a few instances of one of the document types, you can also embed them directly. Depending on the use case, you can embed B in A, you can embed all As in an array inside of B, or you can even put everything into one document.
With CouchDB, it makes most sense to think of the frequency and distribution of document updates, instead of normalizing data.
This way of thinking is quite different from what you do with SQL databases, but in the typical read-mostly scenarios we have on the web, it's a better trade-off than expensive read queries to model documents like independent entities.
When I model a CouchDB document, I always think of it as a passport or a business letter. It's a single entity that holds valid, correct and complete information, but it's not strictly guaranteed that I am still as tall as stated in the passport, that I look exactly as in the picture, that I haven't changed my name, or that I still live at the address stated on the business letter.
If you provide more information on what you actually want to do with some examples, I will happily elaborate further!
I need to do something strange. I have a query which can't be changed. It's a match query for getting a record:
MATCH (j:journal) WHERE j.id in [12] RETURN j.`id` AS ID, j.`language` AS LANGUAGE
And I have a node that contains an array as a property; e.g. it can be created like this: create (j:journal {id:12, language:["English", "Polish"]})
So, is there any possibility to display this node like two records with the same id, but with different language fields? Like the following:
ID | LANGUAGE
12 | English
12 | Polish
The important thing is that match query can’t be changed at all.
But the node can be changed.
I know that I can add the UNWIND keyword for the language field in the source query, but there is a requirement not to.
I didn't find anything like that in the documentation or on the internet. I'm not sure if it's even possible (but the consumer wants it), and I don't have much experience with neo4j.
I understand that it may sound weird, but I need to understand whether it can be implemented this way.
Thanks in advance.
If you can change the DB, you can change it so that each journal node contains a single language (as a scalar value, not in a list). However, this change might break any other queries that you might have.
If this conversion is acceptable, here is a query that should: (a) convert existing journal nodes to have a scalar language value, and (b) create new journal nodes as necessary for the remaining language values. The nodes that are spawned from an original journal node will share the same properties (except for language).
MATCH (j:journal)
WITH j, j.language[1..] AS langs
SET j.language = j.language[0]
WITH j, langs
UNWIND langs AS lang
CREATE (k:journal)
SET k = j, k.language = lang
If a node's language property had N values, you will end up with N nodes, each with the same properties -- except for the language property, which will contain a different language value (as a string). For efficiency, the original node is reused.
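The effect of that conversion can be sketched outside the database. The snippet below is only an illustration using plain dicts in place of nodes; `split_by_language` is a hypothetical helper, not a Neo4j API. It shows how one record with a list-valued language becomes one record per language, which is the shape the unchanged MATCH query would then return.

```python
def split_by_language(node):
    """Return one copy of `node` per language value.

    A node whose `language` is already a scalar is returned unchanged
    (as a single-element list). An empty list yields no records in this sketch.
    """
    langs = node["language"]
    if not isinstance(langs, list):
        return [dict(node)]
    return [{**node, "language": lang} for lang in langs]


journal = {"id": 12, "language": ["English", "Polish"]}
for row in split_by_language(journal):
    print(row["id"], "|", row["language"])
```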
Let's say I have an entity A and an entity B. The relationship is 1:1. If I delete A, then B should also be deleted. I have set this up with a cascade delete rule. Now let's assume I add another entity C which has a 1:1 relationship to B.
I'm not sure how to handle this appropriately. If A and C have the same B, should it be the same instance of B? Or should I duplicate the entry? If it is the same, then deleting A cannot delete B anymore if C has a reference to it.
Furthermore, how can I enforce this 1:1 relationship between A and B? At the moment, when I create the A entity I also create a B entity without checking whether it is already in the database. This does not fail, and causes duplicates of B.
Edit: For a better understanding of what I'm trying to achieve:
A must always have a B.
C must also always have a B.
For example, let's assume I have an entity Shop (B) and an entity Favorite (A). In the future I might add another entity which will also use Shop. So if the user creates a Favorite, he will give it a name and a Shop. Now if I don't check whether this Shop is already in the DB, I would create another one. On the other hand, if I allow duplicates I can use the cascade delete rule and need not be afraid that deleting a Favorite would leave another Favorite with an empty Shop.
I feel a little lost and I'm not sure what the best practice is in such scenarios. Any help is appreciated.
It depends on what you are trying to do. If a B cannot exist without an A, then the cascade would still apply. However, if C can cause B to still exist, then you should remove the cascade delete.
But this also raises the question of whether, if any one of them doesn't exist, the others should stop existing too. That may mean that you need to delete both B and C if A gets deleted.
Let's apply this to an example.
A = Body
B = Head
C = Brain
Let's say these are all the things a human needs to be alive. Without a body, the head and brain won't work. This is similarly true for the rest of the parts. So if any of these parts die, the others die.
Ok, now if A and C have the same B, then that should be a unique B, not a duplicate.
Any Core Data implementation should check for the existence of an entity before creating a new one, unless you can guarantee it will only ever belong to one specific entity. Your A and B must always be unique if you are not going to check for duplicates. There are very few real-world values that are always unique. A specific time would be an example of an always-unique instance: this second only exists right now, will never exist again, and has never existed before.
If a new A creates a B, then that is fine. However, if both A and C can have the same B, then you must check for an existing B, otherwise A and C will never be able to share a B. If your A's only have 1 B forever, and nothing else uses B, then you can do this as long as you always create unique A's.
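The two rules above (look up B before creating it, and only delete B once nothing refers to it) can be sketched independently of Core Data. This is a plain-Python illustration under assumed names: `Store`, `attach` and `detach` are hypothetical, and the shop name is assumed to be the unique key.

```python
class Store:
    """Keeps one shared Shop (B) per name and tracks which owners (A or C) use it."""

    def __init__(self):
        self.shops = {}   # shop name -> shop record (the shared B)
        self.owners = {}  # shop name -> set of owner ids referring to it

    def attach(self, owner_id, shop_name):
        # Find-or-create: reuse the existing shop instead of duplicating it.
        shop = self.shops.setdefault(shop_name, {"name": shop_name})
        self.owners.setdefault(shop_name, set()).add(owner_id)
        return shop

    def detach(self, owner_id, shop_name):
        # Delete the shop only when the last owner goes away.
        owners = self.owners.get(shop_name, set())
        owners.discard(owner_id)
        if not owners:
            self.shops.pop(shop_name, None)
            self.owners.pop(shop_name, None)


store = Store()
favorite_shop = store.attach("favorite-1", "Corner Shop")
other_shop = store.attach("other-1", "Corner Shop")
print(favorite_shop is other_shop)  # both owners share one Shop instance
```

This replaces a blanket cascade rule with an explicit "last reference wins" check, which is the behavior the question needs once C can also point at B.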
I'm trying to query Book nodes for recommendations with Cypher.
I want to recommend A:Book and C:Book for A:User.
I'm sorry, a graph would help explain this question, but I couldn't upload a graph image because my reputation is too low for the upload function.
I wrote the query below.
match (u1:User{uid:'1003'})-->(o1:Order)-->(b1:Book)<--(o2:Order)
<--(u2:User)-->(o3:Order)-->(b2:Book)
return b2
This query returns all Books (A, B, C, D) despite Cypher's uniqueness.
I expected it to return only A:Book and C:Book.
Is this behavior part of Neo4j's specification?
How do I get the expected result? Thanks, everyone.
environment:
Neo4j ver.v2.0.0-RC1
Using Neo4j Server with REST API
Without the sample graph it's hard to say why you get something back when you expected something else. You can share a sample graph by including a create statement that would generate said graph, or by creating it in the Neo4j console and putting the link in your question. Here is an example of the latter: console.neo4j.org/r/fnnz6b
In the meantime, you probably want to declare the type of the relationships in your pattern. If a :User has more than one type of outgoing relationships you will be excluding those other paths based on the labels of the nodes on the other end, which is much less efficient than to only traverse the right relationships to begin with.
To my mind it's not clear whether (u:User)-->(o:Order)-->(b:Book) means that a user has one or more orders, and each order consists of one or more books, or whether it only means that a user ordered a book. If you can share a sample, hopefully that will become clear too.
Edit:
Great, so looking at the graph: You get B and D back because others who bought B also bought D, and others who bought D also bought B, which is your criterion for recommendation. You can add a filter in the WHERE clause to exclude those books that the user has already bought, something like
WHERE NOT (u1)-[:BUY]->()-[:CONTAINS]->(b2)
This will give you A, C, C back, since there are two matching paths to C. It's probably not important to get two result items for C, so you can either limit the return to give only distinct values
RETURN DISTINCT(b2)
or group the return values by counting the matching paths for each result as a 'recommendation score'
RETURN b2, COUNT(b2) as score
Also, if each order only [CONTAINS] one book, you could try modelling without order, just (:User)-[:BOUGHT]->(:Book).
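The filter-and-score logic can also be sketched outside Cypher. The snippet below assumes the simplified model from the last suggestion (users linked directly to the books they bought) and uses made-up purchase data chosen so that the result mirrors the A, C, C outcome described above; `purchases` and `recommend` are hypothetical names.

```python
from collections import Counter

# Hypothetical data: u1 bought B and D; the other users overlap with u1.
purchases = {
    "u1": {"B", "D"},
    "u2": {"A", "B", "C"},
    "u3": {"C", "D"},
}


def recommend(purchases, user):
    """Score books bought by users who share a purchase with `user`.

    Books the user already owns are excluded (the WHERE NOT filter), and
    each matching path adds one to the score (the COUNT aggregation).
    """
    own = purchases[user]
    scores = Counter()
    for other, books in purchases.items():
        if other == user:
            continue
        shared = own & books
        if not shared:
            continue
        for book in books - own:
            scores[book] += len(shared)
    return dict(scores)


print(recommend(purchases, "u1"))
```

Taking only the keys of the result corresponds to RETURN DISTINCT, while the values correspond to the recommendation score.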
I have 2 million rows in a flat db4o table. A lot of the information is repeated - for example, the first column has only three possible strings.
I could easily break the table into a 4-tier hierarchy (i.e. navigate from root >> symbol >> date >> final table) - but is this worth it from a speed and software maintenance point of view?
If it turns out that it would be cleaner to break the table into a hierarchy, any recommendations on a good method to achieve this within the current db4o framework?
Answers to questions
To actually answer your question, I
would need more information. What kind
of information do you store?
I'm storing objects containing strings and doubles. The hierarchy is exactly, in concept, like a file system with directories, sub-directories, and sub-sub-directories: a single root node contains an array of subclasses, and each sub-class in turn contains further arrays of sub-sub-classes, etc. Here is an example of the code:
// rootNode---|
//            sub-node 1----|
//                          |-----sub-sub-node 1
//                          |-----sub-sub-node 2
//                          |-----sub-sub-node 3
//                          |-----sub-sub-node X (others, N elements)
//            sub-node 2----|
//                          |-----sub-sub-node 1
//                          |-----sub-sub-node 2
//                          |-----sub-sub-node 3
//                          |-----sub-sub-node X (others, N elements)
//            sub-node 3----|
//                          |-----sub-sub-node 1
//                          |-----sub-sub-node 2
//                          |-----sub-sub-node 3
//                          |-----sub-sub-node X (others, N elements)
//            sub-node X (others, N elements)
class rootNode
{
    IList<subNode> subNodeCollection = new List<subNode>();
    string rootNodeParam;
}
class subNode
{
    IList<subSubNode> subSubNodeCollection = new List<subSubNode>();
    string subNodeParam;
}
class subSubNode
{
    string subSubNodeParam;
}
// Now, we have to work out a way to create a query that filters
// by rootNodeParam, subNodeParam and subSubNodeParam.
And what
are the access patterns of your data?
Are you reading single objects by a query
/ search, or are you reading a lot of
objects which are related to each
other?
I'm trying to navigate down the tree, filtering by parameters as I go.
In general db4o (and other object
databases) are good at navigational
access. This means that you first
query for some objects, and from there
you navigate to related objects. For
example you first query for a
user-object. From there you navigate
to the users home, city, job, friends
etc. objects. That kind of access works
great in db4o.
This is exactly what I'm trying to do, and exactly what works well in db4o if you only have 1-1 mappings between classes and subclasses. If you have 1-to-many by implementing an ArrayList of classes within a class, it can't do a query without instantiating the whole tree - or am I misled on this one?
So in your example the
4-tier hierarchy can work great with
db4o, but only when you can navigate
from the root to the symbol object and
so on. That means that the root object
has a collection of its
'children' objects.
Yes - but is there any way to do a query, if each subNode contains a collection?
As Sam Stainsby already pointed out in his comment, db4o doesn't have the notion of tables. It stores objects, and that's db4o's unit of storage. Don't try to think in terms of tables; that doesn't really work with db4o.
As you said, you repeat information, so that's a good candidate to be separated into other objects, which can then be referenced by other objects. In general I would first design a good domain model, to be aware of how the data is organized and related, and think about what kind of data access patterns you have. Then try to find out how you can design your classes/objects in a way which works with db4o.
To actually answer your question, I would need more information. What kind of information do you store? And what are the access patterns of your data? Are you reading single objects by a query / search, or are you reading a lot of objects which are related to each other?
In general db4o (and other object databases) are good at navigational access. This means that you first query for some objects, and from there you navigate to related objects. For example, you first query for a user object. From there you navigate to the user's home, city, job, friends etc. objects. That kind of access works great in db4o.
So in your example the 4-tier hierarchy can work great with db4o, but only when you can navigate from the root to the symbol object and so on. That means that the root object has a collection of its 'children' objects.
Btw: If you feel that it is more natural to think in terms of tables for your data, then I recommend using a relational database. Relational databases are awesome at dealing with tables.
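Navigational access over such a hierarchy can be sketched as follows. This is plain object traversal in Python, meant only to illustrate "start at the root and filter while walking down"; it is not db4o code and says nothing about how db4o loads or activates objects. The class and function names are hypothetical and loosely mirror the classes in the question.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class SubSubNode:
    param: str


@dataclass
class SubNode:
    param: str
    children: List[SubSubNode] = field(default_factory=list)


@dataclass
class RootNode:
    param: str
    children: List[SubNode] = field(default_factory=list)


def find_leaves(root: RootNode, sub_param: str, leaf_param: str) -> Iterator[SubSubNode]:
    """Walk root -> sub-node -> sub-sub-node, filtering at each level."""
    for sub in root.children:
        if sub.param != sub_param:
            continue  # skip whole branches that do not match
        for leaf in sub.children:
            if leaf.param == leaf_param:
                yield leaf


root = RootNode("root", [
    SubNode("symbol-1", [SubSubNode("2010-01-01"), SubSubNode("2010-01-02")]),
    SubNode("symbol-2", [SubSubNode("2010-01-01")]),
])
print([leaf.param for leaf in find_leaves(root, "symbol-1", "2010-01-02")])
```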
I am having a problem with "same individuals property" in protege, when I run a reasoner (pellet 1.5 or fact++)
Let's take an example ontology:
Thing has subclasses A and B; A has subclasses C and D.
B, C and D each have individuals.
Can't I say that an individual of C is the "same individual" as an individual of B, and then also add that an individual of D is the "same individual" as that individual of B? This is true: they have different names, but they are the same individual.
Why does it only work when I set the B individual as "same individual" as the one of type C or D?
The protege error is "InconsistentOntologyException:Fact++.Kernel: inconsistent Ontology" and pellet says ontology is inconsistent.
EDIT: It seems to be a more deep-rooted problem; this example works, so I'm going to keep checking.
EDIT2: After some more experimenting, it seems to be a conflict with datatype properties.
The individuals all share a datatype property with the same name. In the example, the domain of the property would be A and the range string. Any idea how to solve this?
Yeah you solved it - you were confusing labels (what you call things) with identity: an instance of a class is unique (you can attach different labels to it - i.e. call it different things) but the instance itself can only exist once - and in your example you effectively asserted that there are "three instances of the same instance"...which, of course, doesn't make any sense.