Neo4j: which is faster, indexed node property or relationship property?

Which is the faster or better way to model this: searching for a node by an indexed property, or having a single ROOT node with many ChildOf relationships, each carrying a relationship property equal to the indexed value, and starting the search from ROOT by traversing only the relationships whose property matches? Assume the key being sought is unique.

My understanding is that the current version of Neo4j (2.2.3) uses the schema indexes introduced in version 2.x when you declare an index on the label/property combination you wish to use in a predicate. Relationship properties are not covered by this newer indexing scheme. For them you can only use the old legacy indexes, which are not as fast.
See the note on this page.
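As a minimal sketch of what declaring such an index looks like in the 2.x-era Cypher syntax this thread refers to (the `Item` label and `key` property are placeholders, not names from the question):

```cypher
// Hypothetical label and property names; 2.x-era schema syntax.
// Since the key is unique, a uniqueness constraint can be used
// (it is backed by an index on the same label/property).
CREATE CONSTRAINT ON (n:Item) ASSERT n.key IS UNIQUE;

// A plain schema index, for properties that are not unique,
// would instead be declared with:
//   CREATE INDEX ON :Item(key);

// A lookup whose predicate names both the label and the property
// is the shape the schema index is meant to serve.
MATCH (n:Item { key: 'abc-123' })
RETURN n;
```

Later Neo4j releases changed the syntax of these schema commands, so check the manual for the version you run.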

I think this is the wrong way to think about this question; you should model the data in the way that's more natural for the domain.
It's hard to answer which will be faster because you haven't specified things like how many valid values the index would have in it, the total number of nodes, and so on. In any case, if you're trying to express some kind of semantic relationship like ChildOf you're almost certainly better off with the node and relationships. You should consider storing the ID of one node as a property value of another node to be a major anti-pattern to be avoided.
If, on the other hand, the property is something like the gender of a person (M/F) and you have 1,000,000 people, you end up with two "index nodes", each with 500,000 relationships, and that's not going to be a good idea.
In general, Neo4j is built to traverse relationships fast, so you'll usually be better off exploiting relationships. But there are plenty of exceptions, which depend on your domain's semantics and the cardinality of your attribute values, so YMMV.
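For reference, the ROOT-based alternative from the question might be written roughly as follows. The `Root` label and the `key` relationship property are assumptions made for illustration. `PROFILE` (available in 2.2) is one way to see how much work the query does on your own data, and the same prefix can be put in front of an indexed node lookup to compare the two.

```cypher
// Hypothetical names: a single :Root node, with ChildOf relationships
// pointing at it, each carrying the lookup key as a relationship property.
PROFILE
MATCH (root:Root)<-[r:ChildOf { key: 'abc-123' }]-(n)
RETURN n;
```

Without an index on the relationship property, every ChildOf relationship attached to ROOT is a candidate to be checked, which is where the cardinality concern above comes from.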

Related

Modeling recursive breakdown structures in graph database

For recursive breakdown structures, is it better to model as ...
a. Group HAS Subgroup... or
b. Subgroup PART_OF Group ?? ....
Some Neo4j tutorials imply modeling both (the parent_of and child_of example), while the Neo4j subtype tutorials imply that either will work fine (generally going with PART_OF).
Based on experience with Neo4j, is there a practical reason for choosing one or the other, or for using both?
[UPDATED]
Representing the same logical relationship with a pair of relationships (having different types) in opposite directions is a very bad idea and a waste of time and resources. Neo4j can traverse a single relationship just as easily from either of its nodes.
With respect to which direction to pick (since we do not want both), see this answer to a related question.
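A short sketch of traversing one relationship from either end, assuming (hypothetically) `Group` nodes with a `name` property and a single PART_OF relationship pointing from subgroup to parent:

```cypher
// Walk "down": direct subgroups of a group (against the arrow).
MATCH (parent:Group { name: 'Engineering' })<-[:PART_OF]-(sub:Group)
RETURN sub.name;

// Walk "up": the parent of a subgroup (along the arrow).
MATCH (sub:Group { name: 'Backend' })-[:PART_OF]->(parent:Group)
RETURN parent.name;

// The whole breakdown below a group, at any depth.
MATCH (parent:Group { name: 'Engineering' })<-[:PART_OF*]-(descendant:Group)
RETURN descendant.name;
```

Both directions use the same stored relationship; only the arrow in the pattern changes.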

Is there a benefit to implementing singletons in Neo?

My business requirement says I need to add an arbitrary number of well-defined (AKA not dynamic, not unknown) attributes to certain types of nodes. I am pretty sure that while there could be 30 or 40 different attributes, a node will probably have no more than 4 or 5 of them. Of course there will be corner cases...
In this context, I am generically using 'attribute' as a tag wanted by the business, and not in the Neo4j sense.
I'll be expected to report on which nodes have which attributes. For example, I might have to report on which nodes have the "detention", "suspension", or "double secret probation" attributes.
One way is to simply have an array of appropriate attributes on each entity. But each query would require a search of all nodes. Or, I could create explicit attributes on each node. Now they could be indexed. I'm not seriously considering either of these approaches.
Another way is to implement each attribute as a singleton Neo node, and allow many (tens of thousands?) of other nodes to relate to these nodes. This implementation would have 10,000 nodes but 40,000 relationships.
Finally, the attribute nodes could be created and used by specific entity nodes on an as-needed basis. In this case, if 10,000 entities had an average of 4 attributes, I'd have a total of 50,000 nodes.
As I type this, I realize that in the 2nd case, I still have 40,000 relationships; the 'truth' of the situation did not change.
Is there a reason to avoid the 'singleton' implementation? I could put timestamps on the relationships. But those wouldn't be indexed...
For your simple use case, I'd suggest an approach you didn't list -- which is to use a node label for each "attribute".
Nodes can have multiple labels, and Neo4j can quickly iterate through all the nodes with the same label, which makes it quick and easy to find all the nodes with a specific label.
For example:
MATCH (n:Detention)
RETURN n;
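Extending that idea a little, here is a sketch of tagging, reporting on several attributes at once, and untagging. The `Student` label, the `name` property, and the `Suspension` and `DoubleSecretProbation` labels are made up for illustration.

```cypher
// Tag an entity with an attribute by adding a label.
MATCH (s:Student { name: 'Alice' })
SET s:Detention;

// Report on entities having any of several attributes,
// along with the full set of labels each one carries.
MATCH (s:Student)
WHERE s:Detention OR s:Suspension OR s:DoubleSecretProbation
RETURN s.name, labels(s);

// Remove an attribute again.
MATCH (s:Student { name: 'Alice' })
REMOVE s:Detention;
```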

Neo4j.rb: autoincrement relationship attribute?

Is it possible to autoincrement an ActiveRel attribute? In contrast to ActiveRecord, it doesn't seem that ActiveNode/ActiveRel support autoincrement attributes out-of-the-box.
I considered using before_save to manually generate an id. However, it appears that it's not possible to order relationships (to find the previous highest id).
How does one implement autoincrementing ids? (I know Neo4j.rb generates UUIDs but this application requires a separate incremental serial number)
Neo4j has its own auto-incrementing ID, which starts at 0 independently for nodes and relationships. It can (I think) be depended on for referring to nodes in the short term (i.e. seconds), but not in the long term, as IDs may get cleaned up and moved around by Neo4j for performance.
If you're thinking about putting IDs on relationships, what you're doing may not be the right modeling approach for Neo4j (though I couldn't say for sure without details). Relationships themselves can't be queried directly; they can only be accessed by first finding nodes. I think it would make sense to have an incrementing ID which is unique among the relationships of a given node, but not globally. This is also why Neo4j.rb doesn't generate UUIDs for relationships. You may want to consider representing the relationships as intermediate nodes.
If you want to implement an incrementing ID on an ActiveNode model, before_save should be a fine way to do it.
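The per-node serial number idea could also be sketched at the Cypher level rather than in a Neo4j.rb callback. This is an illustration only, not Neo4j.rb API: the `User` and `Event` labels, the `ATTENDED` type, and the `lastSerial` and `serial` properties are all hypothetical, and the sketch does not address concurrent writers.

```cypher
// Keep a counter property on the start node and stamp its next value
// on each new relationship created from that node.
MATCH (u:User { id: 1 }), (e:Event { id: 7 })
SET u.lastSerial = coalesce(u.lastSerial, 0) + 1
CREATE (u)-[:ATTENDED { serial: u.lastSerial }]->(e)
RETURN u.lastSerial;
```

The counter lives on the node because, as noted above, relationships are reached through nodes anyway, so the "previous highest" value never has to be found by ordering relationships.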

Do we need to index on relationship properties to ensure that Neo4j will not search through all relationships

To clarify, let's assume that I have a relationship type, "connection". A connection has a property called "typeOfConnection", which can take on values in the domain:
{"GroupConnection", "FriendConnection", "BlahConnect"}.
When I query, I may want to qualify connection with one of these types. While there are not many types, there will be millions of connections for each type value.
Do I need to put an index on connection.typeOfConnection in order to ensure that all connections will not be traversed?
If so, I have been unable to find a simple cypher statement to do this. I've seen some stuff in the documentation describing how to do this in Java, but I'm interacting with Neo using Py2Neo, so it would be wonderful if there was a cypher way to do this.
This is a mixed granularity property graph data model. Totally fine, but you need to replace your relationship qualifiers with intermediate nodes. To do this, replace each relationship with one "type" node and two relationships, so that the qualifying property lives on a node and can be indexed.
Your model has a graph with a coarse-grained granularity. The opposite extreme is referred to as fine-grained granularity, which is the foundation of the RDF model. With a property graph, if you're going to build this kind of coarse-grained graph, you'll need to use nodes in place of relationships, labeled and qualified by their type.
For instance, let's assume you have:
MATCH (thing1:Thing { id: 1 })-->(group:Connection { type: "group" }),
      (group)-->(thing2:Thing)
RETURN thing2
Then you can index on the label Connection by property type.
CREATE INDEX ON :Connection(type)
This allows you the flexibility of not typing your relationships if your application requires dynamic types of connections that prevent you from using a fine-grained granularity.
Whatever you do, don't work around your issue by dynamically generating typed relationships in your Cypher queries. This will prevent your query templates from being cached and decrease performance. Either type all your relationships or go with the intermediate node I've recommended above.
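A rough sketch of both halves of that advice follows. The relationship types `HAS_CONNECTION` and `CONNECTS_TO` and the parameter names are invented for illustration, and the `{param}` placeholder form is the 2.x-era parameter syntax (later versions use `$param`).

```cypher
// Create the intermediate node between two existing things.
MATCH (a:Thing { id: 1 }), (b:Thing { id: 2 })
CREATE (a)-[:HAS_CONNECTION]->(:Connection { type: 'group' })-[:CONNECTS_TO]->(b);

// Query with parameters so the query text stays constant
// while the connection type varies from call to call.
MATCH (t1:Thing { id: {thingId} })-->(c:Connection { type: {connType} })-->(t2:Thing)
RETURN t2;
```

Because the connection type is passed as a parameter value rather than spliced into the query string, the same query text is reused for every type.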

Neo4j, Which is better: multiple relationships or one with a property?

I'm new to neo4j, and I'm building a social network. For the sake of this question, my graph consists of user and event nodes with relationship(s) between them.
A user may be invited, join, attend or host an event, and each is a subset of the one before it.
Should I create a separate relationship for each status/state, or a single relationship with a property that stores the current state? Is there any benefit to one over the other?
Graph-type queries are more easily/efficiently done on relationship types than properties, from what I understand.
How about one relationship, but a different relationship type?
You can query on several types of relationships with pipes using Cypher (in case you have other relationships to the event that you don't want to pick up in queries).
Update--adding console example: http://console.neo4j.org/?id=woe684
Alternatively, you can just leave the old relationships there and not have to build the slightly more complicated queries, but that feels a bit wasteful for this use case.
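As a sketch of the pipe syntax and of swapping the relationship type when the state changes, assume (hypothetically) the types INVITED, JOINED, ATTENDED and HOSTED between `User` and `Event` nodes that have a `name` property:

```cypher
// Everyone connected to the event in any of these states,
// together with the state itself.
MATCH (u:User)-[r:INVITED|JOINED|ATTENDED|HOSTED]->(e:Event { name: 'Launch Party' })
RETURN u.name, type(r);

// Move one user from INVITED to JOINED by replacing the relationship.
MATCH (u:User { name: 'alice' })-[old:INVITED]->(e:Event { name: 'Launch Party' })
DELETE old
CREATE (u)-[:JOINED]->(e);
```

Other relationship types between users and events are simply left out of the pipe list and are not picked up by the first query.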
When possible, choosing different relationship types over a single type qualified by properties can have a significant positive performance impact when querying the graph. The former approach is always at least 2x faster than the latter. When data is in the high-level cache and the graph is queried using the native Java API, the first approach is more than 8x faster for single-hop traversals.
Source: http://graphaware.com/neo4j/2013/10/24/neo4j-qualifying-relationships.html

Resources