Is it possible in Neo4j to create an index on a relationship property? Right now I am seeing very poor performance on comparison/filtering operations over relationship property values. Here is an example of my issue: Neo4j Cypher count query performance optimization
In neo4j 3.3.x, there are now built-in procedures for explicit indexes, which include the ability to create "explicit" indexes for relationships.
"Explicit" indexes are not the same as the normal "schema" indexes that you are already aware of (which are automatically maintained for you once you create an index or uniqueness constraint). They are called "explicit" because you have to write code to add nodes or relationships to such indexes, and you also have to write code to get nodes or relationships from such indexes. But, it might be worth the effort in some cases.
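As a rough sketch of what that code can look like with the built-in procedures: the index name 'works_at_role', the WORKS_AT relationship type and the role property below are made up for illustration, and the procedure signatures are as I understand them for 3.3.x, so it is worth checking them with CALL dbms.procedures() on your own version.

```cypher
// Get or create a relationship explicit index (the name is hypothetical).
CALL db.index.explicit.forRelationships('works_at_role');

// Add every WORKS_AT relationship to the index, keyed by its role property.
MATCH ()-[r:WORKS_AT]->()
WHERE exists(r.role)
CALL db.index.explicit.addRelationship('works_at_role', r, 'role', r.role) YIELD success
RETURN count(*);

// Look up relationships by exact value, without scanning all WORKS_AT relationships.
CALL db.index.explicit.seekRelationships('works_at_role', 'role', 'Engineer') YIELD relationship
RETURN startNode(relationship) AS person, relationship, endNode(relationship) AS company;
```

Since nothing keeps the index in sync for you, any query that later changes r.role would also have to update the corresponding index entry.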
I have an idea of how indexing works in an RDBMS, but I can't work out how indexing works in neo4j. Also, what is schema indexing?
To quote from neo4j's free book, Graph Databases:
Indexes help optimize the process of finding specific nodes.
Most of the time, when querying a graph, we’re happy to let the traversal process discover the nodes and relationships that meet our information goals. By following relationships that match a specific graph pattern, we encounter elements that contribute to a query’s result. However, there are certain situations that require us to pick out specific nodes directly, rather than discover them over the course of a traversal. Identifying the starting nodes for a traversal, for example, requires us to find one or more specific nodes based on some combination of labels and property values.
That same book does an extensive comparison between neo4j and relational databases as well.
As for what the above-mentioned indexes (also known as "schema indexes") index: they index the nodes that have a specific node label and node property combination.
There is also a different indexing mechanism called "manual" (or "legacy", or "explicit") indexing, which is now only recommended for special use cases.
[UPDATE]
As an example, suppose we have already created an index on :Person(firstname), like so:
CREATE INDEX ON :Person(firstname);
In that case, the following query can quickly start off by using the index to find the desired Person nodes. Once those nodes are found, neo4j can easily traverse their outgoing WORKS_AT relationships to find the related Company nodes:
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WHERE p.firstname = 'Karan'
RETURN p, c;
Without that index, the query would have to either:
Scan through all Person nodes to find the right ones, before traversing their outgoing WORKS_AT relationships, or
Find all Company nodes, traverse their incoming WORKS_AT relationships, and compare the firstname values of every Person at the other end of the relationship.
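One way to see which of these strategies the planner actually chose is to prefix the query with PROFILE (or EXPLAIN, which shows the plan without running the query). This is only a sketch using the same example query; the exact plan output depends on your neo4j version and data.

```cypher
// With the index in place, the plan should contain a NodeIndexSeek operator.
// Without it, expect something like NodeByLabelScan followed by a Filter.
PROFILE
MATCH (p:Person)-[:WORKS_AT]->(c:Company)
WHERE p.firstname = 'Karan'
RETURN p, c;
```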
call apoc.index.nodes('Product', 'name:iPhone*') yield node return node
In my graph I have 'iPhone X' and 'iPhone Plus', but this query doesn't return anything. I also have an index on the 'name' property of Product.
Indexes
ON :Product(name) ONLINE
apoc.index.nodes is one of the APOC procedures for "manual indexes", which are also confusingly referred to in various docs as "legacy indexes" and "explicit indexes". Such indexes use the Apache Lucene library and are NOT the same as the standard neo4j indexes that most people use, and the way you create/update/use such indexes is also not standard.
For example, you cannot create a "manual index" via a Cypher CREATE INDEX clause. And neo4j Browser's :schema command will not show any manual indexes.
If you will only be searching :Product(name) via manual indexes, then you should drop your standard index for :Product(name), since it will not be needed but will add overhead (time and space) to your DB.
One way to create/update/use manual indexes is through the special APOC procedures. The APOC documentation for manual indexes (linked above) provides a good amount of information about how to add nodes and relationships to such indexes, and how to search using them.
As an example, before you can use the query in your question, you first have to add all the :Product(name) values to the Product manual index. If you want to add them all at once, you can use the following query (and since it has to return something, it just returns a count of the number of Products):
MATCH (p:Product)
CALL apoc.index.addNode(p, ['name'])
RETURN count(*)
[UPDATED]
Manual indexing is typically only used for partial and fuzzy text search use cases. When you just need exact value matching, standard indexes are recommended, especially since they require much less effort on your part. The reason manual indexes are called "manual" is because the responsibility for maintaining them falls entirely on your shoulders. That is, your node/relationship/property addition/removal/update queries would normally have to add/remove/update any relevant manual index entries as well. Note that when you update a property that is manually indexed, you have to remove the old index entry and then add the new entry.
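To illustrate the remove-then-add step when an indexed property changes, here is a hedged sketch. It assumes your APOC version provides apoc.index.removeNodeByName alongside apoc.index.addNode, that both can be called without YIELD, and that the manual index is named after the Product label; the new name value is made up. Please check the APOC documentation for your version before relying on it.

```cypher
// Rename a product and keep the 'Product' manual index in step with the change.
MATCH (p:Product { name: 'iPhone X' })
// Assumption: removes all entries for this node from the 'Product' manual index.
CALL apoc.index.removeNodeByName('Product', p)
SET p.name = 'iPhone XS'
WITH p
// Re-add the node so the index reflects the new name value.
CALL apoc.index.addNode(p, ['name'])
RETURN p.name;
```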
Which is the faster/better way to model this: searching for a node by an indexed property, or having a single ROOT node with lots of ChildOf relationships, each carrying a relationship property equal to the indexed property, and starting the search from ROOT by traversing the relationships that have the right property value? Assume the key being sought is unique.
My understanding is that the current version of Neo4j (2.2.3) uses the built-in schema indexes (available as of version 2.x) when you declare an index on the label/property combination you wish to use in a predicate. Relationship properties are not covered by this newer indexing scheme. You can only use the old legacy indexing for relationship properties, which is not as fast.
See the note on this page.
I think this is the wrong way to think about this question; you should model the data in the way that's more natural for the domain.
It's hard to answer which will be faster because you haven't specified things like how many valid values the index would have in it, the total number of nodes, and so on. In any case, if you're trying to express some kind of semantic relationship like ChildOf you're almost certainly better off with the node and relationships. You should consider storing the ID of one node as a property value of another node to be a major anti-pattern to be avoided.
If, on the other hand, the property is, say, the gender of a person (M/F) and you have 1,000,000 people, then you end up with two "index nodes", each with 500,000 relationships. That's not going to be a good idea.
In general, neo4j is set up to traverse relationships fast, so in general you'll be better off exploiting relationships. But there are a lot of exceptions to that which depend on your domain's semantics, and the cardinality of your attribute values, so YMMV.
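For comparison, the two options from the question could be written roughly as follows. The :Item and :Root labels, the key property and the key value are hypothetical, and the constraint syntax is the one used by the 2.x/3.x versions discussed here.

```cypher
// Option A: the unique key lives on the node. The uniqueness constraint
// also creates a schema index on :Item(key).
CREATE CONSTRAINT ON (n:Item) ASSERT n.key IS UNIQUE;

MATCH (n:Item { key: 'abc-123' })
RETURN n;

// Option B: the key lives on ChildOf relationships hanging off a single root node.
// No schema index covers the relationship property here, so the ChildOf
// relationships of the root have to be examined one by one.
MATCH (:Root)<-[:ChildOf { key: 'abc-123' }]-(n)
RETURN n;
```

Running both under PROFILE on realistic data is probably the most reliable way to compare them for a given graph.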
To clarify, let's assume that I have a relationship type: "connection." Each connection has a property called "typeOfConnection," which can take on values in the domain:
{"GroupConnection", "FriendConnection", "BlahConnect"}.
When I query, I may want to qualify connection with one of these types. While there are not many types, there will be millions of connections with each property type.
Do I need to put an index on connection.typeOfConnection in order to ensure that all connections will not be traversed?
If so, I have been unable to find a simple cypher statement to do this. I've seen some stuff in the documentation describing how to do this in Java, but I'm interacting with Neo using Py2Neo, so it would be wonderful if there was a cypher way to do this.
This is a mixed granularity property graph data model. Totally fine, but you need to replace your relationship qualifiers with intermediate nodes. To do this, replace each relationship with one intermediate node that carries the type, plus two relationships, so that the type can be indexed.
Your model has a graph with a coarse-grained granularity. The opposite extreme is referred to as fine-grained granularity, which is the foundation of the RDF model. With a property graph, if you're going to model this kind of coarse-grained graph, you'll need to replace relationships with nodes that are labeled according to their type.
For instance, let's assume you have:
MATCH (thing1:Thing { id: 1 })-->(group:Connection { type: "group" }),
      (group)-->(thing2:Thing)
RETURN thing2
Then you can index on the label Connection by property type.
CREATE INDEX ON :Connection(type)
This allows you the flexibility of not typing your relationships if your application requires dynamic types of connections that prevent you from using a fine-grained granularity.
Whatever you do, don't work around your issue by dynamically generating typed relationships in your Cypher queries. This will prevent your query templates from being cached and decrease performance. Either type all your relationships or go with the intermediate node I've recommended above.
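A one-off migration to the intermediate-node model recommended above might look something like this sketch. The HAS_CONNECTION and CONNECTS_TO relationship types are names I made up; "connection" and "typeOfConnection" come from the question. With millions of relationships you would probably want to run this in batches rather than in a single transaction.

```cypher
// Turn each (a)-[:connection { typeOfConnection }]->(b) into
// (a)-[:HAS_CONNECTION]->(:Connection { type })-[:CONNECTS_TO]->(b).
MATCH (a)-[r:connection]->(b)
CREATE (a)-[:HAS_CONNECTION]->(:Connection { type: r.typeOfConnection })-[:CONNECTS_TO]->(b)
DELETE r;

// With an index on :Connection(type), this lookup can start from the index
// instead of traversing every connection.
MATCH (c:Connection { type: 'GroupConnection' })-[:CONNECTS_TO]->(b)
RETURN count(b);
```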
According to this manual https://github.com/jadell/neo4jphp/wiki/Indexes we have to take care of adding nodes to and removing nodes from indexes ourselves.
OK, I'm adding nodes to indexes after creating them. But should I also update the indexes when I change some of the node's properties?
Neo4j has two indexing systems: The Legacy Indexes and Indexes.
Legacy indexes
This is a stand-alone indexing service that Neo4j ships with, and it gives you very little for free. It does not keep up to date with changes you make to the graph, other than lazily removing items that you've deleted in the graph.
If you want something in a legacy index, you must manually put it in there, and if you want it to reflect a change in the graph, you must manually update the index.
The sole reason these indexes remain, other than for backwards compatibility, is that they support complex indexes like geo-spatial indexing and rich full text indexing functionality. These are not yet supported by the new Indexes.
Read more about legacy indexes here: http://docs.neo4j.org/chunked/stable/indexing.html
Indexes
These were added in 2.0.0, and work the same way indexes do in relational databases: they are an optimization that you can introduce, and they are automatically kept in sync with the "primary" data, in our case, with changes to the graph.
An Index is defined on a combination of a Label and a Property Key, and subsequent lookups on that Label/Property key combination will (if the query planner determines this is the most efficient thing to do) use that index.
Read more about indexes here: http://docs.neo4j.org/chunked/stable/graphdb-neo4j-schema.html
If you are using legacy indexes (described by @jakewins), unless you have auto-indexing turned on for the fields being indexed, yes, you must manually remove and re-add the nodes when the property values change.