I am trying to optimize requests in Gremlin on a Neo4J graph.
Here is the short version of the basic request I am using:
g.idx("myIndex")[[myId:5]].outE("HAS_PRODUCT").filter{it.shop_id:5}.inV
So I looked into indexation and got to create an index on "HAS_PRODUCT"-typed edges with key 'shop_id'.
Using the same request, I don't see a big difference.
My questions are:
Is my new index used when I query with: filter{it.shop_id:5}
If not, how can I use this new index in my request?
More generally, if idx( is the graph method to use an index, is there a pipe method for that?
Thanks!
The short answer is that Gremlin won't make use of the secondary index when using Neo4j, but please consider reading the longer answer below in relation to TinkerPop, Gremlin and its philosophy.
The longer answer is....Indices are not being used for your shop_id. When you call outE you are effectively iterating all the edges to find those with shop_id == 5. To make use of indices in Gremlin you should use a vertex query. So, rewriting your code a bit (to also use key indices) would be like:
g.V('myIndex',5).outE('HAS_PRODUCT').has('shop_id',5).inV
With Blueprints implementations that support vertex-centric indices, the use of has will utilize that index automatically. Unfortunately, Neo4j is not one of those databases yet. Blueprints implementations that do implement it, include Titan (see Vertex-centric indices) and OrientDB (as part of the yet unreleased Blueprints 2.4.0...I believe they will have a partial implementation in that release) in that case.
Related
The only indices that I know about them are indices on properties (these indices are created on particular labels (node types)). I have some doubts, however.
Are there exists indices on edges/relationships?
I often read that Neo4j leveraged Lucene Index. Is it still used? What is aim?
Are there exists any other indicses than indices on properties?
Thanks in advance,
Neo4j has two indexing systems.
The more modern one is referred to as "schema indexes", and these are the ones that are automatic and apply to properties of a given label for quick lookup by those properties when the given properties and label are provided within a query. This does not currently support indexing of relationship properties. These started out based on lucene, but we've gradually replaced the implementation with our own native indexing solution. Discussion of these, as well as any noteworthy information and limitations, can be found in our index configuration documentation.
The other indexing system is an older manual system that is called "explicit indexes", though this has previously been called "manual indexes". This is also based on lucene, but these are not automatic -- it is up to the user to manually add or remove entries to the index and keep them up to date when data in the database changes. This makes usage and maintenance cumbersome, and we recommend avoid using this system if possible.
Built-in procedures are the means to create and lookup using explicit indexes, as these are never used automatically under the hood (as opposed to schema indexes). APOC Procedures also offers various means of interfacing with explicit indexes.
The main reason one would use explicit indexes is because you are able to create an index on relationships for properties and get fast lookup when querying the index. This also allows for a full text lookup across multiple labels and properties, provided the index has been configured in such a way.
Separate from all of these, it should be noted that usage of labels is itself a kind of index, as it provides quick access to all nodes with the given label.
I'm moving from Neo4j 2.2.* to (still prerelease) 3.0.0 and all of a sudden it seems that configuration parameters
node_auto_indexing=true
relationship_auto_indexing=true
node_keys_indexable=some_node_property
relationship_keys_indexable=some_rel_property
had gone and are not available any more. This is sad because I need full-text indexing (namely, fuzzy search queries and range searches), I was happily using it since 2.0.0 and had a naive hope that new Lucene 5.5 will make my life better with 3.0.0.
Is this functionality completely removed? START clause is still here in Cypher, neo4j-shell still has command which allows manipulating "legacy" FT indices so my question is:
how do I populate my FT index without using Java or another external programming language?
case 1: I import some bunch of "static" data into the graph which
will rarely be updated (consider dictionary) and need to arrange FTS
on those once, and manually perform complete reindex on occasional updates of the dataset;
case 2: nodes and relationships with specific properties
automagically get indexed upon creation or upon assignment of a new value to the property with specific name, near-realtime, as it used to be before.
New schema indexes are cool in 3.0.0 and range searches are implemented, but a) they work only on properties of nodes, no relationships, b) they don't allow full-text, fuzzy queries, and AFAIK regular expression matching does not use index.
Thanks for your suggestions!
WBR, Andrii
Andrii,
only the default config parameters have been removed not the functionality.
What is the actual use-case you are using the FTS indexes (on rels) for?
In 3.0 you can still use the start-clause but using stored procedures you can add nodes and relationship explicitly to indexes. And you can use similar procedures to query your indexes even more efficiently, e.g. by passing in start and end-nodes.
See (WIP): https://github.com/jexp/neo4j-apoc-procedures#manual-indexes
I am writing a sever plugin for Neo4J. The plugin receives a cypher query, and executes it. Currently, my implementation uses a CypherExecutor.
I now need to further constrain the results. (For example, imagine that the results need to be filtered by ACLs.)
One approach is to filter the results after executing the query. I'd rather not do this, for performance reasons as well as other limitations (for example, any aggregate results would be wrong.)
I considered adding the constraints to the query itself. I've looked at the command.AbstractQuery subclasses produced using the CypherParser. That object model is immutable.
I am wondering whether I will need to resort to cloning Neo4J's ExecutionEngine and CypherCompiler, just to extend the ExecutionPlanBuilder... I would like to avoid this option if at all possible.
Any recommendations about how this can be done?
In my case, I am simply trying to simulate multiple isolated graphs. I am OK with how this might be modeled -- whether I add a 'tenantId' to each node, or maintain a tenant node and add (:Tenant)<-[:scopedTo]-(n) relationships to every node.
First of all, I bet that there is an answer on this question somewhere in docs, but since 'Manual: Labels and Indexes' link here gives me 404 error, I'm going to ask you anyway.
Is it possible to create an index on some label and specify it as an automatic one (just like legacy indexes I'm currently using, but for labels)?
If someone from neo4j team is reading this post, please let me know if I'm looking for the documentation in the right place, 'cause I can't find anything more or less informative on labels and indexes (except a couple of posts in Michael Hunger's blog and, maybe, some presentations, what is obviously not enough).
This is a more technical one: is it possible to find an item in the index by the regex? Suppose I have node with property 'n' -> '/a/b/c', and another node 'n' -> '/a/*/c. Can I somehow match them?
I don't work for Neo4j but I'll answer anyway.
All label indexing is automatic. Once you've created the index it maintains itself, possibly with minimal delay.
The manual for the last stable release can always be found here. The chapter on indexing for the embedded Java API is here.
You cannot use regexp with label indices yet. It's said to be on the agenda, along with index support for array lookups, i.e. what in Cypher would be
MATCH (a:MyLabel) WHERE a.value IN ['val1', 'val2']
I have been trying to come across a query builder for Neo4j's query language Cypher, ideally using a fluent API. I did not find much, and decided to invest some time on building one myself.
The result so far is a fluent API query builder for the Cypher 1.9 spec.
I wanted to use StackOverflow to kick off a discussion and see what the thoughts are, before I release the code.
Here is a demo query that you would want to send off to Neo4j using Cypher.
Show me all people who John knows who know software engineers at Google (Google company code assumed to be 12345).
The relationship strength between John and the people who connect him to Google employees should be at least 3 (assuming a range from 1-5).
Return all of John's connections and the people they know at Google, including the relationships between those people.
Sort the results by name of John's connections in ascending order and then by relationship strength in descending order.
Using Fluent-Cypher:
Cypher
.on(Node.named("john").with(Index.named("PERSON_NAMES").match(Key.named("name").is("John"))))
.on(Node.named("google").with(Id.is(12345)))
.match(Connection.named("rel1").andType("KNOWS").between("john").and("middle"))
.match(Connection.named("rel2").andType("KNOWS").between("middle").and("googleEmployee"))
.match(Connection.withType("WORKS_AT").from("googleEmployee").to("google"))
.where(Are.allOfTheseTrue(Column.named("rel1.STRENGTH").isGreaterThanOrEqualTo(3)
.and(Column.named("googleEmployee.TITLE").isEqualTo("Software Engineer"))))
.returns(Columns.named("rel1", "middle", "rel2", "googleEmployee"))
.orderBy(Asc.column("middle.NAME"), Desc.column("rel1.STRENGTH"))
which yields the following query:
START john=node:PERSON_NAMES(name='John'),google=node(12345) MATCH john-[rel1:KNOWS]-middle,middle-[rel2:KNOWS]-googleEmployee,googleEmployee-[:WORKS_AT]->google WHERE ((rel1.STRENGTH >= '3' AND googleEmployee.TITLE = 'Software Engineer')) RETURN rel1,middle,rel2,googleEmployee ORDER BY middle.NAME ASC,rel1.STRENGTH DESC
I agree that you should build this with an eye towards Cypher 2.0. As of 2.0, it's very important that WHERE clauses are matched up with the correct START, (OPTIONAL) MATCH, and WITH clauses making the design of a fluent API a bit more challenging.
I like your first example where you just use the text to describe the query. The second option, to tell you the truth, doesn't look so much easier to me than constructing the Cypher query itself. The language is quite easy to use and is well documented. Adding another layer of abstraction would only increase complexity. However, if you find a way of translating this natural language request into a Cypher request, that'd be cool :)
Also, why not start working directly with Cypher 2.0?
Finally, check out this here: http://github.com/noduslabs/infranodus – I'm working on a similar problem but for adding the nodes into the database, not querying them. I chose to use #hashtags to make it easier for people to understand how their queries should be structured (as we already use them). So in your case it could become something like
#show-all #people who #John :knows who :know #software-engineers :at #Google.
#relationship-strength between #John and the #people who are #linked to #Google #software-engineers should be at least #3
#return #all of #John's #connections and the #people they :know at #Google, including the #relationships-between those #people.
#sort the #results #by-name of #John's #connections in #ascending order and then by #relationship-strength in #descending order.
(let's say the #hashtags refer to nodes, the #at refers to actions on them)
If you could pull something like this off, I think that'd be a much better and more useful simplification of the already easy-to-use Cypher.