Is this syntax not right for executing an APOC query? - neo4j

call apoc.index.nodes('Product', 'name:iPhone*') yield node return node
In my graph I have 'iPhone X' and 'iPhone Plus', but this query doesn't return anything. I also have an index on the 'name' property of Product.
Indexes
ON :Product(name) ONLINE

apoc.index.nodes is one of the APOC procedures for "manual indexes", which are also confusingly referred to in various docs as "legacy indexes" and "explicit indexes". Such indexes use the Apache Lucene library and are NOT the same as the standard neo4j indexes that most people use, and the way you create/update/use such indexes is also not standard.
For example, you cannot create a "manual index" via a Cypher CREATE INDEX clause. And neo4j Browser's :schema command will not show any manual indexes.
If you will only be searching :Product(name) via manual indexes, then you should drop your standard index for :Product(name), since it will not be needed but will add overhead (time and space) to your DB.
One way to create, update, and use manual indexes is through the special APOC procedures. The APOC documentation for manual indexes provides a good amount of information about how to add nodes and relationships to such indexes, and how to search using them.
As an example, before you can use the query in your question, you first have to add all the :Product(name) values to the Product manual index. If you want to add them all at once, you can use the following query (and since it has to return something, it just returns a count of the number of Products):
MATCH (p:Product)
CALL apoc.index.addNode(p, ['name'])
RETURN count(*)
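Why the original query returned nothing can be modeled with a toy in-memory index in Python. This is only an illustration under stated assumptions: `ToyManualIndex` and the sample products are hypothetical, and the wildcard uses Python's `fnmatchcase` on the whole stored value, whereas a real Lucene-backed index analyzes and tokenizes values. The point it shows is that the manual index is a separate store that stays empty until you put entries into it.

```python
from fnmatch import fnmatchcase


class ToyManualIndex:
    """Hypothetical in-memory stand-in for a manual index: (key, value) -> node ids.

    Real APOC/Lucene indexes analyze and tokenize values; this toy does not.
    """

    def __init__(self):
        self.entries = {}  # (key, value) -> set of node ids

    def add_node(self, node_id, props, keys):
        # Loosely mirrors apoc.index.addNode(p, ['name']): copy the listed
        # property values of one node into the index.
        for key in keys:
            if key in props:
                self.entries.setdefault((key, props[key]), set()).add(node_id)

    def query(self, key, pattern):
        # Shell-style wildcard match on the whole stored value ("iPhone*").
        hits = set()
        for (k, value), ids in self.entries.items():
            if k == key and fnmatchcase(value, pattern):
                hits |= ids
        return sorted(hits)


# Hypothetical sample data standing in for the :Product nodes.
products = {1: {"name": "iPhone X"}, 2: {"name": "iPhone Plus"}, 3: {"name": "Galaxy S9"}}
index = ToyManualIndex()

# Nothing has been added yet, so the lookup finds nothing,
# even though matching products exist.
before = index.query("name", "iPhone*")
print(before)  # []

# Counterpart of: MATCH (p:Product) CALL apoc.index.addNode(p, ['name']) RETURN count(*)
for node_id, props in products.items():
    index.add_node(node_id, props, ["name"])

after = index.query("name", "iPhone*")
print(after)  # [1, 2]
```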
[UPDATED]
Manual indexing is typically only used for partial and fuzzy text search use cases. When you just need exact value matching, standard indexes are recommended, especially since they require much less effort on your part. Manual indexes are called "manual" because the responsibility for maintaining them falls entirely on your shoulders: the queries that add, remove, or update your nodes, relationships, and properties would normally also have to add, remove, or update any relevant manual index entries. Note that when you update a property that is manually indexed, you have to remove the old index entry and then add the new one.
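The remove-then-add rule for updates can be sketched with another toy Python model. `ToyManualIndex` and `set_indexed_property` are hypothetical helpers, not APOC procedures; they only show what goes wrong when a property changes without the index being touched, and what a maintained update looks like.

```python
class ToyManualIndex:
    """Hypothetical exact-value manual index: (key, value) -> set of node ids."""

    def __init__(self):
        self.entries = {}

    def add(self, node_id, key, value):
        self.entries.setdefault((key, value), set()).add(node_id)

    def remove(self, node_id, key, value):
        ids = self.entries.get((key, value))
        if ids is not None:
            ids.discard(node_id)
            if not ids:
                del self.entries[(key, value)]

    def lookup(self, key, value):
        return sorted(self.entries.get((key, value), set()))


def set_indexed_property(index, nodes, node_id, key, new_value):
    """Update a manually indexed property: remove the old entry, then add the new one."""
    old_value = nodes[node_id].get(key)
    if old_value is not None:
        index.remove(node_id, key, old_value)
    nodes[node_id][key] = new_value
    index.add(node_id, key, new_value)


nodes = {1: {"name": "iPhone X"}, 2: {"name": "iPhone Plus"}}
index = ToyManualIndex()
for node_id, props in nodes.items():
    index.add(node_id, "name", props["name"])

# Naive update of node 1: the property changes, the index does not.
nodes[1]["name"] = "iPhone XS"
print(index.lookup("name", "iPhone X"))   # [1]  stale entry
print(index.lookup("name", "iPhone XS"))  # []   new value was never indexed

# Maintained update of node 2: old entry removed, new entry added.
set_indexed_property(index, nodes, 2, "name", "iPhone 8 Plus")
print(index.lookup("name", "iPhone Plus"))    # []
print(index.lookup("name", "iPhone 8 Plus"))  # [2]
```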

Related

Indices in Neo4j - questions and doubts

The only indexes I know about are indexes on properties (these are created on particular labels, i.e. node types). However, I have some doubts.
Are there indexes on edges/relationships?
I often read that Neo4j leverages Lucene indexes. Are they still used? What is their purpose?
Are there any other kinds of indexes besides indexes on properties?
Thanks in advance,
Neo4j has two indexing systems.
The more modern one is referred to as "schema indexes", and these are the ones that are automatic and apply to properties of a given label, for quick lookup by those properties when the given properties and label are provided within a query. This does not currently support indexing of relationship properties. These started out based on Lucene, but we've gradually replaced the implementation with our own native indexing solution. Discussion of these, as well as any noteworthy information and limitations, can be found in our index configuration documentation.
The other indexing system is an older manual system called "explicit indexes", though it has previously been called "manual indexes". It is also based on Lucene, but these indexes are not automatic: it is up to the user to manually add or remove index entries and keep them up to date when data in the database changes. This makes usage and maintenance cumbersome, and we recommend avoiding this system if possible.
Built-in procedures are the means to create explicit indexes and to perform lookups with them, as these indexes are never used automatically under the hood (as opposed to schema indexes). APOC Procedures also offers various means of interfacing with explicit indexes.
The main reason to use explicit indexes is that they let you index relationship properties and get fast lookups when querying the index. They also allow a full-text lookup across multiple labels and properties, provided the index has been configured that way.
Separate from all of these, it should be noted that usage of labels is itself a kind of index, as it provides quick access to all nodes with the given label.

Embedded automatic full text indexing completely removed from Neo4j as of 3.0.0?

I'm moving from Neo4j 2.2.* to (still prerelease) 3.0.0 and all of a sudden it seems that the configuration parameters
node_auto_indexing=true
relationship_auto_indexing=true
node_keys_indexable=some_node_property
relationship_keys_indexable=some_rel_property
are gone and no longer available. This is sad because I need full-text indexing (namely, fuzzy search queries and range searches); I had been happily using it since 2.0.0 and naively hoped that the new Lucene 5.5 would make my life better with 3.0.0.
Has this functionality been removed completely? The START clause is still present in Cypher, and neo4j-shell still has a command for manipulating "legacy" FT indexes, so my question is:
how do I populate my FT index without using Java or another external programming language?
case 1: I import a bunch of "static" data into the graph that will rarely be updated (think of a dictionary); I need to set up FTS on it once and manually perform a complete reindex on the occasional updates of the dataset;
case 2: nodes and relationships with specific properties get indexed automagically, in near real time, upon creation or upon assignment of a new value to a property with a specific name, as they used to be before.
The new schema indexes in 3.0.0 are cool and range searches are implemented, but a) they work only on node properties, not on relationships, and b) they don't allow full-text or fuzzy queries, and AFAIK regular-expression matching does not use the index.
Thanks for your suggestions!
WBR, Andrii
Andrii,
only the default config parameters have been removed, not the functionality.
What is the actual use case you are using the FTS indexes (on rels) for?
In 3.0 you can still use the START clause, but with stored procedures you can also add nodes and relationships to indexes explicitly. And you can use similar procedures to query your indexes even more efficiently, e.g. by passing in start and end nodes.
See (WIP): https://github.com/jexp/neo4j-apoc-procedures#manual-indexes
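The idea of narrowing a relationship-index query by its start and end nodes can be modeled with a toy Python class. `ToyRelIndex` and the sample relationships are hypothetical, and this is not the APOC API. The answer's point is that the real procedures can use the endpoints during the lookup itself; this sketch simply filters a list, so it shows the semantics of the restriction, not the performance benefit.

```python
from fnmatch import fnmatchcase


class ToyRelIndex:
    """Hypothetical relationship index whose entries also record each rel's endpoints."""

    def __init__(self):
        self.entries = []  # (key, value, rel_id, start_id, end_id)

    def add_relationship(self, rel_id, start_id, end_id, props, keys):
        for key in keys:
            if key in props:
                self.entries.append((key, props[key], rel_id, start_id, end_id))

    def query(self, key, pattern, start=None, end=None):
        """Match on key/pattern; optionally keep only rels with the given start/end node."""
        return sorted(
            rel_id
            for k, value, rel_id, s, e in self.entries
            if k == key
            and fnmatchcase(value, pattern)
            and (start is None or s == start)
            and (end is None or e == end)
        )


# Hypothetical sample: relationships 100-102 between nodes 1, 2 and 3.
rels = ToyRelIndex()
rels.add_relationship(100, 1, 2, {"note": "met at work"}, ["note"])
rels.add_relationship(101, 1, 3, {"note": "met at school"}, ["note"])
rels.add_relationship(102, 2, 3, {"note": "met online"}, ["note"])

print(rels.query("note", "met*"))                  # [100, 101, 102]
print(rels.query("note", "met*", start=1))         # [100, 101]
print(rels.query("note", "met*", start=1, end=3))  # [101]
```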

How do a general search across string properties in my nodes?

Working with Neo4j in a Rails app.
I have nodes with several string properties containing long strings of user-generated content. For example, my nodes of type "Book" might have the properties "review" and "summary", which would contain long-form string values.
I was trying to design queries that return nodes whose properties match general-language search terms provided by a user in a search box. As my query grew increasingly complicated, it occurred to me that I was trying to solve natural language search.
I looked into some of the popular search gems in Rails, but they all seem to depend on ActiveRecord. What search solutions exist for Neo4j.rb?
There are a few ways that you could go about this!
As FrobberOfBits said, Neo4j has what are called "legacy indexes", which use Lucene in the background to provide indexing of arbitrary data. It also supports the new schema indexes, but unfortunately those are based on exact matches (though I'm pretty sure that will change somewhat in Neo4j 2.3.x).
Neo4j does support pattern matching on strings via the =~ operator, but those queries aren't indexed, so their performance depends on the size of your database.
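Why an unindexed regex predicate scales with the data can be shown with a small Python analogy; it is not how Neo4j implements anything. The regex filter has to test every value, while a sorted key list (a stand-in for an index) lets a prefix lookup seek directly to the first candidate with binary search. The sample names are hypothetical.

```python
import bisect
import re

# Hypothetical sample of product names.
names = ["iPhone Plus", "Galaxy S9", "iPhone X", "Pixel 3", "iPad Pro"]


def regex_scan(values, pattern):
    """Unindexed =~ style filter: every value is tested, so cost grows with the data."""
    rx = re.compile(pattern)
    # Cypher's =~ must match the whole string, hence fullmatch.
    return [v for v in values if rx.fullmatch(v)]


class SortedPrefixIndex:
    """Sorted keys let a prefix lookup jump straight to the first candidate."""

    def __init__(self, values):
        self.keys = sorted(values)

    def starts_with(self, prefix):
        i = bisect.bisect_left(self.keys, prefix)  # O(log n) seek
        out = []
        while i < len(self.keys) and self.keys[i].startswith(prefix):
            out.append(self.keys[i])
            i += 1
        return out


print(regex_scan(names, r"iPhone.*"))                  # ['iPhone Plus', 'iPhone X']
print(SortedPrefixIndex(names).starts_with("iPhone"))  # ['iPhone Plus', 'iPhone X']
```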
We often recommend a gem called searchkick, which lets you define Elasticsearch indexes in your models. Then you can just call a Model.search method to do your searches: it first queries Elasticsearch to get the node IDs and then loads those nodes via Neo4j.rb. You can use it via the neo4j-searchkick gem: https://github.com/neo4jrb/neo4j-searchkick
Lastly, if you're doing NLP and are trying to extract important words from your text, you could create a Tag/Word label and create relationships from your nodes to these NLP extracted nodes so that you can search based on those nodes in the future. You could even build recommendations from one text node to another based on the number/type of common tag nodes.
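The "number of common tag nodes" idea can be sketched in plain Python before modeling it in the graph. The tag sets below are hypothetical stand-ins for relationships such as (:Book)-[:HAS_TAG]->(:Tag), and `recommend` is an illustrative helper, not part of Neo4j.rb.

```python
# Hypothetical tags extracted from each text node; in the graph these would be
# relationships such as (:Book)-[:HAS_TAG]->(:Tag).
book_tags = {
    "book_a": {"neo4j", "graphs", "search"},
    "book_b": {"graphs", "search", "lucene"},
    "book_c": {"cooking"},
    "book_d": {"neo4j", "graphs"},
}


def recommend(book, tags_by_book):
    """Rank other books by the number of tags they share with `book` (ties by name)."""
    mine = tags_by_book[book]
    scores = {}
    for other, tags in tags_by_book.items():
        shared = len(mine & tags)
        if other != book and shared > 0:
            scores[other] = shared
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(recommend("book_a", book_tags))  # [('book_b', 2), ('book_d', 2)]
```

Weighting by tag type, as the answer suggests, would just mean summing per-tag weights instead of counting.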
I don't know if anything specific exists for neo4j.rb and activerecord. What I can say is that generally this stuff is handled through the use of legacy indexes that are implemented by Lucene.
The premise is that you create a Lucene-managed index on certain properties, which then lets you use the Lucene query language via Cypher to get data from those indexes. From neo4j.rb's point of view, this doesn't look any different from running other Cypher queries, like this:
START item=node:node_auto_index('(title:"foo bar" AND body:baz*) OR title:bat')
RETURN item
Note that Lucene indexes and that query language can only be used in a START clause, not a MATCH clause. Refer to the Lucene Query Syntax documentation to discover more about what you can do with that syntax (fuzzy matching, wildcards, etc.), which is quite a bit more extensive than what regex would give you.
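When user input goes into such a Lucene query string, syntax characters in the input can break or change the query. A minimal Python sketch of building the query safely follows; the helper names are hypothetical (not part of neo4j.rb), and the escaped character set is the one Lucene's classic `QueryParser.escape` uses, which is worth checking against the Lucene version bundled with your Neo4j. Note that only double quotes delimit a phrase in Lucene syntax.

```python
# Characters with special meaning in Lucene's classic query syntax; this is the
# set escaped by Lucene's QueryParser.escape (check it against your Lucene version).
LUCENE_SPECIAL = frozenset('\\+-!():^[]"{}~*?|&/')


def escape_lucene(text):
    """Backslash-escape syntax characters so user input is matched literally."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def phrase(field, text):
    # Only double quotes delimit a phrase in Lucene syntax.
    return f'{field}:"{escape_lucene(text)}"'


def prefix(field, text):
    # The trailing * is added unescaped so it still acts as a wildcard.
    return f"{field}:{escape_lucene(text)}*"


def term(field, text):
    return f"{field}:{escape_lucene(text)}"


query = f'({phrase("title", "foo bar")} AND {prefix("body", "baz")}) OR {term("title", "bat")}'
print(query)                        # (title:"foo bar" AND body:baz*) OR title:bat
print(escape_lucene("C++ (beta)"))  # C\+\+ \(beta\)
```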

Should we update indexes after node update in neo4jphp?

According to this manual https://github.com/jadell/neo4jphp/wiki/Indexes, we have to take care of adding nodes to and removing them from indexes ourselves.
OK, I'm adding nodes to indexes after creating them. But should I also update the indexes when I change some of the node's properties?
Neo4j has two indexing systems: legacy indexes and the newer (schema) indexes.
Legacy indexes
This is a stand-alone indexing service that ships with Neo4j, and it gives you very little for free: it does not keep up to date with changes you make to the graph, other than lazily removing items that you've deleted from the graph.
If you want something in a legacy index, you must manually put it in there, and if you want it to reflect a change in the graph, you must manually update the index.
The sole reason these indexes remain, other than for backwards compatibility, is that they support complex indexes like geo-spatial indexing and rich full text indexing functionality. These are not yet supported by the new Indexes.
Read more about legacy indexes here: http://docs.neo4j.org/chunked/stable/indexing.html
Indexes
These were added in 2.0.0 and work the same way indexes do in relational databases: they are an optimization that you can introduce, and they are automatically kept in sync with the "primary" data, in our case with changes to the graph.
An Index is defined on a combination of a Label and a Property Key, and subsequent lookups on that Label/Property key combination will (if the query planner determines this is the most efficient thing to do) use that index.
Read more about indexes here: http://docs.neo4j.org/chunked/stable/graphdb-neo4j-schema.html
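The difference from legacy indexes, namely that the store itself keeps the index in sync on every write, can be modeled with a toy Python class. `ToyGraph` is hypothetical and greatly simplified (exact matches only, no query planner); it only shows that an index defined on a Label/Property Key pair is populated from existing data and updated by the write path without any extra calls from the user.

```python
from collections import defaultdict


class ToyGraph:
    """Hypothetical model of a schema index: defined on (label, key), kept in sync by the store."""

    def __init__(self):
        self.nodes = {}    # node_id -> (set of labels, dict of properties)
        self.indexes = {}  # (label, key) -> {value: set of node ids}

    def create_index(self, label, key):
        # Like CREATE INDEX ON :Label(key): populated from the existing data.
        idx = defaultdict(set)
        for node_id, (labels, props) in self.nodes.items():
            if label in labels and key in props:
                idx[props[key]].add(node_id)
        self.indexes[(label, key)] = idx

    def create_node(self, node_id, labels, props):
        self.nodes[node_id] = (set(labels), {})
        for key, value in props.items():
            self.set_property(node_id, key, value)

    def set_property(self, node_id, key, value):
        labels, props = self.nodes[node_id]
        old = props.get(key)
        props[key] = value
        # The write path itself updates every matching index; the caller does nothing extra.
        for label in labels:
            idx = self.indexes.get((label, key))
            if idx is not None:
                if old is not None:
                    idx[old].discard(node_id)
                idx[value].add(node_id)

    def find(self, label, key, value):
        idx = self.indexes.get((label, key))
        if idx is not None:
            return sorted(idx.get(value, ()))  # exact-match index lookup
        # No index for this label/key: scan all nodes with the label instead.
        return sorted(
            n for n, (labels, props) in self.nodes.items()
            if label in labels and props.get(key) == value
        )


g = ToyGraph()
g.create_node(1, ["Product"], {"name": "iPhone X"})
g.create_index("Product", "name")  # picks up node 1
g.create_node(2, ["Product"], {"name": "iPhone Plus"})
g.set_property(1, "name", "iPhone XS")

print(g.find("Product", "name", "iPhone X"))     # []
print(g.find("Product", "name", "iPhone XS"))    # [1]
print(g.find("Product", "name", "iPhone Plus"))  # [2]
```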
If you are using legacy indexes (described by @jakewins), then unless you have auto-indexing turned on for the fields being indexed, yes, you must manually remove and re-add the nodes when the property values change.

Neo4j 2.0: Indexing array-valued properties with schema indexing

I have nodes with multiple source IDs in one array-valued property called "sourceIds", because a node can be derived from multiple resources (I'm assembling multiple databases into one Neo4j model).
I want to be able to look up nodes by any of their source IDs. With legacy indexing this was no problem, I would just add a node to the index associated with each element of the sourceIds property array.
Now I wanted to switch to indexing with labels and I'm wondering how that kind of index works here. I can do
CREATE INDEX ON :<label>(sourceIds)
but what does that actually do? I hoped it would just create index entries for each array element, but that doesn't seem to be the case. With
MATCH n:<label> WHERE "testid" in n.sourceIds RETURN n
the query takes between 300ms and 500ms, which is too long for an index lookup (other schema indexes work three to five times faster). With
MATCH n:<label> WHERE n.sourceIds="testid" RETURN n
I don't get a result. That's expected, since it's an array property, but I gave it a try because it would make sense for array properties to be broken down into their elements for indexing purposes.
So, is there a way to handle array properties with schema indexing, are there plans for it, or will I just have to stick to legacy indexing here? My problem with the legacy Lucene index was that I hit the max number of boolean clauses (1024). Another question thus would be: Can I raise this number? Lucene allows that, but can I do this with the Lucene index used by Neo4j?
Thanks and best regards!
Edit: A bit more elaboration on why I hit the boolean-clause limit: I need to export specific parts of the database into custom file formats for text processing pipelines. These pipelines use components that I cannot change to query Neo4j directly (whether for reasons of accessibility or time), so I'd rather stick with the required file format(s). I do the export via the pattern "give me all IDs in the DB; now, for batches of IDs, query the desired information (e.g. specific paths) from Neo4j and store the results to file". Why do I use batches at all? Because without them, the connection overhead slows things down significantly. Thus, large batches are a kind of optimization here.
Schema indexes can only do exact matches right now. Your "testid" in n.sourceIds does not use the index (as shown by your query times). I think there are plans to make this behave better, but I'm waiting for them just as eagerly as you are.
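The gap between what the question hoped for and what an exact-match index does can be shown with two toy Python dictionaries. This is an assumption-laden model, not Neo4j's storage format: it assumes, in line with the answer, that the schema index treats the whole array as one exact value, and contrasts that with the per-element approach the question used with the legacy index. The sample nodes are hypothetical.

```python
from collections import defaultdict

# Hypothetical nodes with an array-valued "sourceIds" property.
nodes = {
    1: {"sourceIds": ["testid", "other"]},
    2: {"sourceIds": ["abc"]},
}

# Whole-value index: the array is stored as a single key (a tuple here), so
# only an exact match on the entire array can hit it.
whole_value = defaultdict(set)
for node_id, props in nodes.items():
    whole_value[tuple(props["sourceIds"])].add(node_id)

# Per-element index, as done with the legacy index in the question:
# one entry per array element, so a membership test becomes a key lookup.
per_element = defaultdict(set)
for node_id, props in nodes.items():
    for source_id in props["sourceIds"]:
        per_element[source_id].add(node_id)

print(sorted(whole_value.get(("testid",), set())))          # []
print(sorted(whole_value.get(("testid", "other"), set())))  # [1]
print(sorted(per_element.get("testid", set())))             # [1]
```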
I've actually hit a lower maximum in the Lucene query: 512. If there is a way to increase it, I'd love to hear of it. I got around it by simply running more than one query in the rare cases that actually go over 512 IDs. What query are you running where you need more?
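The "more than one query" workaround amounts to chunking the ID list so that no single OR query exceeds the clause limit. A minimal Python sketch, assuming plain alphanumeric IDs (escape them otherwise) and a hypothetical `sourceIds` field; 512 is the limit reported in this answer, while Lucene's own default `maxClauseCount` is 1024.

```python
from typing import Iterator, List, Sequence

# Limit reported in this answer; Lucene's own default maxClauseCount is 1024.
MAX_CLAUSES = 512


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def id_queries(field: str, ids: Sequence[str], max_clauses: int = MAX_CLAUSES) -> List[str]:
    """Split one large OR query into several, each with at most `max_clauses` terms.

    Assumes the IDs contain no Lucene syntax characters; escape them otherwise.
    """
    return [" OR ".join(f"{field}:{i}" for i in batch) for batch in batched(ids, max_clauses)]


ids = [f"id{n}" for n in range(1200)]  # hypothetical IDs
queries = id_queries("sourceIds", ids)
print(len(queries))                            # 3
print([q.count(" OR ") + 1 for q in queries])  # [512, 512, 176]
```

Each resulting string would then be run as its own index query and the results merged, which is exactly the batching the question already does for connection overhead.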
