Neo4j Embedded - Auto Index Multiple Properties - neo4j

I turned on node auto-indexing and it's indexing the properties I need. If I start up the Neo4j server and open the webadmin, I see that there is an index called node_auto_index as per this post. It works perfectly from the webadmin and I can run Cypher queries like this:
START n=node:node_auto_index('__type:user AND __username:admin') RETURN n
The query returns exactly what I expect. However, if I shut down the server and open the DB in embedded mode from a Scala application, this doesn't work. If I try to run the same Cypher query, I get an error that node_auto_index doesn't exist. I checked the GraphDatabaseService properties, and auto indexing is up and running on the right keys, but when getting a list of all of the index names, the list is always empty. And I can't use the AutoIndex API because it only indexes on one property, and I definitely need both.
So from this point, what would be the best way to ago about querying the auto-index with multiple properties from my Scala (Java) code?
EDIT: I noticed that the ReadableIndex interface (which is what the auto-index is) can take a query string. I can't find much documentation on it, so I'm going to try a few things, but is there any chance that could take a Cypher query? Or just the single-quoted string in my query above?

Turns out that the query function of the ReadableIndex actually takes a Lucene Query, which I now realize is what I had quoted above. So calling this code:
val nodes = db.index.getNodeAutoIndexer.getAutoIndex.query("__type:user AND __username:admin")
Gave me exactly what I wanted.

Related

Embedded automatic full text indexing completely removed from Neo4j as of 3.0.0?

I'm moving from Neo4j 2.2.* to (still prerelease) 3.0.0 and all of a sudden it seems that configuration parameters
node_auto_indexing=true
relationship_auto_indexing=true
node_keys_indexable=some_node_property
relationship_keys_indexable=some_rel_property
had gone and are not available any more. This is sad because I need full-text indexing (namely, fuzzy search queries and range searches), I was happily using it since 2.0.0 and had a naive hope that new Lucene 5.5 will make my life better with 3.0.0.
Is this functionality completely removed? START clause is still here in Cypher, neo4j-shell still has command which allows manipulating "legacy" FT indices so my question is:
how do I populate my FT index without using Java or another external programming language?
case 1: I import some bunch of "static" data into the graph which
will rarely be updated (consider dictionary) and need to arrange FTS
on those once, and manually perform complete reindex on occasional updates of the dataset;
case 2: nodes and relationships with specific properties
automagically get indexed upon creation or upon assignment of a new value to the property with specific name, near-realtime, as it used to be before.
New schema indexes are cool in 3.0.0 and range searches are implemented, but a) they work only on properties of nodes, no relationships, b) they don't allow full-text, fuzzy queries, and AFAIK regular expression matching does not use index.
Thanks for your suggestions!
WBR, Andrii
Andrii,
only the default config parameters have been removed not the functionality.
What is the actual use-case you are using the FTS indexes (on rels) for?
In 3.0 you can still use the start-clause but using stored procedures you can add nodes and relationship explicitly to indexes. And you can use similar procedures to query your indexes even more efficiently, e.g. by passing in start and end-nodes.
See (WIP): https://github.com/jexp/neo4j-apoc-procedures#manual-indexes

Weird results with cypher query in Neo4j and node_auto_index

I have a graph database (Neo4j) in which I configured a property to be auto indexed with full-text. Everything is working great except that I have 1 row that is not returned when I execute a particular cypher query.
My property in the graph equals (I've put in bold the words I am using in my cypher query):
1pizzeriadeicomparipourlesamateursdevraiespizzasitaliennescestadireavecpastropdepateetcuitesaufeudeboislaplacenepayepasdeminesalleettablesassezpetitesetilfautsarmerdepatiencelessamedisoirssionnapasreserveenv15minutesdattentemaislespizzassontexcellentesrestaurantmontrealmontrealquebeccanada5148435411
If I execute the following cypher query:
START n1=NODE:node_auto_index('Search_Field:*res* AND Search_Field:*taurant* AND Search_Field:*411*')
RETURN n1.Search_Field
My row is returned!
So far no problem!
But when I execute it by putting the word « restaurant » all together like this:
START n1=NODE:node_auto_index('Search_Field:*restaurant* AND Search_Field:*411*')
RETURN n1.Search_Field
Then no rows are returned.
I tested a lot of stuffs in order to understand and try to find a pattern or something that can explain the problem. It seems like the length of my property value might play a role. I know it sounds strange but if I add 3 or more letters, let say « aaa », after the word restaurant in the property value, like this (look at the bold letters close to the end of the value):
1pizzeriadeicomparipourlesamateursdevraiespizzasitaliennescestadireavecpastropdepateetcuitesaufeudeboislaplacenepayepasdeminesalleettablesassezpetitesetilfautsarmerdepatiencelessamedisoirssionnapasreserveenv15minutesdattentemaislespizzassontexcellentesrestaurantaaamontrealmontrealquebeccanada5148435411
then, if I execute the same cypher query, the row is now returned.
Anyone had encountered similar problems! It's driving me crazy!
I have tested on both Neo4j-enterprise 2.2.1 and the latest Community 3.0.0-M02. Same result with both of them.
Any idea on where or what should I look for ?
The query term get passed through the lucene analyzer - just like the contents you index. I'm not 100% sure but I think that the default analyzer "eats up" the digits, that's why you don't get the results.
You can supply an analyzer class when the index is created for the first time. Also you can use Java API to query the index - this allows to pass in instances of Lucene Query, see my example at http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/.

How do I find a string in an unknown neo4j database using Cypher?

TL,DR: I need a query which gives me all nodes/relationships which contain a certain value (in my case a string, so much I know), without knowing which property(key) contains the string. I am using neo4j(latest version), meteor (latest version) and the meteor neo4j driver, which is recommended on the neo4j website for meteor.
Currently I am working (as part of my bachelor thesis) on a tool to visualize the output of any Cypher query on any database, regardless of the database contents.
So far I've managed to correctly display nodes/relationships which are coming out. My problem now is to visualize (get nodes and relationships to feed into my frontend) textual queries like (taken from the neo4j movie database, which I am using for development)
MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN coActors.name
This kind of queries only returns an array of strings and not whole nodes or relationships. I now need some way (preferably a Cypher query) to get all nodes which contain for example the string "Audrey Tatou".
The problem I've now run into is that I didn't find a way to write a query which doesn't need something like
MATCH n
WHERE Person.name = "some name"
Since I don't know anything about the contents of the database I cannot use
WHERE propertyName = "propertyValue"
since I only know the value but not the name of the property.
The only solution here will be to get every nodes with your label and check properties and values using reflection on client side.
Using cypher, the solution would be to get all properties and their values and parse their values using a foreach loop. Maybe you can do this, but I'm really not sure, it's a recent feature but you can still give a try.
Here is what I found for the cypher solution: How can I return all properties for a node using Cypher?
So, you have query that returns array of string.
In fact - you can receive almost anything as result. Cypher is capable to return just bare strings, that are not related to anything.
Long story short - you can't vizualize this data, because of this data nature. Best you can do is to represent them as table (or similar), like Neo4j browser do this.
But, there is (probably) solution for you. Neo4j has feature called Legacy indexing. And there you can find full text indexes. Maybe this can help you.
You can just use a driver that returns nodes and rels, or if you do the queries manually add resultDataContents entry
{statements:[{statement:"MATCH ..","resultDataContents",["graph"]}]}
to your payload and you get nodes and relationships back.

Neo4j 2.0: Indexing array-valued properties with schema indexing

I have nodes with multiple "sourceIds" in one array-valued property called "sourceIds", just because there could be multiple resources a node could be derived from (I'm assembling multiple databases into one Neo4j model).
I want to be able to look up nodes by any of their source IDs. With legacy indexing this was no problem, I would just add a node to the index associated with each element of the sourceIds property array.
Now I wanted to switch to indexing with labels and I'm wondering how that kind of index works here. I can do
CREATE INDEX ON :<label>(sourceIds)
but what does that actually do? I hoped it would just create index entries for each array element, but that doesn't seem to be the case. With
MATCH n:<label> WHERE "testid" in n.sourceIds RETURN n
the query takes between 300ms and 500ms which is too long for an index lookup (other schema indexes work three to five times faster). With
MATCH n:<label> WHERE n.sourceIds="testid" RETURN n
I don't get a result. That's clear because it's an array property but I just gave it a try since it would make sense if array properties would be broken down to their elements for indexing purposes.
So, is there a way to handle array properties with schema indexing or are there plans or will I just have to stick to legacy indexing here? My problem with the legacy Lucene index was that I hit the max number of boolean clauses (1024). Another question thus would be: Can I raise this number? Lucene allows that, but can I do this with the Lucene index used by Neo4j?
Thanks and best regards!
Edit: A bit more elaboration on why I hit the boolean clauses max limit: I need to export specific parts of the database into custom file formats for text processing pipelines. These pipelines use components I cannot (be it for the sake of accessibility or time) change to query Neo4j directly, so I'd rather stay with the defined required file format(s). I do the export via the pattern "give me all IDs in the DB; now, for batches of IDs, query the desired information (e.g. specific paths) from Neo4j and store the results to file". Why I use batches at all? Well, if I don't, things are slowed down significantly via the connection overhead. Thus, large batches are a kind of optimization here.
Schema indexes can only do exact matches right now. Your "testid" in n.sourceIds does not use the index (as shown by your query times). I think there are plans to make this behave better, but I'm waiting for them just as eagerly as you are.
I've actually hit a lower max in the lucene query: 512. If there is a way to increase it I'd love to hear of it. The way I got around it is just doing more than one query if I have one of the rare cases that actually goes over 512 ids. What query are you doing where you need more?

Using Aggregate in Gemlin through Neo4j REST

I'm using neo4j 1.7 through the REST interface, and I punched in the following query:
{"script": "g.v(1).aggregate(x); g.V.except(x)", "params": {"x":[]}}
which should return the list, missing Node 1, but instead this returns the entire list of Nodes. I've looked over the neo4j documentation and see examples of using variables, but this query does not seem to behave as exepected.
Has anyone else run into this problem or is this something can't/shouldn't be done through the gremlin REST interface?
When you're not in the Gremlin REPL, you need to manually iterate expressions when it's not the last expression returned (the Gremlin Plugin automatically iterates the last expression):
g.v(1).aggregate(x).iterate(); g.V.except(x)
But you can simplify it down to one statement like this:
g.V.except([g.v(1)])

Resources