Weird results with Cypher query in Neo4j and node_auto_index

I have a Neo4j graph database in which I configured a property to be auto-indexed with full-text search. Everything works great, except that one row is not returned when I execute a particular Cypher query.
The property value in the graph is (the words I use in my Cypher query, "restaurant" and "411", appear near the end):
1pizzeriadeicomparipourlesamateursdevraiespizzasitaliennescestadireavecpastropdepateetcuitesaufeudeboislaplacenepayepasdeminesalleettablesassezpetitesetilfautsarmerdepatiencelessamedisoirssionnapasreserveenv15minutesdattentemaislespizzassontexcellentesrestaurantmontrealmontrealquebeccanada5148435411
If I execute the following cypher query:
START n1=NODE:node_auto_index('Search_Field:*res* AND Search_Field:*taurant* AND Search_Field:*411*')
RETURN n1.Search_Field
My row is returned!
So far no problem!
But when I write the word "restaurant" in one piece, like this:
START n1=NODE:node_auto_index('Search_Field:*restaurant* AND Search_Field:*411*')
RETURN n1.Search_Field
Then no rows are returned.
I tested a lot of things to try to find a pattern or anything else that could explain the problem. It seems the length of my property value might play a role. I know it sounds strange, but if I add three or more letters, let's say "aaa", after the word restaurant in the property value, like this (note the added letters near the end of the value):
1pizzeriadeicomparipourlesamateursdevraiespizzasitaliennescestadireavecpastropdepateetcuitesaufeudeboislaplacenepayepasdeminesalleettablesassezpetitesetilfautsarmerdepatiencelessamedisoirssionnapasreserveenv15minutesdattentemaislespizzassontexcellentesrestaurantaaamontrealmontrealquebeccanada5148435411
then, if I execute the same Cypher query, the row is now returned.
Has anyone encountered similar problems? It's driving me crazy!
I have tested on both Neo4j Enterprise 2.2.1 and the latest Community 3.0.0-M02, with the same result on both.
Any idea where or what I should look for?

The query term gets passed through the Lucene analyzer, just like the contents you index. I'm not 100% sure, but I think the default analyzer "eats up" the digits, which is why you don't get the results.
You can supply an analyzer class when the index is created for the first time. You can also use the Java API to query the index; this allows you to pass in instances of Lucene Query. See my example at http://blog.armbruster-it.de/2014/10/deep-dive-on-fulltext-indexing-with-neo4j/.
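As a sanity check that is independent of Lucene, plain wildcard matching confirms that both versions of the value contain every query term, which suggests the cause lies in the index rather than in the data. This is only a sketch in Python: it uses just the tail of the property value for brevity, `matches_all` is a hypothetical helper, and it does not reproduce how Neo4j or Lucene evaluates the query.

```python
from fnmatch import fnmatchcase

# Tails of the two property values from the question (leading part omitted for brevity).
ORIGINAL_TAIL = "excellentesrestaurantmontrealmontrealquebeccanada5148435411"
VARIANT_TAIL = "excellentesrestaurantaaamontrealmontrealquebeccanada5148435411"

# The two sets of wildcard terms used in the Cypher queries.
SPLIT_TERMS = ["*res*", "*taurant*", "*411*"]
WHOLE_TERMS = ["*restaurant*", "*411*"]


def matches_all(value, patterns):
    """Return True if the value matches every wildcard pattern (AND semantics)."""
    return all(fnmatchcase(value, pattern) for pattern in patterns)


for label, value in [("original", ORIGINAL_TAIL), ("variant", VARIANT_TAIL)]:
    print(label, matches_all(value, SPLIT_TERMS), matches_all(value, WHOLE_TERMS))
```

Both values pass both term sets here, so a plain substring match would return the row in either case.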


Neo4j fulltext index search with special characters

We are using Neo4j version 4.1.1, and we have a graph that represents a structure of objects.
We support translation through translation nodes: the relationship between an object and a translation node carries the object's name and description.
For example:
(n:object)-[r:Translation]-(:ru)
means that relationship r holds the name and description of object n in Russian.
To search by name and description, we created a full-text index like this:
CALL db.index.fulltext.createRelationshipIndex("TranslationRelationshipIndex",["Translation"],["Name","Description"], { eventually_consistent: "true" })
We also support searching for items, for which we query the index. We have names like "UFO41.SI01V03":
CALL db.index.fulltext.queryRelationships('TranslationRelationshipIndex', '*FO41.SI0*') YIELD relationship, score 
But for names like the one shown above ([0-9.*]), no results are returned,
while results are returned for a name like "ab.or".
Does anyone know how to make this work? I've tried all 46 available analyzers.
I know we can solve it with match()-[r]-() where r.Name contains "<string>",
but we would prefer a more efficient solution that uses the index.
Stay safe!
Thanks in advance.
P.S. If needed, I can supply a few lines to recreate it locally; just ask.
The analyzer probably treats words like ab.or differently from ab.or123, considering the first a single token and the second two tokens.
No existing analyzer will really fit your needs; you would have to create your own.
You can, however, replace the . in your query with a simple AND, e.g.:
CALL db.index.fulltext.queryNodes('Test', replace("*FO41.SI0*", ".", " AND "))
This will return the results you're looking for.
Resources for creating your own analyzer:
https://graphaware.com/neo4j/2019/09/06/custom-fulltext-analyzer.html
https://neo4j.com/docs/java-reference/current/extending-neo4j/full-text-analyzer-provider/
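If you build the query string on the client side instead, the same rewrite (replacing each . with AND) could be sketched like this in Python. `dots_to_and` is a hypothetical helper name; the resulting string would be passed as the query argument of the full-text procedure.

```python
def dots_to_and(term: str) -> str:
    """Replace each '.' with ' AND ' so every piece is matched as its own term."""
    return term.replace(".", " AND ")


query = dots_to_and("*FO41.SI0*")
print(query)  # *FO41 AND SI0*
```

Note that this is a plain string replacement: it does not escape any other characters that are special in Lucene query syntax.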

Weird results from cypher collaborative filter query

I am using one of the Neo4j practice graphs (see below) to learn Cypher.
To find people who both acted in and directed a movie, I'm running the following commands:
:play movie graph
MATCH (p:Person)-[a:ACTED_IN]->(m:Movie)<-[d:DIRECTED]-(p)
RETURN p,m,a,d,type(a),type(d)
A few things don't make sense:
for some rows in the result, type(a) is not ACTED_IN but PRODUCER or WROTE, etc.
a lot of nodes are returned which don't seem to satisfy this pattern
using OPTIONAL MATCH works exactly right, but I don't know why
Any help would be much appreciated.
As cybersam commented, this definitely looks like a bug in the compiled runtime.
If you PROFILE the query, you can see it's using the compiled runtime; if you prefix the query with CYPHER runtime=slotted, you get the expected results.
I'll pass this along to the Cypher team.
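If you send queries from client code, the prefix can be added as a plain string. A minimal sketch in Python, where `with_runtime` is a hypothetical helper and no driver call is shown:

```python
def with_runtime(query: str, runtime: str = "slotted") -> str:
    """Prepend the CYPHER runtime option to a query string."""
    return f"CYPHER runtime={runtime} {query}"


# The query from the question, unchanged apart from the prefix.
QUERY = (
    "MATCH (p:Person)-[a:ACTED_IN]->(m:Movie)<-[d:DIRECTED]-(p) "
    "RETURN p,m,a,d,type(a),type(d)"
)

print(with_runtime(QUERY))
```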

How do I find a string in an unknown neo4j database using Cypher?

TL;DR: I need a query that gives me all nodes/relationships that contain a certain value (in my case a string, that much I know), without knowing which property (key) contains the string. I am using Neo4j (latest version), Meteor (latest version), and the Meteor Neo4j driver, which is recommended on the Neo4j website for Meteor.
Currently I am working (as part of my bachelor thesis) on a tool to visualize the output of any Cypher query on any database, regardless of the database contents.
So far I've managed to correctly display the nodes/relationships that come out. My problem now is to visualize (i.e., get nodes and relationships to feed into my frontend) textual queries like this one (taken from the Neo4j movie database, which I am using for development):
MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors)
RETURN coActors.name
This kind of query only returns an array of strings, not whole nodes or relationships. I now need some way (preferably a Cypher query) to get all nodes that contain, for example, the string "Audrey Tatou".
The problem I've run into is that I haven't found a way to write a query that doesn't need something like
MATCH n
WHERE Person.name = "some name"
Since I don't know anything about the contents of the database, I cannot use
WHERE propertyName = "propertyValue"
since I only know the value but not the name of the property.
The only solution here is to get every node with your label and check properties and values using reflection on the client side.
Using Cypher, the solution would be to get all properties and their values and check the values using a foreach loop. Maybe you can do this, but I'm really not sure; it's a recent feature, but you can still give it a try.
Here is what I found for the Cypher solution: How can I return all properties for a node using Cypher?
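The client-side check could look roughly like the following Python sketch. It assumes the nodes have already been fetched and converted to plain dictionaries; the dictionary shape, the sample data, and the helper name `find_nodes_with_value` are all hypothetical and not tied to any particular driver.

```python
from typing import Any, Dict, List

Node = Dict[str, Any]


def find_nodes_with_value(nodes: List[Node], needle: Any) -> List[Node]:
    """Return the nodes that have at least one property equal to needle."""
    return [
        node
        for node in nodes
        if any(value == needle for value in node["properties"].values())
    ]


# Hypothetical sample data standing in for nodes fetched from the database.
sample_nodes = [
    {"id": 1, "labels": ["Person"], "properties": {"name": "Audrey Tatou", "born": 1976}},
    {"id": 2, "labels": ["Person"], "properties": {"name": "Tom Hanks", "born": 1956}},
    {"id": 3, "labels": ["Movie"], "properties": {"title": "The Da Vinci Code"}},
]

matches = find_nodes_with_value(sample_nodes, "Audrey Tatou")
print([node["id"] for node in matches])  # [1]
```

This scans every property of every node, so it is only practical for small result sets.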
So, you have a query that returns an array of strings.
In fact, you can receive almost anything as a result. Cypher is capable of returning bare strings that are not related to anything.
Long story short: you can't visualize this data because of its nature. The best you can do is to represent it as a table (or similar), like the Neo4j browser does.
But there is (probably) a solution for you. Neo4j has a feature called legacy indexing, which includes full-text indexes. Maybe this can help you.
You can just use a driver that returns nodes and rels, or, if you run the queries manually, add a resultDataContents entry
{"statements":[{"statement":"MATCH ..","resultDataContents":["graph"]}]}
to your payload, and you get nodes and relationships back.
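A small Python sketch of building such a payload, so the JSON stays well-formed. `graph_payload` is a hypothetical helper, and the example statement is the query from the question adapted to return the coActors nodes rather than their names; sending the payload to the server is left out.

```python
import json


def graph_payload(statement: str) -> str:
    """Wrap a Cypher statement in a request body that asks for graph-format results."""
    body = {
        "statements": [
            {"statement": statement, "resultDataContents": ["graph"]}
        ]
    }
    return json.dumps(body)


statement = (
    'MATCH (tom:Person {name:"Tom Hanks"})-[:ACTED_IN]->(m)<-[:ACTED_IN]-(coActors) '
    "RETURN coActors"
)
print(graph_payload(statement))
```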

Lucene full-text index: all indexed nodes with same score?

I have been trying to solve this issue for days.
I want to run a START query against a full-text index, ordered by relevance, so that I can paginate the results.
Luckily, I finally found this thread on full-text indexing and Neo4j (using Python as the driver):
https://groups.google.com/forum/#!topic/neo4j/9G8fcjVuuLw
I had imported my db with the batch super-importer and got a reply from @Michaelhunger, who kindly noted there was a bug: all scores would have been imported with the same value.
So now I am recreating the index and checking the score via REST (&order=score):
http://localhost:7474/db/data/index/node/myInde?query=name:myKeyWord&order=score
and noticed that the entries still have the same score.
(You have to make an AJAX request to see it, because the web console doesn't show all the data!)
My code to recreate a full-text Lucene index, with each node having a 'name' property
(here using neo4j-rest-client, but I will also try py2neo as in the Google discussion):
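from neo4jrestclient.client import GraphDatabase
gdb = GraphDatabase("http://localhost:7474/db/data/")
# create a Lucene-backed full-text node index
myIndex = gdb.nodes.indexes.create("myIndex", type="fulltext", provider="lucene")
# index a node under its 'name' property (node is a node already fetched from gdb, not shown here)
myIndex.add("name",node.get("name"),node)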
results:
http://localhost:7474/db/data/index/node/myInde?query=name:DNA&order=score
data Object {id: 17062920, name: "DNA damage theory of aging"}
score 11.097855567932129
...
data Object {id: 17022698, name: "DNA (film)"}
score 11.097855567932129
In the documentation:
http://neo4j.com/docs/stable/indexing-lucene-extras.html#indexing-lucene-sort
it is written that Lucene does the sorting itself very well, so I understood that it creates a ranking by itself at import time; it does not.
What am I doing wrong or missing?
I believe the issue you are seeing is related to a combination of the text you are indexing, the query term(s), and, as Michael Hunger pointed out, the current Lucene configuration in Neo4j, which has OMITNORMS=true. With this setting, a Lucene query like those in your posted examples, where the texts differ in length but the query term appears once in each document, often results in the same Lucene relevancy score. The reason is that the length of the document being indexed (field length normalization) is NOT taken into account when OMITNORMS is true.
Looking at your examples, it is not clear what your expected results are. For example, are you expecting documents with shorter text to appear first?
In my own experience using Lucene and Neo4j, I have seen many instances where the relevancy scores being returned are different across different queries.
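To illustrate the effect of omitting norms, here is a toy Python sketch. It assumes a simplified version of Lucene's classic TF-IDF scoring, where the length norm is 1/sqrt(number of terms) and the term-frequency factor is sqrt(frequency); the IDF factor is left out because it is identical for every document matching the same term. The function names and the tokenization are hypothetical, and the numbers are not what Neo4j would actually return.

```python
import math


def length_norm(num_terms: int, omit_norms: bool) -> float:
    """Field length normalization; a constant 1.0 when norms are omitted."""
    if omit_norms:
        return 1.0
    return 1.0 / math.sqrt(max(num_terms, 1))


def toy_score(name: str, term: str, omit_norms: bool) -> float:
    """Simplified score: sqrt(term frequency) * length norm (IDF left out)."""
    tokens = name.lower().replace("(", " ").replace(")", " ").split()
    tf = math.sqrt(tokens.count(term.lower()))
    return tf * length_norm(len(tokens), omit_norms)


names = ["DNA damage theory of aging", "DNA (film)"]
for omit in (True, False):
    print(omit, [round(toy_score(name, "DNA", omit), 3) for name in names])
```

With norms omitted, both names get the same score; with norms enabled, the shorter name scores higher.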
The goal of my question is to obtain a list of results ordered by how relevant the nodes' names are to the queried keywords.
@mfkilgore pointed out this workaround:
start n=node:topic('name:(keyword1* AND keyword2*)') MATCH (n) with n order by length(split(n.name," ")) asc limit 20 return n
This workaround counts the words in a node's name (by splitting it on spaces) and then orders by that count, shortest first.
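The same ordering can be sketched on the client side in Python, for names that have already been fetched. `order_by_word_count` is a hypothetical helper and the sample names are only for illustration.

```python
from typing import List


def order_by_word_count(names: List[str], limit: int = 20) -> List[str]:
    """Sort names by their number of space-separated words, fewest first, and cut to limit."""
    return sorted(names, key=lambda name: len(name.split(" ")))[:limit]


names = ["DNA damage theory of aging", "DNA (film)", "DNA"]
print(order_by_word_count(names))
# ['DNA', 'DNA (film)', 'DNA damage theory of aging']
```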

Neo4j Embedded - Auto Index Multiple Properties

I turned on node auto-indexing and it's indexing the properties I need. If I start up the Neo4j server and open the webadmin, I see that there is an index called node_auto_index as per this post. It works perfectly from the webadmin and I can run Cypher queries like this:
START n=node:node_auto_index('__type:user AND __username:admin') RETURN n
The query returns exactly what I expect. However, if I shut down the server and open the DB in embedded mode from a Scala application, this doesn't work. If I try to run the same Cypher query, I get an error that node_auto_index doesn't exist. I checked the GraphDatabaseService properties, and auto indexing is up and running on the right keys, but when getting a list of all of the index names, the list is always empty. And I can't use the AutoIndex API because it only indexes on one property, and I definitely need both.
So from this point, what would be the best way to go about querying the auto-index with multiple properties from my Scala (Java) code?
EDIT: I noticed that the ReadableIndex interface (which is what the auto-index is) can take a query string. I can't find much documentation on it, so I'm going to try a few things, but is there any chance that could take a Cypher query? Or just the single-quoted string in my query above?
It turns out that the query function of ReadableIndex actually takes a Lucene query, which I now realize is what I had quoted above. So calling this code:
val nodes = db.index.getNodeAutoIndexer.getAutoIndex.query("__type:user AND __username:admin")
gave me exactly what I wanted.
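When the property names and values come from variables, the Lucene query string can be assembled programmatically. A minimal Python sketch follows; `lucene_and_query` is a hypothetical helper, and it does not escape characters that are special in Lucene query syntax.

```python
from typing import Dict


def lucene_and_query(terms: Dict[str, str]) -> str:
    """Join key:value pairs with AND, in the order the keys were inserted."""
    return " AND ".join(f"{key}:{value}" for key, value in terms.items())


query = lucene_and_query({"__type": "user", "__username": "admin"})
print(query)  # __type:user AND __username:admin
```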
