Neo4j Embedded Fulltext Automatic Node Index

When running Neo4j embedded, the default configuration doesn't have the automatic node index set as fulltext (meaning that all Lucene queries are case sensitive). How can I configure the automatic index to be fulltext?

For starters, you must perform this on a new database. The automatic index is lazily created, which means that it isn't created until the first access. You have until the first access to perform this configuration. If you attempt to change the property after it's already been created, it won't work. So the first step is to load the database with automatic indexing enabled (node or relationship).
val db = new GraphDatabaseFactory().newEmbeddedDatabaseBuilder("path/to/db").
setConfig(GraphDatabaseSettings.node_keys_indexable, "label,username").
setConfig(GraphDatabaseSettings.node_auto_indexing, "true").newGraphDatabase()
Now, before you do anything, you have to set the configuration properties. You can find out about the possible properties and values here. To do this, we just need two more lines.
val autoIndex = db.index.forNodes("node_auto_index")
db.index.setConfiguration(autoIndex, "type", "fulltext")
And that's all there is to it. You can now create vertices and relationships and the automatic index will be created and populated. You can use the following code to query it with any Lucene query.
autoIndex.query("label:*caseinsensitive*")
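For completeness, here is roughly the same sequence as a plain Java sketch (untested, assuming a 1.x-era embedded API; the store path, property keys and property value are placeholders):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;
import org.neo4j.graphdb.factory.GraphDatabaseSettings;
import org.neo4j.graphdb.index.Index;

public class FulltextAutoIndexSetup {
    public static void main(String[] args) {
        // 1. Open a *new* database with auto-indexing enabled for the listed keys.
        GraphDatabaseService db = new GraphDatabaseFactory()
                .newEmbeddedDatabaseBuilder("path/to/db")
                .setConfig(GraphDatabaseSettings.node_keys_indexable, "label,username")
                .setConfig(GraphDatabaseSettings.node_auto_indexing, "true")
                .newGraphDatabase();

        // 2. Switch the not-yet-created auto index to fulltext before its first use.
        Index<Node> autoIndex = db.index().forNodes("node_auto_index");
        db.index().setConfiguration(autoIndex, "type", "fulltext");

        // 3. Write a node; the auto index is created and populated on this first write.
        Transaction tx = db.beginTx();
        try {
            Node n = db.createNode();
            n.setProperty("label", "Some CaseInsensitive Label");
            tx.success();
        } finally {
            tx.finish();
        }

        // 4. Fulltext analysis lowercases terms, so this query matches regardless of case.
        Node hit = autoIndex.query("label:*caseinsensitive*").getSingle();
        db.shutdown();
    }
}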

Related

return the value uuid in the nodes created in Neo4j ogm

I'm working with Neo4j from PHP. To generate the uuid field on the nodes I am using neo4j-uuid.
I also use graphaware/neo4j-php-ogm. When I create a node, the value assigned to the UUID field is not returned; I have to make a new query to get that value. I need the UUID value to be hydrated automatically when the object is created, just like the ID is hydrated.
From the GraphAware Neo4j UUID Github Repo:
If you create a node and return it immediately, its contents will not
reflect changes performed by transaction event handlers such as this
one -- thus the UUID will not be available. A separate call must be
made to get the UUID.
That is: this is the expected behavior. Currently you should make a new query to get the node with the generated UUID property.
As @bruno-peres says, the value of the uuid is not automatically hydrated, so I invoke the refresh method of the EntityManager:
$this->em->persist($entity);
$this->em->flush();
$this->em->refresh($entity);
var_dump($entity->getUuid());

Mongo Template : Modifying Match Operation dynamically

I have defined my match operation in mongo template as below.
MatchOperation match = Aggregation.match(new Criteria("workflow_stage_current_assignee").ne(null)
.andOperator(new Criteria("CreatedDate").gte(new Date(fromDate.getTimeInMillis()))
.andOperator(new Criteria("CreatedDate").lte(new Date(toDate.getTimeInMillis())))));
Everything is fine until this point. However, I cannot modify this match operation using the match reference I have created. I was looking for list-like functionality where I could add multiple criteria clauses to an already created reference as and when they are needed, something along the lines of match.add(new Criteria(...)).
However, MatchOperation currently does not provide any methods for this. Any help in this regard would be appreciated.
Criteria is where you add new criteria; internally it is backed by a list.
Use the static Criteria.where(String key) method to create and initialize a Criteria object.
Something like
Criteria criteria = where("key1").is("value1");
Add more criteria:
criteria.and("key2").is("value2");
MongoDB treats multiple conditions on the same Criteria object as an implicit $and, so you can keep chaining conditions onto the existing criteria:
criteria.and("key3").gt(value3).lte(value4);
When you are done, just pass it to the match operation:
MatchOperation match = Aggregation.match(criteria);
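If you need the list-like behaviour from the question, where clauses are accumulated as they become known, one option is to collect Criteria objects in a list and combine them at the end. A rough sketch, assuming Spring Data MongoDB and the field names from the question (the onlyAssigned flag is made up for illustration):

import java.util.ArrayList;
import java.util.Date;
import java.util.List;

import org.springframework.data.mongodb.core.aggregation.Aggregation;
import org.springframework.data.mongodb.core.aggregation.MatchOperation;
import org.springframework.data.mongodb.core.query.Criteria;

public class DynamicMatchExample {
    public static MatchOperation buildMatch(Date fromDate, Date toDate, boolean onlyAssigned) {
        // Collect the individual clauses first; add or skip clauses as needed.
        List<Criteria> clauses = new ArrayList<Criteria>();
        clauses.add(Criteria.where("CreatedDate").gte(fromDate).lte(toDate));
        if (onlyAssigned) {
            clauses.add(Criteria.where("workflow_stage_current_assignee").ne(null));
        }

        // Combine everything into a single $and once all clauses are known.
        Criteria combined = new Criteria().andOperator(clauses.toArray(new Criteria[0]));
        return Aggregation.match(combined);
    }
}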

Neo4j uniqueness when updating a unique index value

The uniqueness=create_or_fail option works great when creating a new node, since it throws a 4xx response if a duplicate index key/value already exists.
However, if the node already exists and is indexed, and the indexed value needs to be updated, there is no way (that I am aware of) to update the value and fail if the new value already exists. That is because the Add Node to Index REST call does not throw a 4xx response if the new value already exists. As far as I can see, Add Node to Index does not even participate in uniqueness on indexes.
One solution is to delete the node and re-add it, but this is not easy since all the other indexes and relationships on this node would have to be recreated.
Another solution would be to add the uniqueness parameter to the Add Node to Index REST call:
http://docs.neo4j.org/chunked/1.9.M05/rest-api-indexes.html#rest-api-add-node-to-index
Any other ideas on this?
Thanks.
I happened upon this question and here's what I figured out as a work-around.
During an update, do as follows in a REST batch (a rough sketch of the uniqueness-check step follows these steps):
Delete all of node's index entries for the desired index
Create a new node using CreateOrFail on the desired index, except instead of using your normal properties just use a dummy property such as DeleteMe=true
Add the real node to the desired index; if execution got this far, the previous step succeeded and the new value is free
Update node's properties
Use a Cypher statement to delete the dummy node Ex:
START n=node:index_name(index_key={value}) WHERE (n.DeleteMe!)=true DELETE n
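For what it's worth, here is a minimal sketch of the uniqueness-check step only (step 2), written as a single plain HTTP call rather than a batch job. It assumes the create_or_fail endpoint linked above on a default localhost server; the index name, key and value are placeholders, and the remaining steps (re-adding the real node, updating its properties and deleting the dummy node) would still have to follow:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class UniqueValueCheck {
    // Step 2 of the work-around: try to claim the new index value with a dummy node.
    // Returns true if the value was free (dummy node created), false if it is already taken.
    public static boolean claimValue(String indexName, String key, String value) throws Exception {
        URL url = new URL("http://localhost:7474/db/data/index/node/"
                + indexName + "?uniqueness=create_or_fail");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);

        String body = "{\"key\":\"" + key + "\",\"value\":\"" + value
                + "\",\"properties\":{\"DeleteMe\":true}}";
        OutputStream out = conn.getOutputStream();
        out.write(body.getBytes("UTF-8"));
        out.close();

        int status = conn.getResponseCode();
        // 201 Created: the value was free and a dummy node now holds it in the index.
        // 409 Conflict: another node already has this key/value in the index.
        return status == 201;
    }
}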

Neo4j indexes and legacy data

I have a legacy dataset (ENRON data represented as GraphML) that I would like to query. In a comment on a related question, @StefanArmbruster suggests that I use Cypher to query the database. My query use case is simple: given a message id (a property of the Message node), retrieve the node that has that id, and also retrieve the sender and recipient nodes of that message.
It seems that to do this in Cypher, I first have to create an index of the nodes. Is there a way to do this automatically when the data is loaded from the graphML file? (I had used Gremlin to load the data and create the database.)
I also have an external Lucene index of the data (I need it for other purposes). Does it make sense to have two indexes? I could, for example, index the Neo4J node ids into my external index, and then query the graph based on those ids. My concern is about the persistence of these ids. (By analogy, Lucene document ids should not be treated as persistent.)
So, should I:
Index the Neo4j graph internally to query on message ids using Cypher? (If so, what is the best way to do that: regenerate the database with some suitable incantation to get the index built? Build the index on the already-existing db?)
Store Neo4j node ids in my external Lucene index and retrieve nodes via these stored ids?
UPDATE
I have been trying to get auto-indexing to work with Gremlin and an embedded server, but with no luck. In the documentation it says
The underlying database is auto-indexed, see Section 14.12, “Automatic Indexing” so the script can return the imported node by index lookup.
But when I examine the graph after loading a new database, no indexes seem to exist.
The Neo4j documentation on auto indexing says that a bunch of configuration is required. In addition to setting node_auto_indexing = true, you have to configure it
To actually auto index something, you have to set which properties
should get indexed. You do this by listing the property keys to index
on. In the configuration file, use the node_keys_indexable and
relationship_keys_indexable configuration keys. When using embedded
mode, use the GraphDatabaseSettings.node_keys_indexable and
GraphDatabaseSettings.relationship_keys_indexable configuration keys.
In all cases, the value should be a comma separated list of property
keys to index on.
So is Gremlin supposed to set the GraphDatabaseSettings parameters? I tried passing a map into the Neo4jGraph constructor like this:
Map<String,String> config = [
'node_auto_indexing':'true',
'node_keys_indexable': 'emailID'
]
Neo4jGraph g = new Neo4jGraph(graphDB, config);
g.loadGraphML("../databases/data.graphml");
but that had no apparent effect on index creation.
UPDATE 2
Rather than configuring the database through Gremlin, I used the examples given in the Neo4j documentation so that my database creation was like this (in Groovy):
protected Neo4jGraph getGraph(String graphDBName, String databaseName) {
    boolean populateDB = !new File(graphDBName).exists();
    if (populateDB)
        println "creating database";
    else
        println "opening database";
    GraphDatabaseService graphDB = new GraphDatabaseFactory().
            newEmbeddedDatabaseBuilder( graphDBName ).
            setConfig( GraphDatabaseSettings.node_keys_indexable, "emailID" ).
            setConfig( GraphDatabaseSettings.node_auto_indexing, "true" ).
            setConfig( GraphDatabaseSettings.dump_configuration, "true" ).
            newGraphDatabase();
    Neo4jGraph g = new Neo4jGraph(graphDB);
    if (populateDB) {
        println "Populating graph"
        g.loadGraphML(databaseName);
    }
    return g;
}
and my retrieval was done like this:
ReadableIndex<Node> autoNodeIndex = graph.rawGraph.index()
.getNodeAutoIndexer()
.getAutoIndex();
def node = autoNodeIndex.get( "emailID", "<2614099.1075839927264.JavaMail.evans@thyme>" ).getSingle();
And this seemed to work. Note, however, that the getIndices() call on the Neo4jGraph object still returned an empty list. So the upshot is that I can exercise the Neo4j API correctly, but the Gremlin wrapper seems to be unable to reflect the indexing state. The expression g.idx('node_auto_index') (documented in Gremlin Methods) returns null.
The auto indexes are created lazily. That is, when you have enabled auto-indexing, the actual index is first created when you index your first property. Make sure you are inserting data before checking for the existence of the index, otherwise it might not show up.
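To make the lazy creation concrete, here is a small Java sketch of that check (untested, against an embedded 1.x database configured with node_auto_indexing and node_keys_indexable=emailID as above; the message id is a placeholder):

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.ReadableIndex;

public class LazyAutoIndexCheck {
    public static void demo(GraphDatabaseService graphDB) {
        // Before any node with an indexable property is written,
        // the auto index may not have been created yet.
        System.out.println("exists before: " + graphDB.index().existsForNodes("node_auto_index"));

        Transaction tx = graphDB.beginTx();
        try {
            Node n = graphDB.createNode();
            // "emailID" must be listed in node_keys_indexable for this write
            // to trigger creation of the auto index.
            n.setProperty("emailID", "<some-message-id>");
            tx.success();
        } finally {
            tx.finish();
        }

        // After the first indexable property has been written, the index exists
        // and can be queried through the auto indexer.
        System.out.println("exists after: " + graphDB.index().existsForNodes("node_auto_index"));
        ReadableIndex<Node> autoIndex = graphDB.index().getNodeAutoIndexer().getAutoIndex();
        Node found = autoIndex.get("emailID", "<some-message-id>").getSingle();
    }
}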
For some auto-indexing code (using programmatic configuration), see e.g. https://github.com/neo4j-contrib/rabbithole/blob/master/src/test/java/org/neo4j/community/console/IndexTest.java (this works with Neo4j 1.8).
/peter
Have you tried the automatic index feature? It's basically the use case you're looking for; unfortunately, it needs to be enabled before you import the data. (Otherwise you have to remove and re-add the properties to reindex them.)
http://docs.neo4j.org/chunked/milestone/auto-indexing.html

How to build a custom Lucene index for Neo4j graph?

I am using Gremlin and Neo4j to manipulate the ENRON dataset from infochimps. This dataset has two types of vertexes, Message and Email Address, and two types of edges, SENT and RECEIVED_BY. I would like to create a custom index on this dataset that creates a Lucene document for each vertex of type 'Message' and incorporates information from associated vertexes (e.g., v.in(), v.out()) as additional fields in the Lucene document.
I am thinking of code along the lines of
g = new Neo4jGraph('enron');
PerFieldAnalyzerWrapper analyzer =
new PerFieldAnalyzerWrapper(new StandardAnalyzer());
analyzer.addAnalyzer("sender", new KeywordAnalyzer());
analyzer.addAnalyzer("recipient", new KeywordAnalyzer());
IndexWriter idx = new IndexWriter (dir,analyzer,IndexWriter.MaxFieldLength.UNLIMITED);
g.V.filter{it.type == 'Message'}.each { v ->
    Document doc = new Document();
    doc.add(new Field("subject", v.subject));
    doc.add(new Field("body", v.body));
    doc.add(new Field("sender", v.in().address));
    v.out().each { recipient ->
        doc.add(new Field("recipient", recipient.address));
    }
    idx.addDocument(doc);
}
idx.close();
My questions are:
Is there a better way to enumerate vertexes for indexing?
Can I use auto-indexing for this, and if so, how do I specify what should be indexed?
Can I specify my own Analyzer, or am I stuck with the default? What is the default?
If I must create my own index, should I be using gremlin for this, or am I better off with a Java program?
I will be talking about direct Neo4j access here since I'm not well travelled in Gremlin.
So you'd like to build a Lucene index "outside of" the graph itself? Otherwise you can use the built-in graphDb.index().forNodes( "myIndex", configForMyIndex ) to get a Lucene index (created on demand) associated with Neo4j. You can then add multiple fields to each document by calling index.add( node, key, value ), where each node will be represented by one document in that Lucene index.
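As a rough Java sketch of that built-in route (index name, configuration and field choices are only illustrative):

import java.util.HashMap;
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;

public class ManualMessageIndex {
    public static void indexMessage(GraphDatabaseService graphDb, Node message,
                                    String sender, String recipient) {
        Transaction tx = graphDb.beginTx();
        try {
            // The index is created on demand the first time it is requested with this configuration.
            Map<String, String> config = new HashMap<String, String>();
            config.put("provider", "lucene");
            config.put("type", "fulltext");
            Index<Node> messages = graphDb.index().forNodes("messages", config);

            // One Lucene document per node; each add() call contributes a field to it.
            messages.add(message, "subject", message.getProperty("subject"));
            messages.add(message, "body", message.getProperty("body"));
            messages.add(message, "sender", sender);
            messages.add(message, "recipient", recipient);
            tx.success();
        } finally {
            tx.finish();
        }
    }
}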
1) In Gremlin... I don't know.
2) See http://docs.neo4j.org/chunked/milestone/auto-indexing.html
3) See http://docs.neo4j.org/chunked/milestone/indexing-create-advanced.html
4) Do you need to create it outside of the db entirely? If so, why?
I just finished an import with a Java process and it's really easy; in my opinion it's even easier than going through Gremlin.
Anyway, if the process is failing, it is because you CAN'T create a new StandardAnalyzer object with no arguments. All the constructors of that class require parameters, so you should either create a wrapper class or construct it with the right Lucene Version as a parameter in the constructor.
Neo4j, as of today, only supports Lucene up to version 3.6.
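For example, against Lucene 3.6 the analyzer setup from the question would look roughly like this (a sketch, keeping the question's field names):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class AnalyzerSetup {
    public static PerFieldAnalyzerWrapper build() {
        // Lucene 3.x constructors take the Version explicitly.
        PerFieldAnalyzerWrapper analyzer =
                new PerFieldAnalyzerWrapper(new StandardAnalyzer(Version.LUCENE_36));
        analyzer.addAnalyzer("sender", new KeywordAnalyzer());
        analyzer.addAnalyzer("recipient", new KeywordAnalyzer());
        return analyzer;
    }
}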
