How to rebuild Neo4j Lucene index? - Neo4j.rb - ruby-on-rails

I am running Neo4j (1.4) with the Neo4j.rb gem (1.2.2) on Rails 3.1.
I ran into a problem where the Neo4j index was corrupted so badly that I could not start the database anymore. As mentioned in several forums, I deleted the db/index directory and the database started again. However, I now need to rebuild the index.
I could not find anywhere in the docs how to rebuild the index; could anybody please help?
Thanks a lot!

You should go into your database directory and remove:
the directory named index
the file index.db
and then traverse the whole set of nodes and edges, updating the properties of each node so that they are added back to the index.
/purbon

My problem was similar - after upgrading to Neo4j 1.5 (from 1.4) my indexes got corrupted.
My case:
I had two indexes:
__types__ : for indexing the type of persisted objects (provided by spring-data-neo4j 2.0.0.RC1)
User : for indexing the username field, so I could look users up by it
This resulted in a major problem where I could find all the nodes by their id, but could not look them up by username or list all objects of a certain type.
The fix (I will provide Java code, but the idea would be the same in other languages too):
/* begin a transaction */
Transaction tx = graphDatabaseService.beginTx();
/* for all nodes in the database */
for (Node node : graphDatabaseService.getAllNodes()) {
    /* reconstruct the saved object based on the __type__ property on the node -
       the result is a class that was annotated with @NodeEntity */
    DefaultDbNode ddn = neo4jTemplate.createEntityFromStoredType(node, null);
    /* re-index this node by adding it to the __types__ index with the key "className"
       (used by spring-data-neo4j) and the value of __type__ */
    graphDatabaseService.index().forNodes("__types__")
            .add(node, "className", node.getProperty("__type__"));
    /* if the reconstructed object is a User object */
    if (ddn instanceof User) {
        /* add it to the User index with the key "username" (which is also the saved field's name) */
        graphDatabaseService.index().forNodes("User")
                .add(node, "username", node.getProperty("username"));
    }
}
/* end the transaction */
tx.success();
tx.finish();
Hope this helps you or someone out!

Thanks to all who tried to help. In my case I have successfully solved the problem by taking the following steps:
Step 1: Following the recommendation from Neo4j's Michael Hunger (via the mailing list), I used Lucene's CheckIndex tool to remove the corrupt index entries (see "Lucene and Solr's CheckIndex").
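For reference, CheckIndex can also be driven programmatically. This is only a rough sketch against the Lucene 3.x API that shipped around that Neo4j release; the index path is illustrative and the exact API differs between Lucene versions:
import java.io.File;
import org.apache.lucene.index.CheckIndex;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class FixCorruptIndex {
    public static void main(String[] args) throws Exception {
        // Point at one of the Lucene directories under db/index/ (path is illustrative)
        Directory dir = FSDirectory.open(new File("db/index/lucene/node/User"));
        CheckIndex checkIndex = new CheckIndex(dir);
        CheckIndex.Status status = checkIndex.checkIndex();
        if (!status.clean) {
            // Drops unreadable segments; documents in those segments are lost
            // and have to be re-indexed afterwards (Step 2 below)
            checkIndex.fixIndex(status);
        }
        dir.close();
    }
}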
Step 2: Once the corrupt index entries have been removed, the remaining problem is to rebuild them so Lucene can start answering queries again. This can simply be done with model.add_index(:field_name). Note that this operation needs to be wrapped in a Neo4j::Transaction. In my case I ran it from the Rails console, but I suppose you could also run it from within the Rails app.
Example:
Neo4j::Transaction.run do
  User.all.each do |user|
    user.add_index(:first_name)
    user.add_index(:email)
    user.save
  end
end
Hope this helps others who face the same problem.
Cheers

Related

Spring Elasticsearch reindex method

I've written a reindex method that does the following:
public void reindex() {
    IndexOperations indexOperations = elasticsearchOperations.indexOps(Song.class);
    List<Song> songs = songRepository.findAll();
    songSearchRepository.deleteAll();
    indexOperations.delete();
    indexOperations.create();
    songSearchRepository.saveAll(songs);
}
It does the job, but I'm not sure whether it makes sense to simply delete and then recreate the index. How can I improve this method?
Actually I don't see the point in reindexing to the same index.
If you want to reindex to a different index, you should use the reindex API of Elasticsearch. This is not yet supported directly in Spring Data Elasticsearch.
To reindex to a different index using Spring Data Elasticsearch you should use paged queries in a loop to read from one index and write the data to the second index.
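If you do call the Elasticsearch reindex API directly, as suggested above, a minimal sketch with the plain high-level REST client might look like this (the injected client and the index names are assumptions for the example, not something Spring Data Elasticsearch provides out of the box):
import java.io.IOException;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.ReindexRequest;

public class ServerSideReindex {
    // Let Elasticsearch itself copy the documents from the old index to the new one
    public static BulkByScrollResponse reindex(RestHighLevelClient client) throws IOException {
        ReindexRequest request = new ReindexRequest()
                .setSourceIndices("songs_v1")   // illustrative source index
                .setDestIndex("songs_v2");      // illustrative destination index
        return client.reindex(request, RequestOptions.DEFAULT);
    }
}
This keeps the copying on the server side, so the client does not have to page through the whole index itself.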
Edit 16.10.2020:
Don't use copying in a loop as I suggested; as joeyave commented, the indices can get out of sync that way.
I created an issue in Spring Data Elasticsearch Jira to have reindex support implemented.

Neo4j: How to call "CREATE INDEX" only if not exists

The CREATE INDEX <indexName> command is not idempotent and will cause an error if the given index already exists. I'm new to Neo4j and can't find a predicate that avoids this error. I've tried various permutations of ANY(...), and they all barf at "db.indexes()".
Since CREATE INDEX ... fails if the index exists and DROP INDEX ... fails if it doesn't, I don't know how to write a .cypher file that creates the index only if needed.
A short form might be something like CREATE INDEX indexName FOR (c:SomeLabel) ON (c.someProperty) IF NOT EXISTS, but of course that short form doesn't exist.
Is there some way to do this with a predicate, subquery or some such expression?
As of Neo4j 4.1.3, a new index creation syntax has been introduced to do just that:
CREATE INDEX myIndex IF NOT EXISTS FOR (t:Test) ON (t.id)
Indexes for search performance
You can use the apoc.schema.node.indexExists function to check whether an index exists before creating it.
For example, this query will create the :Foo(id) index if it does not already exist:
WITH 1 AS ignored
WHERE NOT apoc.schema.node.indexExists('Foo', ['id'])
CALL db.createIndex('index_name', ['Foo'], ['id'], 'native-btree-1.0') YIELD name, labels, properties
RETURN name, labels, properties
For some reason, the Cypher planner currently is not able to parse the normal CREATE INDEX index_name ... syntax after the above WHERE clause, so this query uses the db.createIndex procedure instead.
There is also a much more powerful APOC procedure, apoc.schema.assert, but it may be overkill for your requirements.
By default, the command is ignored if the index exists.
Can you test the following?
CREATE (n:Car {id: 1});
Added 1 label, created 1 node, set 1 property, completed after 23 ms.
CREATE INDEX ON :Car (id);
1st execution: Added 1 index, completed after 6 ms.
2nd execution: (no changes, no records)
I tried both suggestions, and neither solves my issue. I don't have time to discover, through trial-and-error, how to install APOC in my environment.
The first line of mbh86's answer is inaccurate, at least in my system. The command is not ignored, it fails with an error. So if anything else is in the same cypher script, it will fail.
The best I can do is apparently to wrap the CREATE INDEX in a command-line string, run that string from a bash or Python script, and check the return code in the calling program.
I appreciate the effort of both commentators, and I didn't want to leave either hanging.

How to access dynamic attribute 'geo_near_distance' with Mongoid

I am using Mongoid 3.1.6 with Rails 4. I need to find all the objects 'near' a certain coordinate. For each result from the search, I will need to display the distance from the search coordinate. According to the Mongoid documentation:
...each instantiated document from a $geoNear query will get a special
dynamic attribute geo_near_distance that will be available as long as
the document is in memory.
But I am not able to access Object.geo_near_distance.
My query inside the controller:
@objects = Object.geo_near([-118.4451, 34.0633]).max_distance(10)
Edit#1
Some additional details
If I use the following query in MongoDB
db.runCommand({
    geoNear: "objects",
    near: [-73.95269, 40.77578],
    spherical: true
})
I see an array of 100 elements. Each element has 2 attributes: the first one, 'dis', has values like '0.000123' (note: this is not in km or miles), and the second attribute is the result object itself.
Now I have changed the Mongoid query to...
@objects = Object.geo_near([-118.4451, 34.0633]).spherical.max_distance(10)
still no result.
Thanks in advance for your help.
After more than 2 years, the issue ticket is still open on the MongoDB Jira tracker.
The quick fix is to use hash notation instead of dot notation to access the attribute:
Instead of
Object.geo_near_distance
Use
Object['geo_near_distance']
Tested on Mongoid 6.
Are you accessing the field while you are iterating over the documents? You can see from the specs that this field is in fact there as long as the document is in memory and is part of the iteration over the criteria result.
https://github.com/mongoid/mongoid/blob/master/spec/mongoid/contextual/geo_near_spec.rb#L167

Update Contour form(records) using the record ID

I can successfully create entries in Contour programmatically (C#), but I am not able to update the created record using the record ID. After digging around I can't find a reason why the following code doesn't work. It's very basic: all I am trying to do is get a record that exists in Contour.
RecordStorage recordStorage = new RecordStorage();
Record r = recordStorage.GetRecord(new Guid("15d654cb-a7c6-4f1f-8b55-0ecd7d19b0e3"));
recordStorage.Dispose();
Just to start the update process, I am trying to get the record object using its id, but I can't proceed further as it throws a weird error: "An item with the same key has already been added." I can't understand why it's trying to set a value when I call recordStorage.GetRecord(). Following is the stack trace:
An item with the same key has already been added.
at System.Collections.Generic.Dictionary`2.Insert(TKey key, TValue value, Boolean add)
at Umbraco.Forms.Data.Storage.RecordFieldStorage.GetAllRecordFields(Record record)
at Umbraco.Forms.Data.Storage.RecordStorage.GetRecord(Object id)
at MauriceBlackburn.Service.ContourFormService.InsertRecord(ContourFormFields unionContourForm)
Any thoughts? Have I missed something? I have been digging around all day and am still not able to figure this out. Thanks in advance.
Much appreciated.
First off, try deleting the workflow and re-adding it.
You could also create two simple workflows, one that will write the record and a second to manipulate it (using the id when written).
Make sure that there are no records with the same ID in the database. You might have inserted them before.

Keeping elasticsearch and database in sync

I am trying to figure out a way to keep my MySQL DB and Elasticsearch in sync. I have set up a JDBC river using the jprante/elasticsearch-river-jdbc plugin for Elasticsearch. When I execute the request below:
curl -XPUT 'localhost:9200/_river/my_jdbc_river/_meta' -d '{
    "type" : "jdbc",
    "jdbc" : {
        "driver" : "com.mysql.jdbc.Driver",
        "url" : "jdbc:mysql://localhost:3306/MY-DATABASE",
        "user" : "root",
        "password" : "password",
        "sql" : "select * from users",
        "poll" : "1m"
    },
    "index" : {
        "index" : "test_index",
        "type" : "user"
    }
}'
the river starts indexing data, but for some records I get org.elasticsearch.index.mapper.MapperParsingException. There is existing discussion of this issue, but I want to know a way to get around it.
Is it possible to fix this permanently by creating an explicit mapping for all fields of the type that I am trying to index, or is there a better way to solve this issue?
Another question I have: when the JDBC river polls the database again, it seems to re-index the entire data set (given in the SQL query) into ES. I am not sure, but is this done because Elasticsearch wants to add fresh data as well as update any changes in the existing data? Is it possible to index only the fresh data, if the existing table data is static?
Did you look at the default mapping?
http://www.elasticsearch.org/guide/reference/mapping/dynamic-mapping.html
I think it can help you here.
If you have an insertion date field in your data table, you can use it to filter what you have to index.
See https://github.com/jprante/elasticsearch-river-jdbc#time-based-selecting
HTH
David
Elasticsearch has dropped the river sync concept altogether. It is not a recommended path, because it usually doesn't make sense to keep the same normalized SQL table structure in a document store like Elasticsearch.
Say you have Product as an entity with some attributes, and Reviews on the Product entity in a parent-child table, since there can be multiple reviews for the same product.
Products(Id, name, status,... etc)
Product_reviews(product_id, review_id)
Reviews(id, note, rating,... etc)
In a document store you may want to create a single index, named say product, where each document includes the product attributes together with its reviews: Product{attribute1, attribute2, ..., reviews: [review1, review2, ...]}.
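As a rough illustration only (class and field names are made up for the example), such a denormalized document could look like this:
import java.util.List;

// One document per product in the single "product" index; the reviews that live
// in separate SQL tables are embedded inside the document instead of being joined.
public class ProductDocument {
    private String id;
    private String name;
    private String status;
    private List<Review> reviews;

    public static class Review {
        private String id;
        private String note;
        private int rating;
    }
}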
Here is an approach to syncing in such a setup.
Assumptions:
SQL database (the true source of record)
Elasticsearch or any other NoSQL document store
Solution:
As soon as an update happens, publish an event to JMS/AMQP/a database queue/a file-system queue/Amazon SQS etc., containing either the full Product or just the primary object ID (I would recommend just the ID).
The queue consumer should then call the web service to get the full object (if only the primary ID was pushed to the queue, otherwise just take the object itself) and send the respective changes to the Elasticsearch/NoSQL database.
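A rough sketch of such a consumer (ProductService, ProductSearchRepository and ProductDocument are assumed application classes, not a specific library's API):
// Receives a product ID published on update, loads the full entity from the
// source-of-truth SQL database, and writes the denormalized document to Elasticsearch.
public class ProductIndexUpdater {

    private final ProductService productService;            // reads from the SQL source of truth
    private final ProductSearchRepository searchRepository; // writes to Elasticsearch

    public ProductIndexUpdater(ProductService productService,
                               ProductSearchRepository searchRepository) {
        this.productService = productService;
        this.searchRepository = searchRepository;
    }

    // Called by the JMS/AMQP/SQS listener for every message taken from the queue
    public void onProductChanged(String productId) {
        ProductDocument document = productService.loadWithReviews(productId);
        searchRepository.save(document);
    }
}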
