Elasticsearch index transactions

I am exploring Elasticsearch and comparing it with our current search solution. In my use case, every time I build the index I have to drop the current index and create a new one with the same name, so that all the old documents are dropped along with the old index and the new index contains only fresh data. The indexing process takes a couple of minutes to finish.
My question is: what happens to search requests that come in during this time? Does Elasticsearch use transactions, committing all the changes (dropping the old index and creating the new index with the new documents) in a single transaction?
What happens if I delete the index and an error occurs in the middle of indexing?
If there are no transactions, is there any workaround for this situation?

Elasticsearch doesn't support transactions. When you delete an index, it is gone. Until you create the new index, users will get IndexMissingException errors. Once the new index exists, they will see only the documents that have been indexed and refreshed (by default, a refresh occurs every second), so results will be incomplete while indexing is still running.
One way to hide this from users is to use aliases. Create an alias that points to an index and have clients search the alias. When you need to reindex your data, create a new index, index the new data into it, switch the alias to the new index, and then delete the old index. If indexing fails partway through, the alias still points at the old index, so you can delete the half-built new index and try again.
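Here is a minimal sketch of the alias approach, written in Ruby to match the Ruby snippets elsewhere on this page. FakeCluster, rebuild and alias_swap_body are hypothetical names. The in-memory class only stands in for a real cluster and is not the Elasticsearch client. The part that mirrors the real API is the request body: a POST to /_aliases with a remove action and an add action. Elasticsearch applies those actions atomically, so searches against the alias never hit a missing index. The sketch also covers the "error in the middle of indexing" case: the alias is never switched, and only the half-built index is discarded.

```ruby
require "json"

# Hypothetical in-memory stand-in for a cluster (NOT the Elasticsearch client):
# it only tracks index contents and alias targets.
class FakeCluster
  attr_reader :indices, :aliases

  def initialize
    @indices = {} # index name => array of documents
    @aliases = {} # alias name => index name
  end

  # Clients always search through the alias, never a concrete index name.
  def search(alias_name)
    index = @aliases.fetch(alias_name) { raise KeyError, "no such alias: #{alias_name}" }
    @indices.fetch(index)
  end

  # Applies every action, then publishes the result in one assignment,
  # mimicking how a single POST /_aliases request is applied atomically.
  def update_aliases(body)
    updated = @aliases.dup
    body[:actions].each do |action|
      if (remove = action[:remove])
        updated.delete(remove[:alias]) if updated[remove[:alias]] == remove[:index]
      elsif (add = action[:add])
        updated[add[:alias]] = add[:index]
      end
    end
    @aliases = updated
  end
end

# Body for POST /_aliases: move the alias from old_index to new_index.
def alias_swap_body(alias_name, old_index, new_index)
  actions = []
  actions << { remove: { index: old_index, alias: alias_name } } if old_index
  actions << { add: { index: new_index, alias: alias_name } }
  { actions: actions }
end

# Hypothetical rebuild flow: fill a new index, swap the alias, drop the old index.
# A nil document simulates an error in the middle of indexing.
def rebuild(cluster, alias_name, docs, version)
  old_index = cluster.aliases[alias_name]
  new_index = "#{alias_name}_#{version}"
  cluster.indices[new_index] = []
  docs.each do |doc|
    raise ArgumentError, "corrupt document" if doc.nil?
    cluster.indices[new_index] << doc
  end
  cluster.update_aliases(alias_swap_body(alias_name, old_index, new_index))
  cluster.indices.delete(old_index) if old_index
  true
rescue ArgumentError
  # The alias was never switched, so searches keep hitting old_index.
  cluster.indices.delete(new_index)
  false
end

cluster = FakeCluster.new
rebuild(cluster, "products", %w[a b], "v1")
rebuild(cluster, "products", ["c", nil], "v2") # fails halfway
p cluster.search("products") # prints ["a", "b"]: still the v1 documents
puts JSON.generate(alias_swap_body("products", "products_v1", "products_v2"))
```

Against a real cluster, you would send the JSON printed on the last line to the /_aliases endpoint, then delete the old index.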

Related

RoR: maintain history of object updates

I have a model whose history of changes I want to store. My plan is to create a new object instead of updating the existing one, and on show to fetch only the latest version.
This plan presents a number of difficulties. First, the id will be different after an update; I intend to get around this by keeping a second ID column that stays the same for all updates of that instance.
To that end, I have created a SQLite sequence for this second column.
My questions are: first, how can I get values from this sequence in the model/controller, given that I only want to draw from it the first time the object is created? Second, how can I use this second ID column in the URL for the object so that the URL stays fixed across updates?
Many thanks,
Check out the PaperTrail gem. It might do what you want and sidestep those issues completely.
https://github.com/airblade/paper_trail
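If you would rather keep the hand-rolled approach from the question than use PaperTrail, the idea can be sketched in plain Ruby. Everything here is hypothetical: VersionStore, lineage_id, and the in-memory counter that stands in for the SQLite sequence. In a Rails model, a database column would play the same role, and you would look rows up with something like find_by(lineage_id: ...). The counter advances only in create, which answers "draw from the sequence only the first time". The to_param method returns lineage_id because Rails URL helpers use a record's to_param to build the :id segment.

```ruby
# Hypothetical sketch: every save inserts a new row, and all rows of one
# logical object share a lineage_id (the question's "second ID column").
class VersionStore
  Row = Struct.new(:id, :lineage_id, :attrs) do
    # Rails URL helpers call to_param; returning lineage_id keeps the URL
    # stable even though each version gets a new primary key.
    def to_param
      lineage_id.to_s
    end
  end

  def initialize
    @rows = []
    @last_id = 0
    @last_lineage_id = 0 # stands in for the SQLite sequence
  end

  # First creation is the only place the lineage counter advances.
  def create(attrs)
    @last_lineage_id += 1
    insert(@last_lineage_id, attrs)
  end

  # An "update" is a new row that reuses the existing lineage_id.
  def update(lineage_id, attrs)
    insert(lineage_id, attrs)
  end

  # For show: the newest row (highest primary key) of a lineage.
  def latest(lineage_id)
    @rows.select { |r| r.lineage_id == lineage_id }.max_by(&:id)
  end

  private

  def insert(lineage_id, attrs)
    @last_id += 1
    row = Row.new(@last_id, lineage_id, attrs)
    @rows << row
    row
  end
end
```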

Delete, Create and Add Nodes to Index Neo4j

A quick question.
In a single transaction, can't I do the following:
Delete the index (say, indexMaster) if it already exists
Create the index indexMaster again
Add nodes to the index indexMaster
When I did this, I got an exception:
This index (Index[indexMaster,Node]) has been marked as deleted in this transaction
The exception occurs on the line where I add nodes to the index.
EDITED:
I am using Neo4j 2.0.4.
The code uses the Java API, not the REST API.
Any ideas?
Thanks
I'm not 100% sure, but I suspect it is not possible to delete and recreate the same index in a single transaction. Try using two transactions: one to delete the index and another to create it.

ServerPlugins in Neo4j 2.0.0-M03: Where to create schema index

I'm going to check out the new automatic indexing capabilities that come with Neo4j 2.0. They are described here: http://docs.neo4j.org/chunked/2.0.0-M03/tutorials-java-embedded-new-index.html
Now, the automatic index must be created at some point. The old way to get an index was simply "indexManager.forNodes()", which returned the index if it existed and created it if not. With automatic indexing, we only have to create the index once via "schema.indexFor()..." and then we are done with it.
My question is: where is the best place to put the index creation? The documentation example uses a main method, but I'm working with a ServerPlugin. I'd like to create the indexes once at startup, if they do not already exist. But where can I do this? And how do I check whether the index already exists? I can get all IndexDefinitions for a label, but since an IndexDefinition depends on a label and an arbitrary property, I would have to iterate through all IndexDefinitions for that label and check whether one with the correct property exists.
I could of course simply do what I just described, but it seems a bit cumbersome compared to the old index handling, which automatically checked whether the requested index existed and created it if not. So I'm wondering whether I simply missed some key points about handling the new indexes.
Thank you!
I got a response from a Neo4j dev here: http://docs.neo4j.org/chunked/2.0.0-M03/tutorials-java-embedded-new-index.html
He proposes creating the automatic indexes in a Neo4j start script, for instance. I also saw that someone had already asked for unique indexes (that would be a great feature!). That would simplify index creation, but in the end this now seems to be part of the database setup.

How to reindex only some objects in Sunspot Solr

We use Sunspot Solr for indexing and searching in our Ruby on Rails application.
We wanted to reindex some objects and someone accidentally ran the Product.reindex command from the Rails Console. The result was that indexing of all products started from scratch and our catalogue appeared empty while indexing was taking place.
Since we have a vast amount of data, the reindexing has taken three days so far. When I checked on its progress this morning, it seemed that one corrupt data entry had caused the reindexing to stop before completing.
I cannot restart the entire Product.reindex operation, as it takes far too long. Is there a way to reindex only selected products? I want to select a range of products that aren't indexed and run indexing on just those. How can I add a single product to the index without running a complete reindex of the entire data set?
Sunspot indexes an object in its save callback, so you could save each object, but that might trigger other callbacks too. A more precise way to do it would be:
Sunspot.index [post1, post2]
Sunspot.commit
or with autocommit
Sunspot.index! [post1, post2]
You can even pass in an association, since it behaves like an array too:
Sunspot.index! post1.comments
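Below is a hedged sketch of reindexing only a selected id range in batches, so that one corrupt record is reported instead of stopping the whole run. The reindex_range method and its loader and indexer arguments are hypothetical. In the Rails app, loader might be something like ->(ids) { Product.where(id: ids).to_a }, and indexer might be ->(records) { Sunspot.index!(records) }. When a batch fails, the sketch retries it one record at a time. Re-adding a record that already reached Solr just replaces the document with the same unique key, so the retry is safe.

```ruby
# Hypothetical sketch: index the given ids in batches and collect the records
# that fail, instead of letting one corrupt record abort the whole run.
# loader:  ids -> records (stand-in for a database query)
# indexer: records -> anything, raises on failure (stand-in for Sunspot.index!)
def reindex_range(ids, batch_size:, loader:, indexer:)
  failed = []
  ids.each_slice(batch_size) do |batch_ids|
    records = loader.call(batch_ids)
    begin
      indexer.call(records)
    rescue StandardError
      # Retry the failed batch record by record to isolate the bad entries.
      records.each do |record|
        begin
          indexer.call([record])
        rescue StandardError
          failed << record
        end
      end
    end
  end
  failed
end
```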
I found the answer at https://github.com/sunspot/sunspot#reindexing-objects
Whenever an object is saved, it is automatically reindexed as part of the save callbacks. So all that was needed was to add all the objects that needed reindexing to an array and then loop through the array, calling save on each object. This successfully updated the required objects in the index.

RoR: making a version control function inside my application?

What would be the rails way of implementing version control in my record management application?
My system allows users to manage Records, and I want to let them view historical versions of a Record. I know that instead of updating a Record, I will now create a new instance of the Record and its related models every time a user "updates" it (each Record has_many Categories and Advantages). How would I ensure that the different versions of a Record are all linked together (i.e. the newly created record is associated as the new version of record A, so that I can click "show me a list of all versions of record A")?
This is all theoretical, as I have yet to start coding; if I have missed anything I should also consider, please let me know.
Thank You
Create a new instance of the record every time a user updates it, use a secondary ID (as you mentioned) to group all the versions of the same record together, and then run a check in the controller (using some sort of hidden value) to decide whether to save over the record or create a new one.
You can then retrieve the latest version of each record by finding the most recently created (or updated) record for each secondary_id.
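The answer above can be sketched in plain Ruby over an in-memory list of rows. Version, versions_of and latest_versions are hypothetical names. In ActiveRecord, versions_of would roughly correspond to Record.where(secondary_id: id).order(:created_at). Each row is one version, secondary_id ties together the versions of one logical record, and the latest version is the most recently created row in each group.

```ruby
# Hypothetical row type: one row per version of a logical record.
Version = Struct.new(:id, :secondary_id, :created_at, :title)

# All versions of one record, oldest first ("show me all versions of record A").
def versions_of(rows, secondary_id)
  rows.select { |r| r.secondary_id == secondary_id }.sort_by(&:created_at)
end

# Latest version of every record: the most recently created row per secondary_id.
def latest_versions(rows)
  rows.group_by(&:secondary_id).map { |_, group| group.max_by(&:created_at) }
end
```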
A good starting point may be the vestal_versions gem, which keeps a history of modified records.
Here are two of my insights:
1st, a simple one:
Save each new version with a higher id; when reading, take the record with the highest id. If you want to see the history, don't filter on ids.
2nd (partly borrowed from SAP):
Your record has two extra fields, startTime and stopTime: the time the record comes into effect and the time it stops being in effect. When you insert a new record, you set the stopTime of the previous one to now, set the new one's startTime to now, and set its stopTime to the end of the world (a far-future date).
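The startTime/stopTime idea can be sketched like this. TimelineRecord, Period, write and as_of are hypothetical names. Times are plain numbers (for example, epoch seconds) so that they compare cleanly with Float::INFINITY, which plays the role of "the end of the world". Writing a new version closes the previous one at the same instant, so exactly one version is in effect at any time. The as_of method answers "what did this record look like at time t?".

```ruby
# Hypothetical sketch: each version is valid from start_time (inclusive)
# to stop_time (exclusive); the current version never stops.
Period = Struct.new(:value, :start_time, :stop_time)

class TimelineRecord
  attr_reader :periods

  def initialize
    @periods = []
  end

  # Close the current version at `now` and open a new, unbounded one.
  # `now` is a number (e.g. epoch seconds) so it compares with Float::INFINITY.
  def write(value, now)
    current = @periods.last
    current.stop_time = now if current
    @periods << Period.new(value, now, Float::INFINITY)
  end

  # The version that was in effect at time t, or nil if none was.
  def as_of(t)
    @periods.find { |period| period.start_time <= t && t < period.stop_time }
  end
end
```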
