Elasticsearch versioning through Tire gem - ruby-on-rails

I have an existing Rails app that uses tire (0.4.0) to interface with an Elasticsearch (0.17.4) engine. It already has a couple of models using Tire::Persistence. I want to add a new model that takes advantage of Elasticsearch versioning, to track all changes and be able to revert to previous versions.
Right now when I retrieve any 'persisted' model instance, I check _version and it is always nil. I have not found any tire documentation that relates to versioning. Do I have to activate it somehow, or manually save records with version values? Am I even on the right track here?
I do see that certain methods return _version values for items, but others don't...
Article.first._version # => nil
Article.search("sample query").first._version # => nil
Article.find("id_123")._version # => 8
Also, versioning seems to increment by 2. Perhaps tire is not fully equipped to deal with versioning. Is it saving previous versions? How can I retrieve a previous version of a record?
[EDIT] I may have misunderstood what 'versioning' actually is in Elasticsearch. Seems like it's mostly for concurrency control. Oh wellz. (I would love to hear otherwise, though)

First of all, yes, as you write in the edit, versions in ElasticSearch are not meant to store revisions of the document, but to provide concurrency control (e.g. not overwriting a document with a stale version).
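To make "concurrency control" concrete, here is a minimal in-memory sketch of the idea in plain Ruby. It is not the ElasticSearch or Tire API: VersionedStore and StaleVersionError are names made up for this illustration. Each write bumps a counter, and a write that states which version it expects is rejected if the stored document has moved on. No earlier state is kept anywhere.

```ruby
# In-memory illustration of version-based optimistic concurrency control.
# VersionedStore and StaleVersionError are hypothetical names; they are
# not part of ElasticSearch or Tire.
class StaleVersionError < StandardError; end

class VersionedStore
  Entry = Struct.new(:version, :source)

  def initialize
    @entries = {}
  end

  # Writes a document and returns its new version. When expected_version
  # is given, the write is rejected unless it matches the stored version.
  def index(id, source, expected_version = nil)
    current = @entries[id]
    current_version = current ? current.version : 0
    if expected_version && expected_version != current_version
      raise StaleVersionError, "expected version #{expected_version}, found #{current_version}"
    end
    @entries[id] = Entry.new(current_version + 1, source)
    current_version + 1
  end

  def get(id)
    @entries.fetch(id)
  end
end

store = VersionedStore.new
store.index(1, { 'title' => 'One' })          # becomes version 1
store.index(1, { 'title' => 'One, edited' })  # becomes version 2, replaces version 1

begin
  # A client that last saw version 1 tries to write and is turned away.
  store.index(1, { 'title' => 'Stale edit' }, 1)
rescue StaleVersionError => e
  puts "Rejected: #{e.message}"
end

puts store.get(1).source['title']
```

Note that the store only ever holds the latest source per id, which is why a version number alone cannot be used to go back to an older state.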
Second, you have to declare that you want the versions returned from a search: http://www.elasticsearch.org/guide/reference/api/search/version.html
This code shows you how to do it in Tire.
require 'tire'
Tire.index('articles') do
  delete
  create
  store id: 1, title: 'One'
  store id: 2, title: 'Two'
  store id: 2, title: 'Two again'  # same id again: document 2 moves to version 2
  refresh
end
articles = Tire.search('articles') do
  query { all }
  version true  # ask for _version to be included with every hit
end.results
articles.each do |article|
  puts "Article '#{article.title}' has version #{article._version}"
end
For the moment, it's best to read the Tire integration test suite for documentation. The documentation is of course something which should and will improve.
As for your original requirement itself, similar questions have regularly popped up for CouchDB in the past. The Simple Document Versioning with CouchDB blog post describes one nice strategy. It would be beneficial for you to research the CouchDB solutions, since the document model is similar. (Of course, CouchDB, unlike ElasticSearch, does physically store the document revisions, which opens up a different range of strategies.)
In ElasticSearch, your basic decision regarding working with revisions would be:
Do I want to store full revisions directly in the JSON itself?
This could make sense for smaller documents and smaller databases. Depending on your requirements, it could make searching for historic documents very simple.
The Nested Type in ElasticSearch would make working with these “revisions as nested documents” convenient and easy.
(You could also store just “diffs” of the documents in the JSON itself, but that would put more strain on your application logic, I bet.)
Do I want to store revisions as separate documents?
You may want to store revisions separately, and “link” them with their base document. The parent/child support in ElasticSearch would make it possible to work with those relationships and queries.
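As a rough sketch of the two document shapes discussed above, using plain Ruby hashes and JSON only. Nothing here talks to ElasticSearch; the field names "revisions", "parent_id", "revision" and "snapshot" are assumptions chosen for the example, as are the two helper methods.

```ruby
require 'json'

# Option 1: keep earlier states inside the document itself, under a
# "revisions" array (the field name is an assumption for this sketch).
def update_with_embedded_revision(document, changes)
  history  = document.fetch('revisions', [])
  snapshot = document.reject { |key, _| key == 'revisions' }
  document.merge(changes).merge('revisions' => history + [snapshot])
end

# Option 2: turn an earlier state into a separate document that points
# back at its base document (the "parent_id" field is hypothetical).
def revision_document(document, number)
  snapshot = document.reject { |key, _| %w[id revisions].include?(key) }
  { 'parent_id' => document.fetch('id'), 'revision' => number, 'snapshot' => snapshot }
end

article = { 'id' => 1, 'title' => 'One' }
edited  = update_with_embedded_revision(article, { 'title' => 'One, edited' })

puts JSON.pretty_generate(edited)                        # one document carrying its history
puts JSON.pretty_generate(revision_document(article, 1)) # history as its own document
```

With the first shape the history travels with the document and grows on every edit; with the second the base document stays small, and each revision is indexed and queried on its own.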

Related

Creating static permanent hash in rails [duplicate]

If we have a small table which contains relatively static data, is it possible to have Active Record load this in on startup of the app and never have to hit the database for this data?
Note, that ideally I would like this data to be join-able from other Models which have relationships to it.
An example might be a list of countries with their telephone number prefix - this list is unlikely to change, and if it did it would be changed by an admin. Other tables might have relationships with this (eg. given a User who has a reference to the country, we might want to lookup the country telephone prefix).
I saw a similar question here, but it's 6 years old and refers to Rails 2, while I am using Rails 5 and maybe something has been introduced since then.
Preferred solutions would be:
1. Built-in Rails / ActiveRecord functionality to load a table once on startup and if other records are subsequently loaded in which have relationships with the cached table, then link to the cached objects automatically (ie. manually caching MyModel.all somewhere is not sufficient, as relationships would still be loaded by querying the database).
2. Maintained library which does the above.
3. If neither are available, I suppose an alternative method would be to define the static dataset as an in-memory enum/hash or similar, and persist the hash key on records which have a relationship to this data, and define methods on those Models to lookup using the object in the hash using the key persisted in the database. This seems quite manual though...
[EDIT]
One other thing to consider with potential solutions - the manual solution (3) would also require custom controllers and routes for such data to be accessible over an API. Ideally it would be nice to have a solution where such data could be offered up via a RESTful API (read only - just GET) if desired using standard rails mechanisms like Scaffolding without too much manual intervention.
I think you may be discounting the "easy" / "manual" approach too quickly.
Writing the data to a ruby hash / array isn't that bad an idea.
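For instance, a minimal sketch of the hash approach for the country example might look like this. It is plain Ruby with no Rails involved; Country, COUNTRIES and the country_code attribute are illustrative names, and the User class is only a stand-in for a real model that would persist country_code as a column.

```ruby
# Static reference data held in memory. Country, COUNTRIES and
# country_code are illustrative names, not taken from any library.
Country = Struct.new(:code, :name, :phone_prefix)

COUNTRIES = {
  'GB' => Country.new('GB', 'United Kingdom', '+44'),
  'US' => Country.new('US', 'United States', '+1')
}.freeze

# Stand-in for a model that persists only the hash key (in a real app
# this would be a country_code column on the users table).
class User
  attr_reader :name, :country_code

  def initialize(name, country_code)
    @name = name
    @country_code = country_code
  end

  # Looks the country up in memory; no database query is involved.
  def country
    COUNTRIES.fetch(country_code)
  end
end

user = User.new('Ann', 'GB')
puts "#{user.name}: #{user.country.phone_prefix}"
```

Using fetch rather than [] means an unknown code raises a KeyError immediately instead of returning nil and failing somewhere later.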
And if you want to use a CRUD scaffold, why not just use the standard Rails model / controller generator? Is it really so bad to store some static data in the database?
A third option would be to store your data in a file in some serialized format, read it when your app loads, and construct model objects from it. Let me show an example:
data.yml
---
- a: "1"
  b: "1"
- a: "2"
  b: "2"
This is a YAML file containing an array of hashes; you can construct such a file with:
require 'yaml'
# write to the same file name, with the same string values, as shown above
File.open("data.yml", "w") do |f|
  data = [
    { "a" => "1", "b" => "1" },
    { "a" => "2", "b" => "2" }
  ]
  f.write(YAML.dump(data))
end
Then to load the data, you might create a file in config/initializers/ (Rails runs every file in that directory once at boot):
config/initializers/static_data.rb
require 'yaml'
# define a constant that can be used by the rest of the app;
# the path is resolved from the app root rather than the current directory
StaticData = YAML.load_file(Rails.root.join("data.yml")).map do |object|
  MyObjectClass.new(object)
end
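To avoid having to write database migrations for MyObjectClass (it is not actually stored in the db), you can make it a plain model object with attr_accessor definitions for your attributes. Note that inheriting from ActiveRecord::Base would not work here, because ActiveRecord looks up the table's columns as soon as you call new; including ActiveModel::Model gives you the hash-based initializer without needing a table:
class MyObjectClass
  include ActiveModel::Model
  # say these are your two columns
  attr_accessor :a, :b
end
Such objects have no save, delete, or update methods, which is fine for read-only data.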
If you want to have REST / CRUD endpoints, you'd need to write them from scratch because the way to change data is different now.
You'd basically need to do any update in a 3 step process:
1. load the data from YAML into a Ruby object list
2. change the Ruby object list
3. serialize everything to YAML and save it.
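The three steps above can be sketched in plain Ruby as follows. update_record and its arguments are hypothetical names for this illustration, and the sketch works on a temporary file rather than on a real app's data.yml.

```ruby
require 'yaml'
require 'tmpdir'

# Rewrites the whole YAML file in order to change a single record.
# update_record and its arguments are hypothetical names for this sketch.
def update_record(path, index, changes)
  records = YAML.load_file(path)                  # 1. load everything
  records[index] = records[index].merge(changes)  # 2. change the list in memory
  File.write(path, YAML.dump(records))            # 3. serialize everything again
  records
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'data.yml')
  File.write(path, YAML.dump([{ 'a' => '1', 'b' => '1' }, { 'a' => '2', 'b' => '2' }]))

  update_record(path, 1, { 'b' => 'changed' })
  puts File.read(path)
end
```

Every call reads and rewrites the complete file, and nothing here guards against two processes doing so at the same time.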
So you can see you're not really doing incremental updates here. You could use JSON instead of YAML and you'd have the same problem. With PStore, the storage system in Ruby's standard library, you would be able to update objects on an individual basis, but using SQL for a production web app is a much better idea and will honestly make things simpler.
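For comparison, a small PStore sketch in which a single entry is addressed by key inside a transaction. rename_country and the stored keys are hypothetical names for this illustration, and the store lives in a temporary directory.

```ruby
require 'pstore'
require 'tmpdir'

# Changes one entry by key inside a transaction. rename_country and the
# stored keys are hypothetical names for this sketch.
def rename_country(store, code, new_name)
  store.transaction do
    store[code] = store[code].merge('name' => new_name)
  end
end

Dir.mktmpdir do |dir|
  store = PStore.new(File.join(dir, 'static_data.pstore'))

  store.transaction do
    store['GB'] = { 'name' => 'United Kingdom', 'phone_prefix' => '+44' }
    store['US'] = { 'name' => 'United States', 'phone_prefix' => '+1' }
  end

  rename_country(store, 'GB', 'UK') # only the GB entry is addressed

  store.transaction(true) do # read-only transaction
    puts store['GB'].inspect
    puts store['US'].inspect
  end
end
```

The per-key convenience is at the API level; PStore still keeps everything in one file on disk, so it does not change the point about SQL being the simpler choice for a production app.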
Moving beyond these "serialized data" options, there are key-value storage servers that keep data in memory, such as Memcached and Redis.
But to go back to my earlier point, unless you have a good reason not to use SQL you're only making things more difficult.
It sounds like FrozenRecord would be a good match for what you are looking for.
Active Record-like interface for read only access to static data files of reasonable size.

Best practices regarding per-user settings and predefining options

I want to save settings for my users, and some of them would be one out of a predefined list. I am using https://github.com/ledermann/rails-settings at the moment.
For example, the weight_unit setting would be one of [:kg, :lb].
I don't really want to hardcode that stuff into controller or view code.
It's kind of a common functionality, so I was wondering: Did anyone come up with some way of abstracting that business into class constants or the database in a DRY fashion?
Usually, when I have to store unimportant information that I don't need to query individually, I store it in a serialized column.
In your case you could create a new column in your users table (for example, call it "settings").
After that, add this to the user model:
serialize :settings, Hash
from this moment you can put whatever you like into settings, for example
user.settings = {:weight_unit => :kg, :other_setting1 => 'foo', :other_setting2 => 'bar'}
When you then save with user.save, the serialized data is written to the settings column.
Rails also deserializes it, so after fetching a user's record, calling user.settings returns all saved settings for that user.
To get more information on serialize() refer to docs: http://api.rubyonrails.org/classes/ActiveRecord/AttributeMethods/Serialization/ClassMethods.html#method-i-serialize
UPDATE1
To ensure that settings are in the predefined list you can use validations on your user model.
UPDATE2
Usually, if there are some predefined values, it's a good habit to store them in a constant inside the related model, so that you can access them both inside and outside the model. Acceptable values do not change per instance, so it makes sense to share them between all instances. An example is worth more than any explanation. Define this in your User model:
ALLOWED_SETTINGS = {:weight_unit => [:kg, :lb],
                    :eyes_color => [:green, :blue, :brown, :black],
                    :hair_length => [:short, :long]}
you can use it BOTH
outside the model itself, doing
User::ALLOWED_SETTINGS
inside your model (in validations, instance methods or wherever you want) using:
ALLOWED_SETTINGS
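As a sketch of the validation mentioned in UPDATE1, here is the check written as a plain Ruby method so it can be run on its own. settings_errors is a hypothetical helper; in a Rails model the same logic could be called from a custom validation method.

```ruby
ALLOWED_SETTINGS = {
  weight_unit: %i[kg lb],
  eyes_color:  %i[green blue brown black],
  hair_length: %i[short long]
}.freeze

# Returns a list of human-readable problems; an empty list means the
# settings hash only uses known keys with allowed values.
# settings_errors is a hypothetical helper name.
def settings_errors(settings)
  settings.each_with_object([]) do |(key, value), errors|
    allowed = ALLOWED_SETTINGS[key]
    if allowed.nil?
      errors << "#{key} is not a known setting"
    elsif !allowed.include?(value)
      errors << "#{key} must be one of #{allowed.join(', ')}"
    end
  end
end

p settings_errors({ weight_unit: :kg, hair_length: :long })
p settings_errors({ weight_unit: :stone, shoe_size: 42 })
```

The same constant can also feed the select options in a view, so the list of choices lives in exactly one place.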
Based on your question, it sounds like these are more configuration options that a particular user will choose from that may be quite static, rather than dynamic in nature in that the options can change over time. For example, I doubt you'll be adding various other weight_units other than :kg and :lb, but it's possible I'm misreading your question.
If I am reading this correctly, I would recommend (and have used) a yml file in the config/ directory for values such as this. The yml file is accessible app wide and all your "settings" could live in one file. These could then be loaded into your models as constants, and serialized as @SDp suggests. However, I tend to err on the side of caution, especially when thinking that perhaps these "common values" may need to be queried some day, so I would prefer to have each of these as a column on a table rather than a single serialized value. The overhead isn't that much more, and you would gain a lot of additional built-in benefits from Rails by having them as individual columns.
That said, I have personally used hstore with Postgres with great success, doing just what you are describing. However, the reason I chose to use an hstore over individual columns was because I was storing multiple different demographics, in which all of the demographics could change over time (e.g. some keys could be added, and more importantly, some keys could be removed.) It sounds like in your case it's highly unlikely you'll be removing keys as these are basic traits, but again, I could be wrong.
TL;DR - I feel that unless you have a compelling reason (such as regularly adding and/or removing keys/settings), these should be individual columns on a database table. If you strongly feel these should be stored in the database serialized, and you're using Postgres, check out hstore.
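For the yml-file approach mentioned in this answer, loading the options into a constant might look roughly like this. The YAML is inlined as a string so the sketch runs on its own; in an app it would be read from a file under config/ instead, and the file name, keys and constant names here are assumptions.

```ruby
require 'yaml'

# Stand-in for the contents of a file such as config/user_settings.yml
# (the file name and keys are assumptions for this sketch).
SETTINGS_YAML = <<~YAML
  weight_unit:
    - kg
    - lb
  hair_length:
    - short
    - long
YAML

# Parse once and freeze, so the options behave like constants.
SETTING_OPTIONS = YAML.safe_load(SETTINGS_YAML).each_with_object({}) do |(name, values), options|
  options[name.to_sym] = values.map(&:to_sym).freeze
end.freeze

p SETTING_OPTIONS[:weight_unit]
```

Whether the chosen value is then stored in its own column, a serialized column, or an hstore key is a separate decision; the option list itself stays in one file either way.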
If you are using PostgreSQL, I think you can look at HStore with Rails 4 plus this gem: https://github.com/devmynd/hstore_accessor

Versioning for Rails i18n translations

I'm in the process of building a volunteer based translation engine for a new site built in Rails 4.0. Since it's volunteer based, there is always the possibility that a user may enter a translation that others do not agree with, accidentally remove a translation, etc. In such an event, I would like to give users the option to revert to a previous translation.
I did some searching around but have yet to find a solution aside from writing my own I18n backend. Is there a simpler way of storing previous versions of translations?
I'm currently using Sven Fuchs' Active Record as a backend, however I'm seriously thinking about switching due to possible performance issues later on down the road.
We had a very successful experience using Globalize (GitHub page: https://github.com/globalize/globalize). We didn't try the versioning part, but Globalize does support it through a separate gem (GitHub page: https://github.com/globalize/globalize-versioning).
After tens of painful gem experiences, I found that comparing gems by last update date and by how frequent new releases, bug fixes, and support are is a major factor in deciding which one will make your life easier and which one won't.
Update:
You can use Globalize to dynamically translate views (check tutorial), but I came across a GitHub project called iye that I think suits your needs best (GitHub page: https://github.com/firmafon/iye).
I used Nimir's help to find this solution. Like Globalize Versioning, you can add Paper Trail support for the Active Record's Translation class, however there are a few pitfalls to this method currently.
First you need to include the gems in your Gemfile:
gem "i18n-active_record"
gem "paper_trail"
Then you need to make sure your Translation model class inherits from I18n::Backend::ActiveRecord::Translation and calls has_paper_trail:
class Translation < I18n::Backend::ActiveRecord::Translation
  has_paper_trail
end
This should really be enough; however, the store_translations method in I18n Active Record does not update existing records. Instead, each time a record is added, all records with the given key are deleted and a new record is created. This confuses Paper Trail, since it relies on the record id.
To get around this issue, I created my own store_translations method, which updates records if they exist:
def store_translations(locale, data, options = {})
  escape = options.fetch(:escape, true)
  I18n.backend.flatten_translations(locale, data, escape, false).each do |key, value|
    t = Translation.find_or_create_by!(locale: locale.to_s, key: key.to_s)
    t.value = value
    t.save
  end
  I18n.backend.reload!
end
Notice that I also call I18n.backend.reload! at the end. I am running Memoize to cache translations, and it seems that it needs to be told to re-cache whenever a record is updated.
Now I can simply call:
store_translations(lang, {key => key}, :escape => false)
to store a new translation while keeping a record of the old one and of who made the change.

Cloning documents within a Mongo database

I have a mongodb database that I use mongoid to access via a rails 3 application. The database consists of around 10-15 collections. Some of the documents in these collections have embedded documents and other documents are linked by id.
I need to clone most of the data in the database to create new records. These new records will need to co-exist with their cloned counterparts while they are translated by our client. These new records must maintain the same relationships as they did before however the newly cloned records need to point to their newly cloned counterparts.
Considerations include: A number of has-one relationships have a "foreign key" that will need to be updated on clone. Some documents have embedded documents that will need to be cloned with their parents. Clonee documents will not be able to relate to their cloned documents in any way.
Solutions Considered: The first option was to duplicate the database and try to merge everything that does not need to be cloned. That might be a little messy, and I am assuming that existing IDs would get cloned too. The second option I considered was to write a script that iterates through each Mongoid document class and calls clone; however, I found out that Mongoid's clone does a shallow copy, not a deep one. So for this solution I would have to write a case in which embedded relationships are detected in order to perform a deep copy. This could also get messy.
Is there an option I have not considered here? Is there a better way to go about one of the considered solutions? Am I up against it?
Watching the discussion in the comments, I'd say that if .clone does not work, you can easily do that in a compact way with the attributes, read_attribute, write_attribute methods. Excerpt from here.
# Get the field values as a hash.
person.attributes
# Set the field values in the document.
Person.new(first_name: "Jean-Baptiste", middle_name: "Emmanuel")
person.attributes = { first_name: "Jean-Baptiste", middle_name: "Emmanuel" }
person.write_attributes(
  first_name: "Jean-Baptiste",
  middle_name: "Emmanuel"
)
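Building on the attributes idea, here is a rough sketch of the cloning step itself, with plain hashes standing in for what attributes would return. It does not use Mongoid: clone_with_remapped_ids is a hypothetical helper, the "_id" and "page_id" field names are assumptions, and SecureRandom.uuid stands in for whatever id generation the real documents use. Embedded documents are copied deeply, every clone gets a fresh id, and reference fields are repointed at the clones.

```ruby
require 'securerandom'

# Clones every document, gives each clone a fresh id, then repoints the
# listed reference fields at the clones instead of the originals.
# clone_with_remapped_ids and the field names are hypothetical.
def clone_with_remapped_ids(documents, reference_fields)
  # Decide all new ids first, so references can be rewritten in one pass.
  id_map = documents.each_with_object({}) do |doc, map|
    map[doc['_id']] = SecureRandom.uuid
  end

  documents.map do |doc|
    copy = Marshal.load(Marshal.dump(doc)) # deep copy, embedded documents included
    copy['_id'] = id_map.fetch(doc['_id'])
    reference_fields.each do |field|
      old_ref = copy[field]
      # References to documents outside the cloned set are left as they are.
      copy[field] = id_map.fetch(old_ref, old_ref) if old_ref
    end
    copy
  end
end

originals = [
  { '_id' => 'a1', 'title' => 'Page', 'sections' => [{ 'body' => 'Hello' }] },
  { '_id' => 'b2', 'page_id' => 'a1', 'text' => 'A comment' }
]

clones = clone_with_remapped_ids(originals, ['page_id'])
puts clones.inspect
```

Because the id map is built before any document is copied, the order of the documents does not matter, and the originals are never modified.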
