Cloning documents within a Mongo database - ruby-on-rails

I have a MongoDB database that I access via Mongoid from a Rails 3 application. The database consists of around 10-15 collections. Some of the documents in these collections have embedded documents, and other documents are linked by id.
I need to clone most of the data in the database to create new records. These new records will need to co-exist with their cloned counterparts while they are translated by our client. The new records must maintain the same relationships as before, except that newly cloned records need to point to their newly cloned counterparts.
Considerations include: a number of has_one relationships have a "foreign key" that will need to be updated on clone. Some documents have embedded documents that will need to be cloned with their parents. Cloned documents must not relate back to their source documents in any way.
Solutions considered: the first option was to duplicate the database and try to merge everything that does not need to be cloned. This might be a little messy, and I assume the existing ids would get cloned too. The second option I considered was to write a script that iterates through each Mongoid document class and calls clone; however, I found out that Mongoid's clone does a shallow copy, not a deep copy. So for this solution I would have to detect embedded relationships and perform a deep copy for them, which could also get messy.
Is there an option I have not considered here? Is there a better way to go about one of the considered solutions? Am I up against it?

Watching the discussion in the comments, I'd say that if .clone does not work, you can do this in a compact way with the attributes, read_attribute, and write_attribute methods. Excerpt from the Mongoid documentation:
# Get the field values as a hash.
person.attributes

# Set the field values in the document.
Person.new(first_name: "Jean-Baptiste", middle_name: "Emmanuel")
person.attributes = { first_name: "Jean-Baptiste", middle_name: "Emmanuel" }
person.write_attributes(
  first_name: "Jean-Baptiste",
  middle_name: "Emmanuel"
)
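Building on that, one compact way to get a deep copy is to take the attributes hash, recursively strip the _id fields (so the clone and its embedded documents get fresh ids), and build a new document from the result. This is only a sketch: strip_ids is a hypothetical helper, not Mongoid API, and the sample data below is illustrative.

```ruby
# Recursively remove "_id" keys from an attributes hash, including the
# nested hashes/arrays that represent embedded documents.
def strip_ids(value)
  case value
  when Hash
    value.reject { |k, _| k.to_s == "_id" }
         .transform_values { |v| strip_ids(v) }
  when Array
    value.map { |v| strip_ids(v) }
  else
    value
  end
end

# Usage with a Mongoid model would then look like:
#   copy = Person.new(strip_ids(person.attributes))
#   copy.save!
attrs = { "_id" => "abc", "name" => "Jean",
          "addresses" => [{ "_id" => "x", "city" => "Paris" }] }
strip_ids(attrs)
# => { "name" => "Jean", "addresses" => [{ "city" => "Paris" }] }
```

Linked (non-embedded) relationships would still need their "foreign key" fields remapped to the new ids in a second pass, since only the parent and its embedded documents are covered here.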

Related

Creating static permanent hash in rails [duplicate]

If we have a small table which contains relatively static data, is it possible to have Active Record load this in on startup of the app and never have to hit the database for this data?
Note that, ideally, I would like this data to be join-able from other models which have relationships to it.
An example might be a list of countries with their telephone number prefix - this list is unlikely to change, and if it did it would be changed by an admin. Other tables might have relationships with this (eg. given a User who has a reference to the country, we might want to lookup the country telephone prefix).
I saw a similar question here, but it's 6 years old and refers to Rails 2, while I am using Rails 5 and maybe something has been introduced since then.
Preferred solutions would be:
1. Built-in Rails / ActiveRecord functionality to load a table once on startup, and if other records are subsequently loaded which have relationships with the cached table, link to the cached objects automatically (i.e. manually caching MyModel.all somewhere is not sufficient, as relationships would still be loaded by querying the database).
2. A maintained library which does the above.
3. If neither is available, I suppose an alternative method would be to define the static dataset as an in-memory enum/hash or similar, persist the hash key on records which have a relationship to this data, and define methods on those models to look up the object in the hash using the key persisted in the database. This seems quite manual though...
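The in-memory option described above could be sketched in plain Ruby like this (all names are illustrative; in a real app country_code would be a persisted column on an ActiveRecord model):

```ruby
# Static dataset held in a frozen Ruby hash instead of a table.
COUNTRY_PREFIXES = {
  "GB" => "+44",
  "FR" => "+33"
}.freeze

class User
  attr_accessor :country_code  # the hash key, persisted in the database

  def initialize(country_code)
    @country_code = country_code
  end

  # A lookup method takes the place of an ActiveRecord association.
  def country_prefix
    COUNTRY_PREFIXES.fetch(country_code)
  end
end

User.new("FR").country_prefix # => "+33"
```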
[EDIT]
One other thing to consider with potential solutions - the manual solution (3) would also require custom controllers and routes for such data to be accessible over an API. Ideally it would be nice to have a solution where such data could be offered up via a RESTful API (read only - just GET) if desired using standard rails mechanisms like Scaffolding without too much manual intervention.
I think you may be discounting the "easy" / "manual" approach too quickly.
Writing the data to a Ruby hash / array isn't that bad an idea.
And if you want to use a CRUD scaffold, why not just use the standard Rails model / controller generator? Is it really so bad to store some static data in the database?
A third option would be to store your data to a file in some serialized format and then when your app loads read this and construct ActiveRecord objects. Let me show an example:
data.yml
---
- a: "1"
  b: "1"
- a: "2"
  b: "2"
This is a YAML file containing an array of hashes; you can construct such a file with:
require 'yaml'

File.open("path.yml", "w") do |f|
  data = [
    { "a" => "1", "b" => "1" },
    { "a" => "2", "b" => "2" }
  ]
  f.write(YAML.dump(data))
end
Then, to load the data, you might create a file in config/initializers/ (files in this directory are run automatically when Rails boots):
config/initializers/static_data.rb
require 'yaml'

# Define a constant that can be used by the rest of the app.
StaticData = YAML.load(File.read("data.yml")).map do |object|
  MyObjectClass.new(object)
end
To avoid having to write database migrations for MyObjectClass (when it's not actually being stored in the db) you can use attr_accessor definitions for your attributes:
class MyObjectClass < ActiveRecord::Base
  # Say these are your two columns.
  attr_accessor :a, :b
end
Just make sure not to call methods like save, delete, or update on this model (unless you monkeypatch them).
If you want to have REST / CRUD endpoints, you'd need to write them from scratch because the way to change data is different now.
You'd basically need to do any update in a three-step process:
1. Load the data from YAML into a Ruby object list.
2. Change the Ruby object list.
3. Serialize everything back to YAML and save it.
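A full update cycle along those lines might look like this (the file path and seed data are illustrative; the file is seeded first so the example is self-contained):

```ruby
require 'yaml'
require 'tmpdir'

path = File.join(Dir.tmpdir, "static_data_demo.yml")

# 1. Load the data from YAML into a Ruby object list
#    (seed the file first for the sake of the example).
File.write(path, YAML.dump([{ "a" => "1", "b" => "1" }]))
records = YAML.load(File.read(path))

# 2. Change the Ruby object list -- note this is a whole-list edit,
#    not an incremental update.
records << { "a" => "2", "b" => "2" }

# 3. Serialize everything back to YAML and save it.
File.write(path, YAML.dump(records))

YAML.load(File.read(path)).length # => 2
```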
So you can see you're not really doing incremental updates here. You could use JSON instead of YAML and you'd have the same problem. With Ruby's built-in PStore library you would be able to update objects on an individual basis, but using SQL for a production web app is a much better idea and will honestly keep things simpler.
Moving beyond these "serialized data" options, there are key-value stores that keep data in memory, such as Memcached and Redis.
But to go back to my earlier point, unless you have a good reason not to use SQL you're only making things more difficult.
It sounds like FrozenRecord would be a good match for what you are looking for.
Active Record-like interface for read only access to static data files of reasonable size.

Rails 5 Active Record - is it possible to keep a table in memory?


Mongoid and Postgres Scaffolding/Relationships

I have a need for a certain model to contain a reference to a document. Most of the model could be stored in postgres. The model is for a "level" in a game. I'd like to store the level data itself inside of a document, which makes more sense than making a complex tree in sql.
I am able to use postgres with mongoid installed; however, after installing the mongoid gem I seem to only be able to scaffold mongoid (non active record) documents.
The problem is that I have references to other tables, and I don't necessarily know how to link that up within a Mongoid model.
Questions:
How can I force scaffolding to occur with active record instead of mongoid or vice versa. Edit: partly answered here: Using Active Record generators after Mongoid installation? (2nd answer works, but I don't know how to go back and forth easily)
Is there an easy way to reference a document from an active record model (I know the documentation said don't mix them, but it is ideal for what I am trying to do).
If it is not possible to mix them, then how should I make a document be referenced from a postgres/active record table. In other words how can I get both pieces of data at the same time?
Thanks!
Regarding your first question, the ideal solution would be something along the lines of the first answer in the referenced post. However, instead of generating a migration, generate a model. So when you want an Active Record model, simply run:
rails g active_record:model
As for your second and third questions, to associate an Active Record model with a Mongoid document, simply store the ObjectId as a string in the model. Then, when you retrieve a record, make a new ObjectId out of the string and use that to query for the related document.
You can create object ids out of the strings like this:
BSON::ObjectId.from_string("object_id_string")
There isn't really an easy way to follow intra-ORM relations when mixing and matching between ActiveRecord and Mongoid, though, so I'm afraid that will have to be done in Ruby code.
The models you define in rails either extend one ORM's base class or the other and they don't know about one another. There may be projects out there that act as a layer on top of these ORMs but I am not familiar with any that exist at the moment.
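Pieced together, the cross-ORM link described above might look like the sketch below. The Level class and its column names are hypothetical; in the real app Level would be an ActiveRecord model, LevelData a Mongoid document, and the commented-out lookup would use the bson gem.

```ruby
# Hypothetical sketch: the SQL-side model stores the Mongoid document's
# ObjectId as a plain string column.
class Level
  attr_accessor :name, :level_data_id  # level_data_id is a string column

  def initialize(name, level_data_id)
    @name = name
    @level_data_id = level_data_id
  end

  # In the real app, the lookup crosses the ORM boundary by hand:
  #   def level_data
  #     LevelData.find(BSON::ObjectId.from_string(level_data_id))
  #   end
end

level = Level.new("Forest", "4f8d8c66e5a4e3412d000001")
level.level_data_id # => "4f8d8c66e5a4e3412d000001"
```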

Elasticsearch versioning through Tire gem

I have an existing Rails app that uses tire (0.4.0) to interface with an Elasticsearch (0.17.4) engine. It already has a couple of models using Tire::Persistence. I want to add a new model that takes advantage of Elasticsearch versioning, to track all changes and be able to revert to previous versions.
Right now when I retrieve any 'persisted' model instance, I check _version and it is always nil. I have not found any tire documentation that relates to versioning. Do I have to activate it somehow, or manually save records with version values? Am I even on the right track here?
I do see that certain methods return _version values for items, but others don't...
Article.first._version # => nil
Article.search("sample query").first._version # => nil
Article.find("id_123")._version # => 8
Also, versioning seems to increment by 2. Perhaps tire is not fully equipped to deal with versioning. Is it saving previous versions? How can I retrieve a previous version of a record?
[EDIT] I may have misunderstood what 'versioning' actually is in Elasticsearch. Seems like it's mostly for concurrency control. Oh wellz. (I would love to hear otherwise, though)
First of all, yes, as you write in the Edit, versions in ElasticSearch are not meant to store revisions of the document, but for concurrency control (eg. not overwriting a document with stale version).
Second, you have to declare that you want versions returned from search: http://www.elasticsearch.org/guide/reference/api/search/version.html
This code shows you how to do it in Tire.
require 'tire'

Tire.index('articles') do
  delete
  create
  store id: 1, title: 'One'
  store id: 2, title: 'Two'
  store id: 2, title: 'Two again'
  refresh
end

articles = Tire.search('articles') do
  query { all }
  version true
end.results

articles.each do |article|
  puts "Article '#{article.title}' has version #{article._version}"
end
For the moment, it's best to read the Tire integration test suite for documentation. The documentation is of course something which should and will improve.
As for your original requirement itself, similar questions have regularly popped up for CouchDB in the past. The Simple Document Versioning with CouchDB blog post describes one nice strategy. It would be worth researching the CouchDB solutions, since the document model is similar. (Of course, CouchDB, contrary to ElasticSearch, does physically store document revisions, and thus opens up a different range of strategies.)
In ElasticSearch, your basic decision regarding working with revisions would be:
Do I want to store full revisions directly in the JSON itself?
This could make sense for smaller documents and smaller databases. Depending on your requirements, it could make searching for historic documents very simple.
The Nested Type in ElasticSearch would make working with these “revisions as nested documents” convenient and easy.
(You could also store just “diffs” of the documents in the JSON itself, but that would put more strain on your application logic, I bet.)
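For instance, a document carrying its own history as nested objects might look like the following (the field names are purely illustrative):

```json
{
  "title": "Current title",
  "revisions": [
    { "rev": 1, "title": "Original title", "updated_at": "2012-01-01" },
    { "rev": 2, "title": "Second title", "updated_at": "2012-02-01" }
  ]
}
```

Mapping "revisions" as a nested type keeps each revision's fields grouped together, so a query can match on a single historic revision rather than across all of them.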
Do I want to store revisions as separate documents?
You may want to store revisions separately, and “link” them with their base document. The parent/child support in ElasticSearch would make it possible to work with those relationships and queries.

Implement dynamic data model with MongoDB in Rails

I'm creating an application consisting of a bunch of entries. These entries are going to have a bunch of fields (e.g. category, name, description etc.) and be of a certain type (category). So the user would first create a category with a title and description and then define what other fields an entry in that category can and should have.
Example:
Create a category: title => 'Books', description => 'A description'. Define extra fields: author (required), image (not required).
Create an entry: when choosing category => 'Books', the form is regenerated and the fields for author and image are shown, with validation as defined in the category.
I hope somebody understands what I mean.
I was talking to a friend about this who recommended going for MongoDB in order to implement this, now I have an app installed with Mongoid and everything works just fine.
The question is, how would I implement this in the best way, making it as flexible as possible?
It's hard to answer your question because it is quite vague… here is what I can say about MongoDB:
MongoDB is already as flexible as possible (that is arguably its problem, actually).
The challenge is more often to restrict that flexibility, i.e. to check access rights, to check that the JSON you are storing has the right schema, and so on.
If your db is not too huge and you do not want to bother with many collections, you can store all your Books items (documents), or even one document containing lists, in the same collection.
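To make the "restrict the flexibility" point concrete, here is a plain-Ruby sketch of the validation layer the questioner describes (no Mongoid; Category, fields_spec, and entry_errors are all hypothetical names). In Mongoid you could store fields_spec on the Category document and keep an entry's extra fields as a dynamic or nested hash.

```ruby
# A category defines which extra fields an entry must have.
Category = Struct.new(:title, :fields_spec)
# fields_spec maps field name => { required: true/false }

# Validate an entry's attribute hash against its category's field spec.
def entry_errors(category, attributes)
  category.fields_spec.each_with_object([]) do |(name, spec), errors|
    if spec[:required] && !attributes.key?(name)
      errors << "#{name} is required"
    end
  end
end

books = Category.new("Books", "author" => { required: true },
                              "image" => { required: false })

entry_errors(books, "author" => "Tolkien") # => []
entry_errors(books, "image" => "x.png")    # => ["author is required"]
```

The form regeneration the questioner wants then falls out of the same spec: iterate fields_spec to render one input per defined field.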
