MongoMapper and migrations - ruby-on-rails

I'm building a Rails application using MongoDB as the back-end and MongoMapper as the ORM tool. Suppose in version 1, I define the following model:
class SomeModel
  include MongoMapper::Document
  key :some_key, String
end
Later in version 2, I realize that I need a new required key on the model. So, in version 2, SomeModel now looks like this:
class SomeModel
  include MongoMapper::Document
  key :some_key, String
  key :some_new_key, String, :required => true
end
How do I migrate all my existing data to include some_new_key? Assume that I know how to set a reasonable default value for all the existing documents. Taking this a step further, suppose that in version 3, I realize that I really don't need some_key at all. So, now the model looks like this:
class SomeModel
  include MongoMapper::Document
  key :some_new_key, String, :required => true
end
But all the existing records in my database have values set for some_key, and it's just wasting space at this point. How do I reclaim that space?
With ActiveRecord, I would have just created migrations to add the initial values of some_new_key (in the version1 -> version2 migration) and to delete the values for some_key (in the version2 -> version3 migration).
What's the appropriate way to do this with MongoDB/MongoMapper? It seems to me that some method of tracking which migrations have been run is still necessary. Does such a thing exist?
EDITED: I think people are missing the point of my question. There are times where you want to be able to run a script on a database to change or restructure the data in it. I gave two examples above, one where a new required key was added and one where a key can be removed and space can be reclaimed. How do you manage running these scripts? ActiveRecord migrations give you an easy way to run these scripts and to determine what scripts have already been run and what scripts have not been run. I can obviously write a Mongo script that does any update on the database, but what I'm looking for is a framework like migrations that lets me track which upgrade scripts have already been run.

Check out Mongrations... I just finished reading about it and it looks like what you're after.
http://terrbear.org/?p=249
http://github.com/terrbear/mongrations
Cheers! Kapslok

One option is to use the update operation to update all of your data at once. Multi-document updates are new in the development releases, so you'll need to use one of those.
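For example, a minimal sketch going through the underlying driver collection that MongoMapper exposes (assuming a driver version that supports :multi; the field names and default value are placeholders from the question):

# Backfill the new key on documents that don't have it yet
SomeModel.collection.update(
  { 'some_new_key' => { '$exists' => false } },
  { '$set' => { 'some_new_key' => 'reasonable default' } },
  :multi => true
)

# Drop the old key from every document
SomeModel.collection.update(
  {},
  { '$unset' => { 'some_key' => 1 } },
  :multi => true
)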

You can try this contraption I just made, but it only works with mongoid and rails 3 (beta 3) at the moment. http://github.com/adacosta/mongoid_rails_migrations . It'll be upgraded to rails 3 when it goes final.

Another gem for MongoMapper migrations: https://github.com/alexeypetrushin/mongo_mapper_ext

Mongrations is a super old gem, completely deprecated. I recommend NOT using it.
Exodus is a really cool migration framework for Mongo, that might be what you want:
https://github.com/ThomasAlxDmy/Exodus

We just built this one: https://github.com/eberhara/mongration - it is a regular node module (you can find it on npm).
We needed a good mongodb migration framework, but could not find any - so we built one.
It has a number of features that the usual migration frameworks lack:
Checksums (it raises an error when a previously run migration no longer matches its recorded version)
Persists migration state to mongo (there is no local state file)
Full support for replica sets
Automatic rollback handling (developers must specify the rollback procedures)
Ability to run multiple migrations (sync or async) at the same time
Ability to run migrations against different databases at the same time
Hope it helps!

Clint,
You can write code to do updates, though it seems that updating a record based on its own fields is not supported.
In such a case, I did the following and ran it against the server:
------------------------------
records = Patient.all
records.each do |p|
  encounters = p.encounters
  if encounters.nil? || encounters.empty?
    mra = p.updated_at
    #puts "\tpatient...#{mra}"
  else
    mra = encounters.last.created_at
    #puts "\tencounter...#{mra}"
  end
  old = p.most_recent_activity
  p.most_recent_activity = mra
  p.save!
  puts "#{p.last_name} mra: #{old} now: #{mra}"
end
------------------------------

I bet you could hook into ActiveRecord::Migration to automate and track your "migration" scripts.

MongoDB is a schema-less database. That's why there are no migrations. In the database itself, it doesn't matter whether the objects have the key :some_key or the key :some_other_key at any time.
MongoMapper tries to enforce some restrictions on this, but since the database is so flexible, you will have to maintain those restrictions yourself. If you need a key on every object, make sure you run a script to update those keys on pre-existing objects, or handle the case of an object that doesn't have that key as you come across it.
I am fairly new to MongoDB myself, but as far as I can see, due to the flexibility of the schema-less db this is how you will need to handle it.

Related

Rails migration: only for schema change or also for updating data?

I'm a junior Rails developer and at work we faced the following problem:
We needed to update the value of a column for only one record.
What we did was create a migration like this:
class DisableAccessForUser < ActiveRecord::Migration
  def change
    User.where(name: "User").first.update_column(:access, false)
  end
end
Are migrations only for schema changes?
What other solutions do you suggest?
PS: I can only change it with code. No access to console.
The short version is, since migrations are only for schema changes, you wouldn't want to use them to change actual data in the database.
The main issue is that your data-manipulating migration(s) might be ignored by other developers if they load the DB structure using either rake db:schema:load or rake db:reset, both of which merely load the latest version of the structure from the schema.rb file and do not touch the migrations.
As Nikita Singh noted in the comments, the best method of changing row data is to implement a simple rake task that can be run as needed, independently of the migration structure. Or, for a first-time installation, the seeds.rb file is perfect for loading initial system data.
Hope that rambling helps.
Update
Found some documentation in some "official" sources:
Rails Guide for Migrations - Using Models in your Migrations. This section gives a description of a scenario in which data-manipulation in the migration files can cause problems for other developers.
Rails Guide for Migrations - Migrations and Seed Data. Same document as above, doesn't really explain why it is bad to put seed or data manipulation in the migration, merely says to put all that in the seeds.rb file.
This SO answer. This person basically says the same thing I wrote above, except they provide a quote from the book Agile Web Development with Rails (3rd edition), partially written by David Heinemeier Hansson, creator of Rails. I won't copy the quote, as you can read it in that post, but I believe it gives you a better idea of why seed or data manipulation in migrations might be considered a bad practice.
Migrations are fine for schema changes. But on heavily collaborative projects, where you pull code every day from lots of developers, chances are you will miss some migrations (data-update migrations; this is no problem for schema changes) because migrations depend on timestamps.
So what we do is create rake tasks in a single namespace to update table values (being careful that they do not overwrite anything), and invoke all the rake tasks in that namespace whenever we pull new code from Git.
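A hypothetical sketch of that approach (the task, model, and column names are illustrations, not from the original answer):

# lib/tasks/data_updates.rake
namespace :data_updates do
  desc "Disable access for the 'User' user (safe to re-run)"
  task disable_access_for_user: :environment do
    user = User.where(name: "User").first
    # Only touch the row if it still has the old value, so repeated runs are harmless
    user.update_column(:access, false) if user && user.access
  end

  desc "Run every data update in this namespace"
  task all: [:disable_access_for_user]
end

After pulling new code, run: bundle exec rake data_updates:all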
Making data changes using classes in migrations is dangerous because it's not terribly future proof. Changes to the class can easily break the migration in the future.
For example, let's imagine you were to add a new column to user (sample_group) and access that column in a Rails lifecycle callback that executes on object load (e.g. after_initialize). That would break this migration. If you weren't skipping callbacks and validations on save (by using update_column) there'd be even more ways to break this migration going forward.
When I want to make data changes in migrations I typically fall back to SQL. One can execute any SQL statement in a migration by using the execute() method. The exact SQL to use depends on the database in use, but you should be able to come up with a db-appropriate query. For example, in MySQL I believe the following should work (the derived table works around MySQL's restriction on selecting from the table being updated):
execute("UPDATE users SET access = 0 WHERE id IN (SELECT id FROM (SELECT id FROM users ORDER BY id LIMIT 1) AS t);")
This is far more future proof.
There is nothing wrong with using a migration to migrate the data in your database, in the right situation, if you do it right.
There are two related things you should avoid in your migrations (as many have mentioned), neither of which preclude migrating data:
It's not safe to use your models in your migrations. The code in the User model might change, and nobody is going to update your migration when that happens; so if some co-worker takes a vacation for 3 months, comes back, and tries to run all the migrations that happened while she was gone, but somebody renamed the User model in the meantime, your migration will be broken and prevent her from catching up. This just means you have to use SQL, or (if you are determined to keep even your migrations implementation-agnostic) include an independent copy of an ActiveRecord model directly in your migration file (nested under the migration class), as sketched below.
It also doesn't make sense to use migrations for seed data, which is, specifically, data that is to be used to populate a new database when someone sets up the app for the first time so the app will run (or will have the data one would expect in a brand new instance of the app). You can't use migrations for this because you don't run migrations when setting up your database for the first time, you run db:schema:load. Hence the special file for maintaining seed data: seeds.rb. This just means that if you do need to add data in a migration (in order to get production and everyone's dev data up to speed), and it qualifies as seed data (necessary for the app to run), you need to add it to seeds.rb too!
Neither of these, however, mean that you shouldn't use migrations to migrate the data in existing databases. That is what they are for. You should use them!
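A minimal sketch of the nested-model idea, using the migration from the question (the inner User class is a deliberate throwaway copy, so later changes to app/models/user.rb cannot break the migration):

class DisableAccessForUser < ActiveRecord::Migration
  # A private copy of the model, scoped to this migration only
  class User < ActiveRecord::Base
    self.table_name = 'users'
  end

  def up
    User.reset_column_information
    User.where(name: "User").update_all(access: false)
  end

  def down
    User.where(name: "User").update_all(access: true)
  end
end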
A migration is simply a structured way to make database changes, both schema and data.
In my opinion there are situations in which using migrations for data changes is legitimate.
For example:
If you are holding data which is mostly constant in your database but changes annually, it is fine to make a migration each year to update it. For example, if you list the teams in a soccer league, a migration would be a good way to update the current teams each year.
If you want to mass-alter an attribute of a large table. For example, if you had a slug column on your users where the name "some user" was translated to the slug "some_user", and you now want to change it to "some.user", this is something I'd do with a migration, as sketched below.
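A hypothetical sketch of that mass update (the table and column names are assumptions based on the example; REPLACE() is available in common databases such as MySQL and PostgreSQL):

class RewriteUserSlugs < ActiveRecord::Migration
  def up
    # Rewrite underscores to dots in every slug with a single statement
    execute("UPDATE users SET slug = REPLACE(slug, '_', '.')")
  end

  def down
    execute("UPDATE users SET slug = REPLACE(slug, '.', '_')")
  end
end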
Having said that, I wouldn't use a migration to change a single user attribute. If this is something which happens occasionally you should make a dashboard which will allow you to edit this data in the future. Otherwise a rake task may be a good option.
This question is old and I think the Rails approach has changed over time. Based on https://edgeguides.rubyonrails.org/active_record_migrations.html#migrations-and-seed-data it's OK to feed new columns with data here. To be more precise, your migration code should also contain a "down" block:
class DisableAccessForUser < ActiveRecord::Migration
  def up
    User.where(name: "User").first.update_column(:access, false)
  end

  def down
    User.where(name: "User").first.update_column(:access, true)
  end
end
If you use seeds.rb to pre-fill data, don't forget to include new column value there, too:
User.find_or_create_by(id: 0, name: 'User', access: false)
If I remember correctly, changing particular records may work, but I'm not sure about that.
In any case, it isn't good practice; migrations should be used for schema changes only.
For updating one record I would use the console. Just type 'rails console' in a terminal and enter code to change the attributes.
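For example, using the model from the question (a sketch to run inside rails console):

user = User.where(name: "User").first
user.update_column(:access, false)  # skips validations and callbacks, like the migration above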

Schema Migrations Table

In my Rails 4 app I would like to collapse my migration files into one large file (similar to schema.rb) as it's time to do some housekeeping, but I'm not sure how to access the table in the database that stores migration data, so that when I run a migration I don't receive any errors/conflicts.
Question: How can I access and delete the data in the table that stores migration data?
for fun, you can also manipulate these in the console by making a model class for them...
class SchemaMigration < ActiveRecord::Base; self.primary_key = :version; end
then you can do SchemaMigration.all, SchemaMigration.last.delete, etc.
Really just a substitute for using SQL, and it is very rare that you would need to mess around at this low level… generally a bad idea but cool to see how to do it :)
Another solution could be to access it through:
ActiveRecord::SchemaMigration
The answer given by David didn't work in my context.
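For example (a sketch assuming Rails 4+, run from rails console; the version number is just a placeholder):

# Remove the row recorded for one migration
ActiveRecord::SchemaMigration.where(version: '20140101120000').delete_all
# Or clear the whole table before consolidating migrations
ActiveRecord::SchemaMigration.delete_all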
The schema_migrations table holds the revision numbers, with the last record being the most recently executed migration. You can just manipulate these records manually.
to get the last version:
ActiveRecord::SchemaMigration.last.version
or all versions:
ActiveRecord::SchemaMigration.all.map(&:version)
Not sure why you want to do this but here you go:
ActiveRecord::Migrator.get_all_versions
I've had to do some cleanup of this sort: the accumulation of seemingly trivial migrations creates such pollution that things stop making sense.
As a last phase of development (not recommended once in production), you can clear out the schema_migrations table, consolidate your migrations (one-to-one with classes) and create a new table (beware: running migrate has different behaviours, depending on mysql vs postgresql)
@david-lowenfels' answer is perfect for this context.
All this, naturally, assumes you haven't made errors in keys, indices, defaults. This is a serious task, but not an insensible one at the end of a development phase.

How to add new seed data to existing rails database

I am working on an application that is already deployed to some test and staging systems and various developers' workstations. I need to add some additional reference data but I'm not sure how to add it.
Most of the advice says to use seeds.rb; however, my understanding is that this is only run once, when the application is initially deployed. Since we don't want to rebuild the test and staging databases just so that we can add 1 row of reference data, is there another way to add the data?
I'm thinking of using a db migration, is this the correct approach?
Thanks
Structure your seeds.rb file to allow ongoing creation and updating of data. You are not limited to running a seed file only once, and if you think it's only used for initial deployment you will miss out on the flexibility it can offer in setting reference data.
A seed file is just ruby so you can do things like:
user = User.find_or_initialize_by(email: 'bob@example.com')
user.name = 'Bob'
user.password = 'secret'
user.role = 'manager'
user.save!
This will create new data if it doesn't exist or update the data if it finds some.
If you structure your seed file correctly you can also create and update dependent objects.
I recommend using the bang save to ensure that exceptions are raised in the event that an object cannot be saved. This is the easiest method of debugging the seed.
I use the seedbank gem to provide more structure to my seed data, including setting data per environment, dependent seeds and more.
I don't recommend using migrations for seed data. There is a lack of flexibility (how do you target seed data to just one environment for instance) and no real way to build up a reusable set of data that can be run at any time to refresh a particular environment. You would also have a set of migrations which have no reference to your schema and you would have to create new migrations every time you wanted to generate new or vary current data.
You can use a migration, but that's not the safest option you have.
Say, for example, you add a record to a table via a migration, then in the future you change that table's schema. When you later install the app somewhere, you won't be able to run rake db:migrate.
Seeds are always advisable because rake db:seed can be run on a completely migrated schema.
If it's just for a record, go for the rails console.
It's best to use an idempotent method like this in seeds.rb or another task called by seeds.rb:
Contact.find_by_email("test@example.com") || Contact.create(email: "test@example.com", phone: "202-291-1970", created_by: "System")
# This saves you an update to the DB if the record already exists.
Or similar to @nmott's:
Contact.find_or_initialize_by_email("test@example.com").update_attributes(phone: "202-291-1970", created_by: "System")
# this performs an update regardless, but it may be useful if you want to reset your data.
or use assign_attributes instead of update_attributes if you want to assign multiple attributes before saving.
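A quick sketch of that assign_attributes variant (using the same hypothetical Contact record as above):

contact = Contact.find_or_initialize_by(email: "test@example.com")
contact.assign_attributes(phone: "202-291-1970", created_by: "System")  # set several attributes in memory
contact.save!                                                           # persist once, raising on failure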
I use the seed file to add instances to new or existing tables all the time. My solution is simple. I just comment out all the other seed data in the db/seeds.rb file so that only the new seed data is live code. Then run bin/rake db:seed.
I did something like this in seeds.rb:
users_list = [
  {id: 1, name: "Diego", age: "25"},
  {id: 2, name: "Elano", age: "27"}
]
while !users_list.empty? do
  begin
    User.create(users_list)
  rescue
    users_list = users_list.drop(1) # remove the first entry if its id already exists
  end
end
If an item in the list has an id that already exists, an exception will be raised; we then remove that item and try again, until the users_list array is empty.
This way you don't need to look up each object before inserting it, but you will not be able to update values that were already inserted, as you can with @nmott's code.
Instead of altering seeds.rb, which you probably want to use for seeding new databases, you can create a custom Rake task (RailsCast #66 Custom Rake Tasks).
You can create as many Rake tasks as you want. For instance, let's say you have two servers, one running version 1.0 of your app, the other one running 1.1, and you want to upgrade both to 1.2. Then you can create lib/tasks/1-0-to-1-2.rake and lib/tasks/1-1-to-1-2.rake, since you may need different code depending on the version of your app.
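A hypothetical sketch of one such task (the file name comes from the answer; the model and record are made up purely for illustration):

# lib/tasks/1-1-to-1-2.rake
namespace :upgrade do
  desc "Add reference data introduced in version 1.2"
  task from_1_1_to_1_2: :environment do
    # Idempotent, so it is safe to run on servers that already have the row
    ReferenceItem.where(code: 'new-item-for-1-2').first_or_create!
  end
end

On a server running 1.1: bundle exec rake upgrade:from_1_1_to_1_2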

Managing mongoid migrations

Can someone give me a short introduction to doing DB migrations in Rails using Mongoid? I'm particularly interested in lazy per document migrations. By this, I mean that whenever you read a document from the database, you migrate it to its latest version and save it again.
Has anyone done this sort of thing before? I've come across mongoid_rails_migrations, but it doesn't provide any sort of documentation, and although it looks like it does this, I'm not really sure how to use it.
I should point out I'm only conceptually familiar with ActiveRecord migrations.
If you want to do the entire migration at once, then mongoid_rails_migrations will do what you need. There isn't really much to document, it duplicates the functionality of the standard ActiveRecord migration. You write your migrations, and then you use rake db:migrate to apply them, and it handles figuring out which ones have and haven't been run. I can answer further questions if there is something specific you want to know about it.
For lazy migrations, the easiest solution is to use the after_initialize callback. Check if a field matches the old data scheme, and if it does, modify the object and update it. For example:
class Person
  include Mongoid::Document

  after_initialize :migrate_data

  field :name, :type => String

  def migrate_data
    if !self[:first_name].blank? or !self[:last_name].blank?
      self.set(:name, "#{self[:first_name]} #{self[:last_name]}".strip)
      self.remove_attribute(:first_name)
      self.remove_attribute(:last_name)
    end
  end
end
The tradeoffs to keep in mind with the specific approach I gave above:
If you run a request that returns a lot of records, such as Person.all.each {|p| puts p.name} and 100 people have the old format, it will immediately run 100 set queries. You could also call self.name = "#{self.first_name} #{self.last_name}".strip instead, but that means your data will only be migrated if the record is saved.
General issues you might have is that any mass queries such as Person.where(:name => /Foo/).count will fail until all of the data is migrated. Also if you do Person.only(:name).first the migration would fail because you forgot to include the first_name and last_name fields.
Zachary Anker has explained a lot in his answer. Using mongoid_rails_migrations is a good option for migrations.
Here are some links with examples that will be useful for going through and using mongoid_rails_migrations:
Mongoid Migrations using the Mongo Driver
Embedding Mongoid documents and data migrations
Other than that, the README together with this example should be enough to implement Mongoid migrations.
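As a rough idea, here is a hedged sketch of what such a migration might look like (it assumes the gem's ActiveRecord-style Mongoid::Migration base class with self.up/self.down, and reuses the field names from the Person example above):

# db/migrate/20130101000000_merge_person_names.rb
class MergePersonNames < Mongoid::Migration
  def self.up
    Person.all.each do |p|
      next if p[:first_name].blank? && p[:last_name].blank?
      p.name = "#{p[:first_name]} #{p[:last_name]}".strip
      p.remove_attribute(:first_name)
      p.remove_attribute(:last_name)
      p.save!
    end
  end

  def self.down
    # Intentionally left blank: the original first_name/last_name split cannot be restored
  end
end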
I have the same need.
Here is what I came up with: https://github.com/nviennot/mongoid_lazy_migration
I would gladly appreciate some feedback

Migrations with EF. Where are they stored?

Yesterday I was absolutely certain that all migration data for EF lives in classes in my solution that derive from DbMigration. But today I dug a little deeper (trying to fall back to an old migration with data loss enabled, not via NuGet and Visual Studio, but from code):
DbMigrator fg = new DbMigrator(new Settings() { AutomaticDataLossEnabled = true});
fg.Update("MigrationName");
I got an exception, something like "string would be truncated", which means the migrator tried to shrink a column to a smaller MaxLength. So I excluded the migration that caused this update and moved its changes into the migration that creates the tables. The error still occurred. IntelliTrace showed that the (deleted) migration was still being called. Looking at the requests showed things like this:
SELECT [Extent1].[MigrationId] AS [MigrationId] FROM [dbo].[__MigrationHistory] AS [Extent1]
Looking at the __MigrationsHistory table, I found my deleted migration there, with a Model field that contains encrypted data (I haven't decrypted it yet). I was really shocked. Does this mean that all the code written in the classes is just a facade and the really executed code lives here? Does anyone know how to work with this table, register projections of migration classes into it, etc.? Or is the NuGet console the only way to work with migrations?
I am not fully sure what your primary question is, so I will first try to answer the last part, about the __MigrationHistory table.
Code in classes is not fake, your code in classes is compiled and run.
This table, however, really contains your database model, but it is not encrypted, it is compressed. The reason the Migrations API needs to store your model is to be able to compare it against your current actual model and track changes for you (for example, when you add a new property it will be able to tell which property you added and perform an automatic db migration).
In previous versions of EF there was an EdmMetadata table where a hash of your model was stored, and EF was able to detect whether you had made changes to the model by comparing the stored and current hash values. The new version, when migrations are enabled, stores the entire model as a compressed blob, so it can diff the model that was used to create the database against the model you are currently using, and make automatic migrations accordingly.
You should not work directly with this table; it is automatically populated by the Migrations API. But the NuGet console is not the only way to do migrations, you can check this resource for some insights on how to do it from code.
Now, regarding the question in your title (where are they stored?): migrations are stored in code, in a class inheriting from DbMigration that the Migrations API creates for you when you run the Add-Migration command in the NuGet console. When you perform a migration (Update-Database), either from the NuGet package manager console or from code, the API compares your current model with the versions in __MigrationsHistory to find the initial version (if you have not specified it) and performs all migrations between the initial and target versions (if not specified otherwise, the target is the latest version).
I'm not really clear how you excluded the migration that causes problems, as you would need to migrate your database to the version before that migration, and then delete and recreate all subsequent migrations from there.
Maybe you could solve your fallback-to-old-version problem by implementing the public override void Down() method in the migration that causes problems when rolling back? This method can be used to execute code that performs the inverse of the migration's operations.
Not directly related to the question, but worth mentioning: there is also a pretty detailed tutorial here for EF Code First.
