Approach to evolve data schemas with Migrations - ruby-on-rails

I come from a NodeJS background, where most of the frameworks don't include migrations support. I have a few doubts that I hope you'll be able to clarify:
Suppose I defined a data schema for a product with a name and a price:
If I create a migration and add a required column called description and run db:migrate, future products will require a description but what about the older ones?
They will not be valid, or contain an empty description?
Do I have to manually add descriptions to them?
What if I set an optional value for the description? Will that be applied to older instances?
If I reset all the migrations and run them again, will I lose any data?
What is the correct approach to handle this kind of situations where you evolve your schema, possibly rendering invalid older instances?

Regarding your questions:
If I create a migration and add a required column called description
and run db:migrate, future products will require a description but
what about the older ones?
It depends on what happens in your migration. If the new value is required, you should make sure the migration leaves the old records in a state that is still valid afterwards. If it does not, you will run into problems when trying to update old entries, because your model validation will reject any update that does not also supply the newly required field.
They will not be valid, or contain an empty description?
As mentioned, it depends. If your migration gives the new column some kind of default value, you are fine. Otherwise the new column will simply be NULL for the old entries.
Do I have to manually add descriptions to them?
Yes, if there is no way to generate the value automatically and no sensible default can be applied.
What if I set an optional value for the description? Will that be applied to older instances?
Unless your migration explicitly sets a value, the new column will be NULL for your old entries.
If I reset all the migrations and run them again, will I lose any data?
Most likely. Migrations can be written in both an UP and a DOWN direction, but going down you lose information, and depending on what was lost it may not be restorable. So resetting all migrations is like dropping the whole database.
But why would someone want to reset all migrations at once? As your database is evolving you should treat it like a growing child, you won't get back the years spent on their education but you can teach them to behave better thus forgetting unnecessary information (dropping unneeded columns with a further UP migration).
What is the correct approach to handle this kind of situations where
you evolve your schema, possibly rendering invalid older instances?
You might blame me but it ... depends ;)
If subsequent updates are guaranteed to leave the model valid, you simply don't have to care. If not, you have to take care of it, either within the migration or with some kind of script or manual fix that makes the old records valid again.
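One way to take care of old entries within the migration is to add the column as nullable, backfill the existing rows, and only then rely on the "required" rule. The following is a minimal plain-Ruby sketch of that sequence. It uses an in-memory array of hashes instead of a real database, and `add_column`, `backfill` and `valid?` are hypothetical helpers written for this illustration, not the Rails migration API.

```ruby
# In-memory stand-in for a products table (illustration only, not ActiveRecord).
products = [
  { name: "Lamp",  price: 20 },
  { name: "Chair", price: 45 }
]

# Step 1: add the new column; existing rows get nil (NULL) unless a default is given.
def add_column(rows, column, default: nil)
  rows.each { |row| row[column] = default unless row.key?(column) }
end

# Step 2: backfill rows that still have no value, deriving one from existing data.
def backfill(rows, column)
  rows.each { |row| row[column] = yield(row) if row[column].nil? }
end

# Step 3: the "required" rule that a model validation would enforce.
def valid?(row)
  !row[:description].to_s.strip.empty?
end

add_column(products, :description)
invalid_before = products.reject { |row| valid?(row) }.size

backfill(products, :description) { |row| "No description yet for #{row[:name]}" }
invalid_after = products.reject { |row| valid?(row) }.size

puts "invalid before backfill: #{invalid_before}, after: #{invalid_after}"
```

In a real application the placeholder text would be whatever default makes sense for the domain; the point is only that old rows are made valid before the requirement is enforced.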

Related

iOS CoreData Migration using Custom Mapping Models with multiple historical database versions

I have an app and many history versions of the database.
Our users are typically "once in a year" users, so this means you can never be sure which version of the database their app is running on.
Now in my new version of the database I need to do some custom migration.
The method I use to do this is described in this tutorial: http://9elements.com/io/index.php/customizing-core-data-migrations/
To summarize: I have to make Custom Mapping Models so that I can write my own migration policies for some fields.
Now when I create a Custom Mapping Model, I have to select a Source "xcdatamodel" and a Destination "xcdatamodel" (where "destination" is the new version of my database).
My question is, if I want to do this custom migration from all possible versions, do I need to create multiple Custom Mapping Models, all with a different source, or is there a smarter way to do this?
Or is CoreData smart enough to recognize this?
The short answer is yes; you need to test every migration from every source model to your current destination model. If that migration requires a custom mapping then you will need to have a mapping for that pair.
Core Data does not understand versions; it only understands source and destination. If there is not a way to get from A to B then it will fail. If it can migrate from A to B automatically and you have the option turned on, then it will. Otherwise a heavy (manual) migration is required.
Keep in mind that heavy migrations are VERY labor intensive and I strongly recommend avoiding them. I have found it is far more efficient to export the data (for example to JSON) and import it back in than it is to do a heavy migration.
It is enough to have a consistent sequential series of migration models up to the current version. Core Data is "smart" enough to execute the migrations you tell it to migrate in the given order.
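The idea of a sequential series of migrations can be illustrated independently of Core Data. The sketch below is plain Ruby and is not the Core Data API: each hypothetical step upgrades a record by exactly one version, and a record stored at any older version is brought up to date by applying the remaining steps in order. The version numbers, field names and the `migrate` helper are assumptions made up for this example.

```ruby
# Hypothetical migration steps, keyed by the version they upgrade FROM.
# Each step takes a record (a Hash) and returns the upgraded record.
steps = {
  # v1 -> v2: add a new field with an empty default
  1 => ->(rec) { rec.merge(notes: "") },
  # v2 -> v3: rename :name to :title
  2 => ->(rec) { rec.merge(title: rec[:name]).reject { |key, _| key == :name } }
}

# Apply every step from the stored version up to the target version, in order.
def migrate(record, from_version, to_version, steps)
  (from_version...to_version).reduce(record) do |rec, version|
    step = steps.fetch(version) { raise ArgumentError, "no migration from v#{version}" }
    step.call(rec)
  end
end

old_v1 = { name: "Trip to Oslo" }
old_v2 = { name: "Trip to Rome", notes: "sunny" }

upgraded_v1 = migrate(old_v1, 1, 3, steps)
upgraded_v2 = migrate(old_v2, 2, 3, steps)

p upgraded_v1
p upgraded_v2
```

With this shape, only one step per version has to be written and tested, at the cost of running several steps for very old data.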

change attribute value in Core Data in already published application

(Actually, I don't know how to formulate my question, so in google I found nothing.)
So, the situation: my app in the App Store has a Core Data entity (let's say Weather), and one of its attributes is Speed, of type String. Currently it contains a single value (e.g. 5 mps), but now I want it to contain an array-like string (e.g. 5 mps; 6.4 mps; ...) and also change its name from "Speed" to "SpeedHistory".
I made a new model version, selected it (it has a little checkbox now), renamed the attribute, and set "Renaming ID" to "Speed". Now, how should I proceed to keep the app from crashing on data from users of the old version?
Could you give me some advice, please?
P.S. The data in the Weather entity is filled in by the user, and I'm using MagicalRecord.
This is a rather common issue. When you update your model when using core data you have to migrate it. You can follow this tutorial which explains what you should do to fix your issue:
http://www.raywenderlich.com/27657
A lightweight migration is also relatively simple and can be performed safely. You only need to worry when the changes in your model kind of require a change in logic.
Changing the type of a column is something that can't be done with lightweight migrations. If you want to migrate the users' data when they upgrade to your new model version, you'll need to create a mapping model. This process is described in the Mapping Overview section of Apple's Core Data Model Versioning and Data Migration Programming Guide.
I haven't had much success with mapping models, as they seem to be very memory-hungry.
Have you considered adding the SpeedHistory attribute without removing the Speed attribute or setting a renaming identifier? Then in your model class you could override awakeFromFetch:, check whether there's a Speed, and if there is then set the SpeedHistory as appropriate and clear the Speed. You'll migrate the objects one at a time, on an as-needed basis.

Does removing a property from a domain class cause an automatic update to the schema, whereby the corresponding column is dropped?

I'm sort of new at Grails. I've worked with it a bit, but not that much. I'm pretty familiar with Java though. My question is regarding schema updates. I understand that Grails creates Hibernate mappings by looking at the domain classes, and so if I add a new property, Grails will automatically add a column for that property in the database. Does the reverse also hold true? If I remove a property, is that column removed? I'm not seeing that behavior and so I'm wondering if it is a configuration issue.
If I wanted to go into more robust database-management, I'm guessing I will have to use the database-management plugin or something like Liquibase. However, the project I'm working on is pretty simple and for the moment, we haven't decided if we are going in that direction yet.
It depends on your dbCreate setting in DataSource.groovy. If it's create or create-drop then everything gets rebuilt when you restart. If it's update then new tables and columns get added. If it's some other setting then no changes are made.
update doesn't do what most people expect though. It's pessimistic and won't make changes that could result in data loss or corruption. So it won't change the size of a column even if it's wider (e.g. VARCHAR(50) -> VARCHAR(200)). It won't add indexes. It will add a new column that's specified as not-null, but it adds it as nullable since otherwise the previously inserted rows won't be valid. But it won't drop a column or table. So you can easily get into a scenario where you rename a column and end up with two - the old and the new.
Liquibase is a great library and the database-migration plugin (http://grails.org/plugin/database-migration) is popular, so it's easy to get support for both. Once you get past the point in development when your schema stabilizes somewhat, you should look into using the plugin.

loading seed data for a rails migration

I have an existing database in which I am converting a formerly 'NULL' column to one that has a default value (and populating that with said default value). However, that value is an ID of a record I need to create. If I put this record in db/seeds.rb, it won't run because db/seeds.rb runs after migrations -- but the migration needs seed data. If I leave the record creation in the migration, then I don't get the record if I make a fresh database with db:load. Is there a better way other than duplicating this in both db/seeds.rb and the migration?
Thanks!
While I can understand your desire to stay DRY and not have to write this in both the migration and seeds.rb, I think you should write it in both places. Not just to make it work, but to accomplish different requirements related to your problem.
You need to ensure that your migration can execute properly regardless of external processes. That means any code the migration requires should live within that specific migration. The only goal here is to make sure your migration executes properly. If someone else tries to migrate without knowing you put part of the code in seeds.rb, it would be very difficult for them to figure out what's going on.
You can make db:load work properly by including similar code in seeds.rb. However, you should be evaluating the current state of your database in seeds.rb due to the fact that it runs after the migrations. So you can check to see if the column exists, and what the default value is etc. This means that if the migration ran and took care of everything, seeds.rb doesn't repeat work or modify values inappropriately. However, if the migration did not set these variables as expected, it is able to set the values.
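The check-before-write behaviour described above can be sketched in plain Ruby. The `Store` class below is a hypothetical in-memory stand-in for a table, and `ensure_record` is an assumed helper name invented for this example; in a real Rails app the same idea would typically be expressed with ActiveRecord (for instance `find_or_create_by`) in both the migration and db/seeds.rb.

```ruby
# Hypothetical in-memory table, used only to illustrate idempotent seeding.
class Store
  attr_reader :rows

  def initialize
    @rows = []
    @next_id = 1
  end

  # Return the existing row matching attrs, or insert it; never duplicates.
  def ensure_record(attrs)
    existing = @rows.find { |row| attrs.all? { |key, value| row[key] == value } }
    return existing if existing

    row = attrs.merge(id: @next_id)
    @next_id += 1
    @rows << row
    row
  end
end

categories = Store.new

# The migration creates the record it depends on...
from_migration = categories.ensure_record(name: "Uncategorized")
# ...and the seed file runs the same check later without repeating the work.
from_seeds = categories.ensure_record(name: "Uncategorized")

puts "rows: #{categories.rows.size}, same id: #{from_migration[:id] == from_seeds[:id]}"
```

Because the operation is safe to repeat, it does not matter whether the database was built by running migrations or loaded fresh and then seeded.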
I'd recommend looking at it as two separate issues so you can be more confident that each one executes successfully, independently of the other. It also makes it easier for you or others to understand later what is happening.
In my opinion you should treat this in both db/seeds.rb and the migration.
The migration is used to get an existing database from an older version to another version while seeds.rb and schema.rb are used for a fresh database with the latest version.

Considering a new Rails app with existing data (not a db, actual data) -- what is the best way to proceed?

I have been tasked with developing a new retail e-commerce storefront for my current job, and I am considering tackling it with RoR to A) Build a "real" project with my limited Rails knowledge, and B) Give management quick turnaround and feedback (they are wanting to get this done ASAP and their deadlines are rather unrealistic - I'm talking a couple of weeks to go from nothing to working model so they can start to market it with SEO/SEM and, I kid you not, "video blogging" because my boss heard that's the future).
We do have a database structure in place but it's absolutely terrible and was thrown together without rhyme nor reason, so I'm going to largely ignore it and create a new database from scratch; however, I have existing data that I need to load into the application (like I said, it's an e-commerce app and we have the product data). I need to massage this data into a usable format because our supplier provides it to us with cryptic, abbreviated column names and it's highly denormalized, especially in the categories (I've posted a question regarding it before - basically the categories table has six fields, one for each category/subcategory, with some of them being blank if that category doesn't apply).
There are two main issues that are giving me second thoughts:
As I said, the data needs to be put into a "proper" database schema; I can't just load it as-is. I have some thoughts as to a good data model for it, but my analysis is not completed yet. There would end up being a large number of join tables to link various things together (e.g. products_categories, products_attributes, products_prices), and these tables would link products not via an ID but by their SKU (see below).
Everything already has an ID that's generated for it, but anything new I add needs to have one autogenerated; I doubt this will be a problem with any mature RDBMS, but I know Rails likes to generate IDs itself. Also, almost all of the product-related tables are linked by SKUs (and in the data provided by the supplier are actually a composite key consisting of the prefix and stock number, which combined make up the full SKU), not by IDs and I'm not sure if this will be a performance issue (of course, I could always manually create indexes on these columns to speed it up). It does mean that I'll need to break away from the Rails conventions, however.
In short, I think that Rails might be a good choice as far as time-to-market and ease-of-development, but having to work with the existing data content might turn into a pain because the application will need to be developed around that, instead of the "traditional" Rails app, and that factor is giving me major doubts about using Rails. There are also some other issues (having to set up a Linux server, and the fact that the area I live in has very few Rails developers so if I left the company I'd basically be holding them hostage as far as updates/modifications). I'm really unsure as to the best path to proceed.
I would develop the app as if you didn't have the data. Use the ORM and make your database the best it can be, but of course keep in mind what data you have to populate it with (eg: don't make crazy new constraints for things that will leave you going through old data record by record).
When you're done and tested, write an import script that pulls your real data onto your new database.
It's not that different from the conventional design/development model, except that you can do your data input in a semi-automated fashion.
I was in the same situation not too long ago — a crappy PHP app that held ten years worth of all company data.
What I did was simply create a Migration model and added methods to import each resource.
    class Migration
      # Import every resource, in dependency order.
      def self.migration_all
        jobs
      end

      # Import the jobs resource.
      def self.jobs
        # ...
      end
    end
The cool thing about this is that you can choose the order in which resources are imported, since one will likely reference another. I also added methods that directly modified the db schema. One nice trick if you have to keep an existing primary key is to create a field named 'legacy_id', copy over your existing primary key, and when you're done, simply remove the 'id' field, rename the 'legacy_id' field to 'id', then add the primary_key constraint on the new 'id' field.
Don't use the SKU as the unique key for each product - use the standard Rails incremented id.
The SKU could change (it may have been entered incorrectly, for example), and that would make it a nightmare to update all of the references from other tables. Put your current id in a sku column, index it, and update the references in your other tables to the Rails ids.
You'd be able to do Product.find_by_sku(params[:sku]) in your controllers, set up a /products/:sku route, etc. I don't see what you'd gain (other than a headache) by using your non generated ids as the database primary keys.
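A rough sketch of that remapping, in plain Ruby with arrays of hashes standing in for tables. The sample rows and the `sku_to_id` name are made up for illustration: products get generated integer ids, the old identifier is kept in an ordinary sku column, and rows in other tables are rewritten to reference the new id.

```ruby
# Legacy rows keyed by SKU (made-up sample data).
legacy_products = [{ sku: "AB-100", name: "Lamp" }, { sku: "AB-200", name: "Chair" }]
legacy_prices   = [{ sku: "AB-100", cents: 2000 }, { sku: "AB-200", cents: 4500 }]

# Assign generated integer ids and keep the SKU as an ordinary column.
products = legacy_products.each_with_index.map { |row, index| row.merge(id: index + 1) }

# Lookup table playing the role of a unique index on products.sku.
sku_to_id = products.map { |row| [row[:sku], row[:id]] }.to_h

# Rewrite child rows so they reference product_id instead of the SKU.
# fetch raises KeyError for an unknown SKU, which surfaces orphaned rows early.
prices = legacy_prices.map do |row|
  { product_id: sku_to_id.fetch(row[:sku]), cents: row[:cents] }
end

p prices
```

After this step a mistyped SKU can be corrected in one place, because nothing else refers to it.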
I'd also suggest running your old data through your app's validations to make sure you are not loading up a bunch of inconsistencies and errors. It will help your app run smoothly and highlight existing data errors at a point where you can fix them.
Don't assume the existing data is valid just because it is already there.
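As a minimal sketch of validating during import, the plain-Ruby script below checks each legacy row against a couple of rules and collects the failures with their reasons instead of loading them. The rules, the sample rows and the `validation_errors` helper are assumptions for illustration; in a Rails app the model's own validations would play this role.

```ruby
# Hypothetical validation rules for an imported product row.
def validation_errors(row)
  errors = []
  errors << "name is blank" if row[:name].to_s.strip.empty?
  unless row[:price].is_a?(Numeric) && row[:price] > 0
    errors << "price must be a positive number"
  end
  errors
end

# Made-up legacy rows; the second and third are deliberately broken.
legacy_rows = [
  { sku: "AB-100", name: "Lamp", price: 20.0 },
  { sku: "AB-200", name: "",     price: 45.0 },
  { sku: "AB-300", name: "Desk", price: nil }
]

imported = []
rejected = {}

legacy_rows.each do |row|
  errors = validation_errors(row)
  if errors.empty?
    imported << row
  else
    rejected[row[:sku]] = errors # keep the reasons so the data can be fixed
  end
end

puts "imported #{imported.size}, rejected #{rejected.size}"
rejected.each { |sku, errors| puts "#{sku}: #{errors.join(', ')}" }
```

Keeping the rejected rows and their error messages gives you a worklist for cleaning the source data before the next import run.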
