Rake tasks and migration on Heroku without downtime - ruby-on-rails

Context
I'm using Heroku to serve my Rails API (v5.2) with a PostgreSQL database.
Frequently, after some migrations, I have to manually run some specific rake tasks.
Those rake tasks typically delete all the rows of a table before recreating them with different data.
This is problematic for me because it creates roughly 20 minutes of downtime, twice a week (by turning Maintenance mode on and off).
Problem
I would like to avoid downtime between my migrations.
Intended solution
For this, I planned on using Heroku preboot alongside release phase tasks.
After activating preboot for my app, I will put a script in my Procfile:
release: ./release-tasks.sh
And in the release-tasks.sh file something like:
heroku run rake my_rake_task --app myApp
Questions
Is it a good/ok solution?
Is it sure that during the migration phase, users will be able to
query the "old" database before the new one is live?
Is there a way to activate release scripts on demand? (e.g. using an
env var in Heroku? -- I won't need it for every migration.)
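For reference, the kind of on-demand gating asked about here can live inside the release task itself. A minimal sketch, assuming a hypothetical release:run task and a RUN_RELEASE_TASKS config var that you set before deploys that need it (the Procfile would then use release: bundle exec rake release:run):
# lib/tasks/release.rake -- hypothetical task names
namespace :release do
  desc "Tasks run by the Heroku release phase"
  task run: :environment do
    Rake::Task["db:migrate"].invoke

    # Only run the expensive data task when explicitly requested via
    # `heroku config:set RUN_RELEASE_TASKS=1` (unset it again afterwards).
    Rake::Task["my_rake_task"].invoke if ENV["RUN_RELEASE_TASKS"] == "1"
  end
end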

This is a good solution, yes. Release Phase is meant exactly to help running migrations whenever the app is deployed.
This won't prevent downtime in your specific case though. Release phase doesn't start a new database with every release. It just runs a one-off dyno with your command.
Your only solution here is to change your migration strategy to avoid deleting and recreating everything. Depending on what you're doing, you may be able to just update/add/remove the data you need.
Or you could create a new temporary table with the new data, and then delete the old table and rename the new one to its permanent name.
Both those solutions are something you need to write your own code for though.
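A minimal sketch of the second approach (build a staging table, then swap it in atomically), written as a data-only rake task with raw SQL; the widgets table and staging name are hypothetical:
# lib/tasks/rebuild_widgets.rake -- hypothetical task and table names
namespace :data do
  desc "Rebuild the widgets table without it ever appearing empty to readers"
  task rebuild_widgets: :environment do
    conn = ActiveRecord::Base.connection

    # Build the replacement data in a staging table while the existing
    # table keeps serving reads.
    conn.execute("DROP TABLE IF EXISTS widgets_staging")
    conn.execute("CREATE TABLE widgets_staging (LIKE widgets INCLUDING ALL)")
    conn.execute("INSERT INTO widgets_staging (name) VALUES ('example')") # real population goes here

    # Swap in a single transaction so readers never see an empty table.
    conn.transaction do
      conn.execute("ALTER TABLE widgets RENAME TO widgets_old")
      conn.execute("ALTER TABLE widgets_staging RENAME TO widgets")
      conn.execute("DROP TABLE widgets_old")
    end
  end
end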

Related

How to deploy to Heroku with long release tasks

We have a Rails v6 app running on Heroku. We have a Postgres DB and we use Elasticsearch. When we push new releases, there are usually two things that need to happen in each release phase:
Run database migrations
Update the Elasticsearch index
Currently, both of these operations happen in a bash script invoked by the Procfile (details below).
The problem is that even though the new code isn't "released" until after the release tasks finish, the database migrations take effect immediately. The DB migrations often introduce breaking changes that cause errors until the corresponding code is released. Normally this wouldn't be a major issue, but Elasticsearch reindexing takes almost two hours to complete (and it needs to happen after migrations). So during that reindexing time, the database has been updated but the code hasn't been released yet. And that's too long for the site to be broken or in maintenance mode.
Database migrations and reindexing are pretty common operations and Heroku is very popular. So my question is what is the correct way to orchestrate this kind of release without multiple hours of downtime?
Unsuccessful ideas
My first idea was to perform the migrations after the reindexing. But often the migrations modify DB fields that get used during re-indexing. So that won't work.
I also thought about trying to perform the re-indexing on a new/different Elasticsearch index, and then point the app at the newer one when the process completes. I'm not sure how to do this, but it is also problematic because the newly released code often needs the updated index to work properly. So we would still potentially break the site while the reindexing is happening.
Details
Procfile
release: bash ./release-tasks.sh
rails: bundle exec bin/rails s -p 3000
Here's a simplified version of our release-tasks.sh script:
echo "Running migrations on an existing DB...";
rake db:migrate;
# This affects the production DB immediately
echo "Reindexing..."
rake searchkick:reindex:all
# This takes two hours to finish before the code goes live
DB migrations should not introduce breaking changes. Your migrations should be "safe", i.e. your pre-deployment code and post-deployment code should both work once the migration has run.
For instance, you should not remove a column from your database unless the pre-deployment code lists the column in question in self.ignored_columns.
Check strong_migrations for more info. The gem page lists the potentially dangerous operations and provides safe alternatives for running them.
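For example, the usual safe pattern for removing a column is two deploys: first ship code that ignores the column, then drop it in a later migration. A sketch (the User model and column name are placeholders):
# Deploy 1: tell the currently running code to stop using the column.
class User < ApplicationRecord
  self.ignored_columns = ["legacy_token"]
end

# Deploy 2: the column can now be dropped without breaking running code.
class RemoveLegacyTokenFromUsers < ActiveRecord::Migration[6.0]
  def change
    remove_column :users, :legacy_token, :string
  end
end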

Can I deploy to Heroku without destroying existing database? Ruby on Rails

I have made quite a large upgrade to my app. I have an older version deployed on Heroku at the moment. The problem is that I have added/removed quite a few migrations in the process of making my app more modular. I do not want to lose the registered user table that is already up on Heroku while deploying the update. Are there any tips someone can offer on how to preserve my user table while upgrading the app? I do have the backup add-on installed but I have no clue what to do with that file.
You should then set up your migrations more carefully, starting from the last one already applied on the Heroku server. So, when:
you are adding a column, just set a default value
you are removing a column, export its data to (for example) YAML first and store it in tmp/
you are reconstructing parts of the columns by adding and removing some of them, carefully copy the required data from the old columns (slated for deletion) to the newly created ones inside the migration (see the sketch below)
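A sketch of that third case, with hypothetical column names: the migration adds the new column, copies the data over in SQL (so it doesn't depend on model code), and only then removes the old column.
class CopyNameToFullName < ActiveRecord::Migration[5.2]
  def up
    add_column :users, :full_name, :string

    # Copy data in SQL so the migration doesn't depend on model code.
    execute "UPDATE users SET full_name = name"

    remove_column :users, :name
  end

  def down
    add_column :users, :name, :string
    execute "UPDATE users SET name = full_name"
    remove_column :users, :full_name
  end
end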
If you're that concerned about deploying, I think you might need to declare migration bankruptcy.
Create a new migration that drops the current schema and copies the new database schema into it
Dump out the contents of your database as CSV ( https://coderwall.com/p/jwtxjg/simple-export-to-csv-with-postgres ) or using Rails to dump in some other way (YAML etc.)
Write a Rake task to parse the data dump into your new schema (a sketch follows after these steps)
Set up a new database on Heroku Postgres (this does not delete the old data)
Do a dry run locally to make sure it all works
Promote the new database to DATABASE_URL
Deploy, migrate, run the Rake task
Not very nice and not something you can roll back from easily if it goes wrong.
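A sketch of the Rake task from step 3, assuming a users.csv dump in tmp/ whose headers map cleanly onto the new users table (file, model, and column names are placeholders):
# lib/tasks/import_legacy_users.rake
require "csv"

namespace :import do
  desc "Load the exported users.csv into the new schema"
  task legacy_users: :environment do
    CSV.foreach(Rails.root.join("tmp", "users.csv"), headers: true) do |row|
      # Map the old dump's columns onto the new schema explicitly, so any
      # renamed or dropped columns are handled in one obvious place.
      User.create!(
        email: row["email"],
        name: row["name"],
        created_at: row["created_at"]
      )
    end
  end
end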

Is it a good idea to purge old Rails migration files?

I have been running a big Rails application for over 2 years and, day by day, my ActiveRecord migration folder has grown to over 150 files.
There are very old models, no longer present in the application, still referenced in the migrations. I was thinking of removing them.
What do you think? Do you usually purge old migrations from your codebase?
The Rails 4 Way page 177:
Sebastian says...
A little-known fact is that you can remove old migration files (while still keeping newer ones) to keep the db/migrate folder to a manageable size. You can move the older migrations to a db/archived_migrations folder or something like that. Once you do trim the size of your migrations folder, use the rake db:reset task to (re-)create your database from db/schema.rb and load the seeds into your current environment.
Once I hit a major site release, I'll roll the migrations into one and start fresh. I feel dirty once the migration version numbers get up around 75.
I occasionally purge all migrations that have already been applied in production, and I see at least two reasons for this:
More manageable folder: it is easier to spot a new migration.
Cleaner text search results: global text search within a project does not lead to tons of useless matches because of some 3-year-old migration when someone added or removed some column which anyway does not exist anymore.
They are relatively small, so I would choose to keep them, just for the record.
You should write your migrations without referencing models or other parts of the application, because they'll come back to haunt you ;)
Check out these guidelines:
http://guides.rubyonrails.org/migrations.html#using-models-in-your-migrations
Personally I like to keep things tidy in the migration files. I think once you have pushed all your changes to prod you should really look at archiving the migrations. The only difficulty I have faced with this is that when Travis runs, it runs db:migrate, so these are the steps I have used:
Move historic migrations from /db/migrate/ to /db/archive/release-x.y/
Create a new migration file manually, using the version number from the last run migration in the /db/archive/release-x.y directory, and change the description to something like from_previous_version. Using the old version number means that it won't run on your prod machine and mess things up.
Copy the schema.rb contents from inside the ActiveRecord::Schema.define(version: 20141010044951) do section and paste them into the change method of your from_previous_version migration
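A sketch of what that manually created migration might look like; the timestamp matches the last archived migration and the table contents are whatever your own schema.rb contains (the users table here is a placeholder):
# db/migrate/20141010044951_from_previous_version.rb
# Same timestamp as the last archived migration, so production (which already
# has this version in schema_migrations) will treat it as applied and skip it.
class FromPreviousVersion < ActiveRecord::Migration[4.2]
  def change
    # Pasted from inside ActiveRecord::Schema.define(...) in db/schema.rb
    create_table "users", force: :cascade do |t|
      t.string   "email"
      t.datetime "created_at"
      t.datetime "updated_at"
    end
  end
end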
Check all that in and Robert should be your parent's brother.
The only other consideration would be if your migrations create any data (my test scenarios contain all their own data so I don't have this issue)
Why? Unless there is some kind of problem with disk space, I don't see a good reason for deleting them. I guess if you are absolutely certain that you are never going to roll back anything ever again, then you can. However, it seems like saving a few KB of disk space wouldn't be worth it. Also, if you just want to delete the migrations that refer to old models, you have to look through them all by hand to make sure you don't delete anything that is still used in your app. Lots of effort for little gain, to me.
See http://edgeguides.rubyonrails.org/active_record_migrations.html#schema-dumping-and-you
Migrations are not a representation of the database: either structure.sql or schema.rb is. Migrations are also not a good place for setting/initializing data. db/seeds or a rake task are better for that kind of task.
So what are migrations? In my opinion they are instructions for how to change the database schema - either forwards or backwards (via a rollback). Unless there is a problem, they should be run only in the following cases:
On my local development machine as a way to test the migration itself and write the schema/structure file.
On colleague developer machines as a way to change the schema without dropping the database.
On production machines as a way to change the schema without dropping the database.
Once run they should be irrelevant. Of course mistakes happen, so you definitely want to keep migrations around for a few months in case you need to rollback.
CI environments do not ever need to run migrations. It slows down your CI environment and is error prone (just like the Rails guide says). Since your test environments only have ephemeral data, you should instead be using rake db:setup, which will load from the schema.rb/structure.sql and completely ignore your migration files.
If you're using source control, there is no benefit in keeping old migrations around; they are part of the source history. It might make sense to put them in an archive folder if that's your cup of coffee.
With that all being said, I strongly think it makes sense to purge old migrations, for the following reasons:
They could contain code that is so old it will no longer run (like if you removed a model). This creates a trap for other developers who want to run rake db:migrate.
They will slow down grep-like tasks and are irrelevant past a certain age.
Why are they irrelevant? Once more for two reasons: the history is stored in your source control and the actual database structure is stored in structure.sql/schema.rb. My rule of thumb is that migrations older than about 12 months are completely irrelevant. I delete them. If there were some reason why I wanted to rollback a migration older than that I'm confident that the database has changed enough in that time to warrant writing a new migration to perform that task.
So how do you get rid of the migrations? These are the steps I follow:
Delete the migration files
Write a rake task to delete their corresponding rows in the schema_migrations table of your database (a sketch follows below).
Run rake db:migrate to regenerate structure.sql/schema.rb.
Validate that the only thing changed in structure.sql/schema.rb is removed lines corresponding to each of the migrations you deleted.
Deploy, then run the rake task from step 2 on production.
Make sure other developers run the rake task from step 2 on their machines.
The second item is necessary to keep schema/structure accurate, which, again, is the only thing that actually matters here.
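A sketch of that rake task, assuming you list the removed migration timestamps by hand:
# lib/tasks/prune_schema_migrations.rake
namespace :db do
  desc "Remove schema_migrations rows for migration files that were deleted"
  task prune_schema_migrations: :environment do
    # Timestamps of the migration files removed from db/migrate (placeholders).
    deleted_versions = %w[20150101000000 20150214000000 20150302000000]

    quoted = deleted_versions.map { |v| ActiveRecord::Base.connection.quote(v) }.join(", ")
    ActiveRecord::Base.connection.execute(
      "DELETE FROM schema_migrations WHERE version IN (#{quoted})"
    )
  end
end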
It's fine to remove old migrations once you're comfortable they won't be needed. The purpose of migrations is to have a tool for making and rolling back database changes. Once the changes have been made and in production for a couple of months, odds are you're unlikely to need them again. I find that after a while they're just cruft that clutters up your repo, searches, and file navigation.
Some people will run the migrations from scratch to reload their dev database, but that's not really what they're intended for. You can use rake db:schema:load to load the latest schema, and rake db:seed to populate it with seed data. rake db:reset does both for you. If you've got database extensions that can't be dumped to schema.rb then you can use the sql schema format for ActiveRecord and run rake db:structure:load instead.
Yes. If you have completely removed a model and its related table from the database, it is worth reflecting that in the migrations. If the model reference in a migration does not depend on anything else, you can delete it. That migration is never going to run again on the existing database since it has already run, but if you don't remove the reference, it will cause a problem whenever you migrate a fresh database.
So it is better to remove that reference from the migration, and to refactor/minimize migrations down to one or two files before a big release to the live database.
I agree, there is no value in 100+ migrations: the history is a mess, there is no easy way to track the history of a single table, and they add clutter to file navigation. Simply Muda, IMO :)
Here's a 3-step guide to squash all migrations into identical schema as production:
Step 1: schema from production
# launch rails console in production
stream = StringIO.new
ActiveRecord::SchemaDumper.dump(ActiveRecord::Base.connection, stream); nil
stream.rewind
puts stream.read
This output is copy-pasteable into a migration, minus the obvious header.
Step 2: creating the migration without it being run in production
This is important. Take the last migration and change its name and content. ActiveRecord stores the timestamp in its schema_migrations table so it knows what it has run and what it hasn't. Reuse the last one and it'll think it has already run.
Example: rename 20161202212203_this_is_the_last_migration -> 20161202212203_schema_of_20161203.rb
And put the schema there.
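The result looks roughly like this; the timestamp matches the renamed file, and the table contents come from your own schema dump (the posts table here is a placeholder):
# db/migrate/20161202212203_schema_of_20161203.rb
# Reuses the timestamp of the last migration already applied in production,
# so production considers it run and skips it; fresh databases get the full schema.
class SchemaOf20161203 < ActiveRecord::Migration[5.0]
  def change
    # Paste the SchemaDumper output here (everything inside
    # ActiveRecord::Schema.define do ... end, minus the header).
    create_table "posts", force: :cascade do |t|
      t.string   "title"
      t.datetime "created_at", null: false
      t.datetime "updated_at", null: false
    end
  end
end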
Step 3: verify and troubleshoot
Locally, rake db:drop, rake db:create, rake db:migrate
Verify that the schema is identical. One issue we encountered was a datetime "now()" default in the schema; here's the best solution I could find for that: https://stackoverflow.com/a/40840867/252799

Rebase Rails migrations in a long running project

By which I mean "rebasing" in the dictionary sense, rather than the git definition...
I have a large, long running Rails project that has about 250 migrations, it's getting a touch unwieldy to manage all of these.
That said, I do need a base from which to purge and rebuild my database when running tests. So the data contained in these is important.
Does anyone have any strategies for, say, dumping the schema at a set point, archiving off all the old migrations, and starting afresh with new migrations?
Obviously I can use rake schema:dump - but really I need a way that db:migrate will load the schema first and then start running the rest of the migrations.
I would like to keep using migrations as they're very useful in development, however, there's no way I'm going back and editing a migration from 2007 so it seems silly to keep it.
In general, you don't need to clean up old migrations. If you're running db:migrate from scratch (no existing db), Rails uses db/schema.rb to create the tables instead of running every migration. Otherwise, it only runs the migrations required to upgrade from the current schema to the latest.
If you still want to combine migrations up to a given point into a single one, you could try to:
migrate from scratch up to the targeted schema using rake db:migrate VERSION=xxx
dump the schema using rake db:schema:dump
remove the migrations from the beginning up to version xxx and create a single new migration using the contents of db/schema.rb (put create_table and add_index statements into the self.up method of the new migration).
Make sure to choose one of the old migration version numbers for your aggregated new migration; otherwise, Rails would try to apply that migration on your production server (which would wipe your existing data, since the create_table statements use :force => true).
Anyway, I wouldn't recommend doing this since Rails usually handles migrations well itself. But if you still want to, make sure to double-check everything and try it locally first before you risk data loss on your production server.
To automate the merging (or squashing) of migrations, you could use the Squasher gem
Simply install
gem install squasher
And run it with a date; migrations created before that date will be merged:
squasher 2016 # => Will merge all migrations created before 2016
More details in the README
In addition to the answer provided (which indicates well how to consolidate your volume of migrations), you mention a concern about purging data (which I assume is manually added after fixtures populate your tables); that implies you depend on refreshing an initial data state. Some projects indeed require intensive refinement of base data, reconstruction, and re-population of tables. Ours depends heavily on repetitive execution of these operations, and I've found that if you can reduce your schema entirely to SQL execute statements, your tables will rebuild far faster than they will from Ruby syntax.
A trivial further help in rebuilding your tables is to dedicate a separate terminal window to a single combined command statement:
rake db:drop db:create db:schema:load db:fixtures:load
Each time you need to rebuild and re-populate your tables, an up-arrow and return keypress will get that routine job done. If there's no conflict in SQL execute statements, and if you don't have further migrations to run while your project is in a development state, the SQL statements will execute perhaps more than twice as fast as the Ruby syntax. Our tables rebuild and re-populate in 20 seconds this way, for example, whereas the Ruby syntax increases the process to well over 50 seconds. If you're waiting on that data to refresh to perform further work (especially many times), this makes a huge difference in workflow.

How to handle one-off deployment tasks with capistrano?

I am currently trying to automate the deployment process of our rails app as much as possible, so that a clean build on the CI server can trigger an automated deployment on a test server.
But I have run into a bit of a snag with the following scenario:
I have added the friendly_id gem to the application. There's a migration that creates all the necessary tables. But to fill these tables, I need to call a rake task.
Now, this rake task only has to be called once, so adding it to the deployment script would be overkill.
Ideally, I am looking for something like migrations, but instead of the database, it should keep track of scripts that need to be called during a deployment. Does such a beast already exist?
Looks like after_party gem does exactly what you want.
I can't think of anything that does exactly what you want, but if you just need to be able to run tasks on remote servers in a one-off fashion you could always use rake through capistrano.
There's an SO question for that here: How do I run a rake task from Capistrano?, which also links to this article http://ananelson.com/said/on/2007/12/30/remote-rake-tasks-with-capistrano/.
Edit: I wonder if it's possible to create a migration which doesn't do any database changes, but just invokes a rake task? Rake::Task["task:name"].invoke. Worth a try?
I would consider that running that rake task is part of the migration to using friendly_id. Sure, you've created the tables, but you're not done yet! You still have to do some data updates before you've truly migrated.
Call the rake task from your migration. It'll update the existing data and new records will be handled by your app logic in the future.
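A sketch of that approach; the friendly_id:create_slugs task name is a placeholder for whatever the gem's setup task is actually called:
class BackfillFriendlyIdSlugs < ActiveRecord::Migration[5.2]
  def up
    # Rake tasks aren't necessarily loaded when a migration runs.
    Rails.application.load_tasks unless Rake::Task.task_defined?("friendly_id:create_slugs")

    Rake::Task["friendly_id:create_slugs"].invoke
  end

  def down
    # Data backfill only; nothing to undo.
  end
end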
