My basic question is this: if I repeatedly copy a Rails app, so that there are many generations of the same repo (i.e., various iterations of the app's directory and files), what do I need to do to ensure the server runs normally and to avoid major issues?
I'm writing a learning app that drills the user on programming tasks. Right now it supports only single-file tasks. Next I want to add support for multiple-file tasks involving HTML/CSS/JS and Rails (e.g., "add a model that does such-and-such" or "add a Minitest test for such-and-such feature"). The user will be required to edit the Rails code directly, and my app will then automatically run the server and show the results. After each question is answered (i.e., each task is performed), my app will migrate the database back down as necessary and copy the repo anew from a tarball, basically resetting the stage for the next time the user tackles the task. (Well, I hope it's a good idea.)
Since Rails apps are so big and complex, it obviously isn't feasible to build and add a separate Rails app for every question. Instead, I will have many questions/tasks based on the same repo (installation), with the database migrated down and the repo restored from the tarball after each one, as described above. So far, so good? (I anticipate problems using Git to do this, so I would just use Minitar.)
But of course I will have to make other versions of the same repo (using the same database, or maybe a copy) when I make other clusters of questions. For example, I might want a bunch of questions/tasks related to using AJAX in Rails, and for that I need to prep an installation in various ways. But if I'm just building on a copy of a previous repo that has its own tasks, will the copying process cause issues for the later repo and its tasks?
I have done some testing. I have already confirmed that if I simply execute cp -r repo1/ repo2/ and then run rails s in repo2, the server for the latter starts normally. While data written in repo2 does not appear in repo1, I can't create an identically-named model (which is a little puzzling). I imagine this might be a problem for some questions; I don't really want all the repos running against one and the same database, even if later database versions are based on earlier ones. So whenever I copy a repo, I guess I'll also want to make a copy of the database, as explained here. Sound right?
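For what it's worth, here is a minimal sketch of how that copy step might look, assuming PostgreSQL and a development database named repo1_development (all of these names are placeholders):
# Copy the database so repo2 gets its own data (the template DB must have no open connections)
createdb --template=repo1_development repo2_development
# Then point repo2 at the copy by editing repo2/config/database.yml:
#   development:
#     database: repo2_development
With its own database, repo2 can define models and run migrations without colliding with repo1's schema.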
Is there anything else I'd need to do in building this feature that would prevent issues related to repeatedly copying different iterations of the same repo (and database)?
I think you're making it more complicated than it needs to be. This can all be done in Git by leveraging feature branches (e.g. question-1, question-2) for each derivation and combining that with the Rails rake database tasks (e.g. rake db:drop, rake db:create, rake db:migrate, rake db:seed) to ensure your database is bootstrapped properly for each branch.
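As a rough sketch, resetting the app for a given question could look like this (the branch name is just an example):
# switch to the branch that holds this question's version of the app
git checkout question-2
# rebuild the database from that branch's migrations and seeds
bundle exec rake db:drop db:create db:migrate db:seed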
An alternative approach would be to add an SQL dump of your final database state to each feature branch and load it via a rake task to bootstrap your database to the desired state.
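A minimal sketch of such a rake task, assuming PostgreSQL and a dump committed on the branch at db/bootstrap.sql (the path and task name are illustrative):
# lib/tasks/bootstrap.rake
namespace :db do
  desc "Load this branch's SQL dump into the current database"
  task bootstrap: :environment do
    db = Rails.configuration.database_configuration[Rails.env]["database"]
    sh "psql #{db} < #{Rails.root.join('db', 'bootstrap.sql')}"
  end
end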
Related
We have a Rails 6 app running on Heroku. We have a Postgres DB and we use Elasticsearch. When we push new releases, there are usually two things that need to happen in each release phase:
Run database migrations
Update the elastic search index
Currently, both of these operations happen in a bash script invoked by the Procfile (details below).
The problem is that even though the new code isn't "released" until after the release tasks finish, the database migrations take effect immediately. The DB migrations often introduce breaking changes that cause errors until the corresponding code is released. Normally this wouldn't be a major issue, but Elasticsearch reindexing takes almost two hours to complete (and it needs to happen after migrations). So during that reindexing time, the database has been updated but the code hasn't been released yet. And that's too long for the site to be broken or in maintenance mode.
Database migrations and reindexing are pretty common operations and Heroku is very popular. So my question is what is the correct way to orchestrate this kind of release without multiple hours of downtime?
Unsuccessful ideas
My first idea was to perform the migrations after the reindexing. But often the migrations modify DB fields that get used during re-indexing. So that won't work.
I also thought about trying to perform the re-indexing on a new/different Elasticsearch index, and then point the app at the newer one when the process completes. I'm not sure how to do this, but it is also problematic because the newly released code often needs the updated index to work properly. So we would still potentially break the site while the reindexing is happening.
Details
Procfile
release: bash ./release-tasks.sh
rails: bundle exec bin/rails s -p 3000
Here's a simplified version of our release-tasks.sh script:
echo "Running migrations on an existing DB...";
rake db:migrate;
# This affects the production DB immediately
echo "Reindexing..."
rake searchkick:reindex:all
# This takes two hours to finish before the code goes live
DB migrations should not introduce breaking changes. Your migrations should be "safe", i.e. your pre-deployment code and your post-deployment code should both work once the migration has run.
For instance, you should not remove a column from your database unless the pre-deployment code already lists that column in self.ignored_columns.
Check strong_migrations for more info. The gem page lists the potentially dangerous operations and provides a safe alternative for each.
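For example, a safe column removal could look roughly like this (the model and column names are made up; safety_assured comes from strong_migrations):
# Step 1: deploy code that ignores the column, so both the old and new schema work
class User < ApplicationRecord
  self.ignored_columns += ["legacy_token"]
end

# Step 2: in a later release, actually drop the column
class RemoveLegacyTokenFromUsers < ActiveRecord::Migration[6.0]
  def change
    safety_assured { remove_column :users, :legacy_token, :string }
  end
end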
Context
I'm using Heroku to serve my Rails API (v5.2) with a PostgreSQL database.
Frequently, after some migrations, I have to manually run specific rake tasks.
Those rake tasks typically delete all the rows of a table before recreating them with different data.
This is problematic for me because it creates roughly 20 minutes of downtime, twice a week (by turning Maintenance mode on and off).
Problem
I would like to avoid downtime between my migrations.
Intended solution
For this, I planned on using Heroku preboot alongside release phase tasks.
After activating preboot for my app, I will put a script in my Procfile:
release: ./release-tasks.sh
And in the release-tasks.sh file something like:
heroku run rake my_rake_task --app myApp
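As an illustrative sketch only (RUN_RELEASE_TASK and my_rake_task are placeholder names; note that the release phase already runs in the app's own environment, so the task can usually be invoked with plain rake rather than through heroku run):
#!/bin/bash
set -e
# Only run the data-rebuild task when explicitly requested via a Heroku config var
if [ "$RUN_RELEASE_TASK" = "true" ]; then
  bundle exec rake my_rake_task
fi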
Questions
Is it a good/ok solution?
Is it certain that during the migration phase, users will be able to query the "old" database before the new one is live?
Is there a way to trigger release scripts on demand (e.g. using an env var in Heroku)? I won't need it for every migration.
This is a good solution, yes. Release Phase is meant exactly for running migrations whenever the app is deployed.
This won't prevent downtime in your specific case though. Release phase doesn't start a new database with every release. It just runs a one-off dyno with your command.
Your only solution here is to change your migration strategy to avoid deleting and recreating everything. Depending on what you're doing, you may be able to just update/add/remove the data you need.
Or you could create a new temporary table with the new data, and then delete the old table and rename the new one to its permanent name.
Both those solutions are something you need to write your own code for though.
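A rough sketch of that second approach as a rake task, assuming PostgreSQL (the prices table and the task name are placeholders; the swap itself happens inside one transaction so readers never see an empty table):
# lib/tasks/rebuild_prices.rake
desc "Rebuild the prices table without emptying it in place"
task rebuild_prices: :environment do
  conn = ActiveRecord::Base.connection
  conn.execute("CREATE TABLE prices_new (LIKE prices INCLUDING ALL)")
  # ... insert the regenerated rows into prices_new here ...
  conn.transaction do
    conn.execute("ALTER TABLE prices RENAME TO prices_old")
    conn.execute("ALTER TABLE prices_new RENAME TO prices")
    conn.execute("DROP TABLE prices_old")
  end
end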
I have to do a lot of conditioning of the data that will seed my Rails database. This is a one-shot activity before deployment, never used after deployment, but I want to keep the programs I use for it within the project's configuration management (mainly for the sake of an audit trail for where the seed data came from).
Where is the canonical place in a Rails app for such support files that don't form part of the application?
Seed data should go in db/seeds.rb. You can learn more about seed data in the docs.
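For instance, db/seeds.rb can load whatever prepared data your conditioning step produced (the CSV path and the Product model here are only illustrative):
# db/seeds.rb
require "csv"

CSV.foreach(Rails.root.join("db", "data", "products.csv"), headers: true) do |row|
  Product.find_or_create_by!(name: row["name"]) do |product|
    product.price = row["price"]
  end
end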
The problem with adding all these items to your repository is that not only will it make the checked-in code large, you will also have to clean them out after each deploy.
I do not think such items should be checked in. Personally, I place all such items in a public data folder and upload it for the first deploy; subsequent deploys no longer include this folder, since the Capistrano deployment does not link to the data folder anymore.
This way the data can stay in the shared folder on the server, should you need it again, but not in your repository.
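In Capistrano 3 terms, one way to keep such a folder in the shared directory on the server, outside the repository, is the linked_dirs mechanism (a sketch; public/data is just an example path):
# config/deploy.rb
append :linked_dirs, "public/data"  # the folder lives in shared/ and is symlinked into each release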
I'm considering using Entity Framework 4.3 migrations in a production site. The following is my concern:
If the migration fails for whatever reason, I want all statements rolled back and the site placed in a down state so no users can use the site while I try to fix the problem. The only thing is, I can't fall back to executing the scripts by hand against the database, since migration files are compiled into the assembly. I could keep track of both migration files and SQL script files separately, but at that point why use migrations at all?
At work, script files are stored in a SQL folder (that no one can browse to) on the site. Previously run script files are registered in the database. When new script files appear in the folder (and aren't in the database), admin users are redirected to a DB portal, while everyone else gets a site-down-for-maintenance page. If we try and fail to execute any scripts from the portal, we grab the new scripts and run them manually inside of Express studio. This has worked for over a decade. I'm only exploring migrations to see if a better way has arrived. It doesn't feel like it. Please let me know if there's a better way, and if it's not migrations, what is it.
I would definitely not use automatic migrations in a production environment. In fact, I'm too much of a control freak to use automatic migrations at all. I prefer code based migrations to ensure that all databases (developers' own personal, test environment, production) have gone through exactly the same update sequence.
Using code-based migrations, where each migration step gets a name, a separate migration script can be generated for use when updating the test and prod environments. Preferably it should be run on a copy of the database to validate it before running on the real prod environment.
Added Later
To prevent EF Migrations from doing anything automatic at all, a custom initializer strategy can be used. See my blog or the Deploying EF Code First Apps to Production Database Stack Overflow question.
I am currently trying to automate the deployment process of our rails app as much as possible, so that a clean build on the CI server can trigger an automated deployment on a test server.
But I have run into a bit of a snag with the following scenario:
I have added the friendly_id gem to the application. There's a migration that creates all the necessary tables. But to fill these tables, I need to call a rake task.
Now, this rake task only has to be called once, so adding it to the deployment script would be overkill.
Ideally, I am looking for something like migrations, but instead of the database, it should keep track of scripts that need to be called during a deployment. Does such a beast already exist?
Looks like after_party gem does exactly what you want.
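If memory serves (worth verifying against the gem's README), usage looks roughly like this:
rails generate after_party:task populate_friendly_id_slugs   # generates a one-off deployment task under lib/tasks/deployment
rake after_party:run   # call this during deploys; it only runs tasks that haven't been run yet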
I can't think of anything that does exactly what you want, but if you just need to run tasks on remote servers in a one-off fashion, you could always run rake through Capistrano.
There's an SO question for that here: How do I run a rake task from Capistrano?, which also links to this article http://ananelson.com/said/on/2007/12/30/remote-rake-tasks-with-capistrano/.
Edit: I wonder if it's possible to create a migration which doesn't do any database changes, but just invokes a rake task? Rake::Task["task:name"].invoke. Worth a try?
I would consider that running that rake task is part of the migration to using friendly_id. Sure, you've created the tables, but you're not done yet! You still have to do some data updates before you've truly migrated.
Call the rake task from your migration. It'll update the existing data and new records will be handled by your app logic in the future.
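A sketch of what that could look like (the task name is made up; rake tasks aren't necessarily loaded inside a migration, hence the guard):
class PopulateFriendlyIdSlugs < ActiveRecord::Migration[5.2]
  def up
    # Load the app's rake tasks if they aren't already available
    Rails.application.load_tasks unless Rake::Task.task_defined?("friendly_id:generate_slugs")
    Rake::Task["friendly_id:generate_slugs"].invoke  # illustrative task name
  end

  def down
    # data-only change; nothing to undo
  end
end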