How should I handle database schema changes when switching branches in Rails? - ruby-on-rails

Currently I'm working on a Rails project, where I keep on constantly switching between the deployable master branch, and then many other branches, where I implement new features.
The problem is, that usually these features add some tables to the database, which means every time I switch a branch, I have to drop the database, migrate and then populate it with some dummy data.
I can do this in about two to three steps, since I have a rake task that creates all the dummy data again, but it's not very fast (couple of minutes). It's not the worst wait time ever, but I'd like to know if there are any alternative solutions, where I don't have to recreate the database every time I checkout a branch.
I'm currently using MySQL on my development machine.

Why don't keep the databases for each branch and just switch the connection strings.

Related

Postgresql 10 logical replication - what is the best way to sync replica`s db tables

I was set up two VMs where are the first VM is master PostgreSQL and the second is the slave.
I use PostgreSQL 10 with logical replicating, so I created publisher and subscription.
Initially, I created necessary tables on Master, then take backup and apply it to the slave, so all tables are synced and all working good.
I am using Rails app with migrations, so, now I want to apply the migration to master DB which will create a lot of new tables.
What is the best way to create the same tables with indexes to replication?
A simple solution for me - create a master DB dump again and apply it to slave.
But, maybe there are exists other solutions to keep database structure synced?
You can use Continuous Archiving to push any changes that happen to the master to the slave .
https://www.postgresql.org/docs/12/continuous-archiving.html

How to send SQL queries to two databases simultaneously in Rails?

I have a very high-traffic Rails app. We use an older version of PostgreSQL as the backend database which we need to upgrade. We cannot use either the data-directory copy method because the formats of data files have changed too much between our existing releases and the current PostgreSQL release (10.x at the time of writing). We also cannot use the dump-restore processes for migration because we would either incur downtime of several hours or lose important customer data. Replication would not be possible as the two DB versions are incompatible for that.
The strategy so far is to have two databases and copy all the data (and functions) from existing to a new installation. However, while the copy is happening, we need data arriving at the backend to reach both servers so that once the data migration is complete, the switch becomes a matter of redeploying the code.
I have figured out the other parts of the puzzle but am unable to determine how to send all writes happening on the Rails app to both DB servers.
I am not bothered if both installations get queried for displaying data to the user (I can discard the data coming out of the new installation); so, if it is possible on driver level, or adding a line somewhere in the ActiveRecord, I am fine with it.
PS: Rails version is 4.1 and the company is not planning to upgrade that.
you can have multiple database by adding an env for the database.yml file. After that you can have a seperate class Like ActiveRecordBase and connect that to the new env.
have a look at this post
However, as I can see, that will not solve your problem. Redirecting new data to the new DB while copying from the old one can lead to data inconsistencies.
For and example, ID of a record can be changed due to two data source feeds.
If you are upgrading the DB, I would recommend define a schedule downtime and let your users know in advance. I would say, having a small downtime is far better than fixing inconstant data down the line.
When you have a downtime,
Let the customers know well in advance
Keep the downtime minimal
Have a backup procedure, in an even the new site takes longer than you think, rollback to the old site.

Making Changes Between Two Databases ASP.NET MVC

I'm working on a project in which we have two versions of an MVC App, the live, and the dev versions, I've made changes to the dev version and added tables and data, etc.
Is there any way to migrate these changes onto the live version without losing all data(i.e. just regenerating the database).
I've already tried just rebuilding the database but we lose all data that was previously stored( as obviously we are essentially deleting the old database and rebuilding it).
tl;dr
How do I migrate my dev version of an mvc app along with any new tables to the live version of an mvc app with missing models and tables.
Yes, it is possible to migrate your changes from your dev instance to your production instance; to do so you must create SQL scripts that update your production database with the changes. This may be accomplished by manually writing the scripts or by using tools to generate the scripts for you. However you go about it, you will need scripts to update your database (well, you could perform manual updates via the tooling of your database, but this is not desirable, as you want the updates to occur in a short time window, and you want this to happen reliably and repeatably).
The easiest way that I know of to do this is to use tools like SQL Compare (for schema updates) or SQL Data Compare (for data updates). These are from Redgate, but they cost a fair bit of money. They are well worth they price, and most companies I've worked with are happy to pay for licenses. However, you may not want to shell out for them personally. To use these tools, you connect them to source and destination databases, and they analyze the differences between the databases (schematically or data) and produce SQL scripts. These scripts may then be run manually or by the tools themselves.
Ideally, when you work on your application, you should be producing these scripts as you go along. That way when it comes time to update your environments, you may simply run the scripts you have. It is worth taking the time to include this in your build process, so database scripts get included in your builds. You should write your scripts so they are idempotent, meaning that you can run them multiple times and the end result will be the same (the database updated to the desired schema and data).
One may of managing this is creating a DBVersions table in your database. This table includes all your script updates. For example you could have a table like the following (this is SQL Server 2008 dialect):
CREATE TABLE [dbo].[DBVersions] (
[CaseID] [int] NOT NULL,
[DateExecutedOn] [datetime] NOT NULL,
CONSTRAINT [PK_DBVersions] PRIMARY KEY CLUSTERED (
[CaseID] ASC
)
) ON [PRIMARY]
CaseID refers to the case (or issue) number of the feature or bug that requires the SQL update. Your build process can check this table to see if the script has already been run. If not, it runs it. This is useful if you cannot write your scripts in a way that allows them to be run more than once. If all your scripts can be run an unbounded number of times, then this table is not strictly necessary, though it could still be useful to reduce the need to run a large number of script every time a deployment is done.
Here are links to the Redgate tools. There may be many other tools out there, but I've had a very good experience with these.
http://www.red-gate.com/products/sql-development/sql-compare/
http://www.red-gate.com/products/sql-development/sql-data-compare/
It depends on your deployment strategy and this is more of a workflow that your team need to embrace. If regenerating the live database from scratch it can take awhile depending how big the database size is. I don't see a need to do this in most scenarios.
You would only need to separate out database schema object and data row scripts. The live database version should have its database schema objects scripted out and stored in a repository. When a developer is working on a new functionality, he/she will need to make those changes against the database scripts in the repository. If there is a need to make changes to the database rows then the developer will also need to check in the data row scripts in the repository. On a daily deployment the live database version can be compared against what is checked in the repository and pushed to make it in sync.
On our side we use tools such as RedGate Schema Compare and Data Compare to do the database migration from the dev version to our intended target version.

Is it safe to run migrations on a live database?

I have a simple rails-backed app running 2-3 million pageviews a day off a Heroku Ronin database. The load on the database is pretty light, though, and it could handle a lot more than we're throwing at it.
Is it safe for me to run a migration to add tables to this database without going into maintenance mode? Also, would it be safe to run a migration to add a few columns to the core table responsible for almost all of the reads and writes?
Downtime is not acceptable, even for a few minutes.
If running migrations live isn't advisable, what I'll probably do is set up a new database, run the migrations on that, write a script to sync the two databases, and then point the app at the new one.
But I'd rather avoid that if possible. :)
Sounds like your migration includes:
adding new tables (perhaps indexes? If so, that could take a bit longer than you might expect)
adding new columns (default values and/or nullable?)
wrapping your changes in a transaction (?)
Suggest you gauge the impact that your changes will have on your Prod environment by:
taking a backup of Prod (with all the Prod data within)
running your change scripts against that. Time each operation
Balance the 2 points above against the typical read & write load at the time you're expecting to run this (02:00, right?).
Consider a 'soft' downtime by disabling (somehow) write operations to the tables being effected.
Overall (or in general), adding n tables and new nullable columns to an existing table would/could likely be done without any downtime or performance impact.
Always measure the impact your changes will have on a copy of Prod. Measure 'responsiveness' at the time you apply your changes to this copy. Of course this means deploying another copy of your Prod app as well, but the effort would be worthwhile.
Assuming it's a pg database (which it should be for Heroku).
http://www.postgresql.org/docs/current/static/explicit-locking.html
alter table will acquire an access exclusive lock. So, the table will be locked.
On top of this, you will be required to restart the Rails application in order for it to be aware of any new models. If you are going to be adding tables to the application or modifying model code in any way.
As for pointing to a new app with a freshly modified database, how are you going to do the sync of the data and also sync the changes in data between the two databases in the time that the sync takes?
Adding tables shouldn't be a concern, as your application won't be aware of them until proper upgrades are done. As for adding columns to a core table, I'm not so sure. If you really need to prevent downtime, perhaps it's better to add a secondary table that (linked by an ID with the core table) adds your extra columns.
Just my two cents.

How to prepare for data loss in a production website?

I am building an app that is fast moving into production and I am concerned about the possibility that due to hacking, some silly personal error (like running rake db:schema:load or rake db:rollback) or other circumstance we may suffer data loss in one database table or even across the system.
While I don't find it likely that the above will happen, I would be remiss in not being prepared in case it ever does.
I am using Heroku's PG Backups (which is to be replaced with something else this month), and I also run automated daily backups to S3: http://trevorturk.com/2010/04/14/automated-heroku-backups/, successfully generating .dump files.
What is the correct way to deal with data loss on a production app?
How would I restore the .dump file in case I need to? Can I do a selective restore if a small part of the system is hit?
In case a selective restore is not possible: assume one table loses data 4 hours after the last backup. Result => would fixing the lost table require rolling back 4 hours of users' activity? Any good solution to this?
What is the best way to support users through the inconvenience if something like this happens?
A full DR (disaster recovery) solution requires the following:
Multisite. If a fire, flood, Osama Bin Laden or whathaveyou strikes the Amazon (or is it Salesforce?) data center that Heroku uses, you want to be sure that your data is safe elsewhere.
On-going replication of the data to a separate site (or sites). That means that every transaction that's written to your database on one site, is replicated within seconds to the mirror database on the other site. Most RDBMS's have mechanisms to let you do a master-slave replication like that.
The same goes for anything you put on a filesystem outside of the database, such as images, XML configuration files etc. S3 is a good solution here - they replicate everything to multiple data centers for you.
I won't hurt to create periodic (daily or so) dumps of the database and store them separately (e.g. on S3). This helps you recover from data corruption that propagates to the slave DBs.
Automate the process of data recovery. You want this to just work when you need it.
Test everything. Ideally, you want to automate the test process and run it periodically to ensure that your backups can restore. Netflix Chaos Monkey is an extreme example of this.
I'm not sure how you'd implement all this on Heroku. A complete solution is still priced out of reach for most companies - we're running this across our own data centers (one in the US, one in EU) and it costs many millions. Work according to the 80-20 rule - on-going backup to a separate site, plus a well tested recovery plan (continuously test your ability to recover from backups) covers 80% of what you need.
As for supporting users, the best solution is simply to communicate timely and truthfully when trouble happens and make sure you don't lose any data. If your users are paying for your service (i.e. you're not ad-supported), then you should probably have an SLA in place.
About backups, you cannot be sure at 100 percent every time that no data will be lost. The best is to test it on another server. You must have at leat two types of backup :
A database backup, like pg-dump. A dump is uniquely SQL commands so you can use it to recreate the whole database, just a table, or just a few rows. You loose the data added in the meantime.
A code backup, for example a git repository.
in addition to Hartator's answer:
use replication if your DB offers it, e.g. at least master/slave replication with one slave
do database backups on a slave DB server and store them externally (e.g. scp or rsync them out of your server)
use a good version control system for your source code, e.g. Git
use a solid deploy mechanism, such as Capistrano and write your custom tasks, so nobody needs to do DB migrations by hand
have somebody you trust check your firewall setup and the security of your system in general
The DB-Dumps contain SQL-commands to recreate all tables and all data... if you were to restore only one table, you could extract that portion from a copy of the dump file and (very carefully) edit it and then restore with the modified dump file (for one table).
Always restore first to an independent machine and check if the data looks right. e.g. you could use one Slave server, take if offline, then restore there locally and check the data. Good if you have two slaves in your system, then the remaining system has still one master and one slave while you restore to the second slave.
To simulate a fairly simple "total disaster recovery" on Heroku, create another Heroku project and replicate your production application completely (except use a different custom domain name).
You can add multiple remote git targets to a single git repository so you can use your current production code base. You can push your database backups to the replicated project, and then you should be good to go.
The only step missing from this exercise verses a real disaster recovery is assigning your production domain to the replicated Heroku project.
If you can afford to run two copies of your application in parallel, you could automate this exercise and have it replicate itself on a regular basis (e.g. hourly, daily) based on your data loss tolerance.

Resources