How to send SQL queries to two databases simultaneously in Rails? - ruby-on-rails

I have a very high-traffic Rails app. We use an older version of PostgreSQL as the backend database, which we need to upgrade. We cannot use the data-directory copy method because the data file formats have changed too much between our existing release and the current PostgreSQL release (10.x at the time of writing). We also cannot use a dump-and-restore migration, because we would either incur several hours of downtime or lose important customer data. Replication is not an option either, as the two versions are incompatible for that.
The strategy so far is to run two databases and copy all the data (and functions) from the existing installation to a new one. However, while the copy is happening, data arriving at the backend needs to reach both servers, so that once the data migration is complete, the switch becomes a matter of redeploying the code.
I have figured out the other parts of the puzzle but am unable to determine how to send all writes happening on the Rails app to both DB servers.
I am not bothered if both installations get queried for displaying data to the user (I can discard the data coming out of the new installation); so if this is possible at the driver level, or by adding a line somewhere in ActiveRecord, I am fine with it.
PS: The Rails version is 4.1 and the company is not planning to upgrade it.

You can use multiple databases by adding a new entry (environment) to database.yml. Then create a separate abstract class that inherits from ActiveRecord::Base and connect it to the new entry.
Have a look at this post.
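In Rails 4.1 the second connection would typically live on an abstract model class that calls `establish_connection` with the new `database.yml` entry. The routing idea itself (every write goes to both servers, reads come only from the old one) can be sketched in plain Ruby. This is a minimal sketch, assuming each connection object responds to `execute(sql)`; `FakeConnection`, `DualWriteConnection` and the read-detection regex are hypothetical and not part of ActiveRecord.

```ruby
# Hypothetical stand-in for a database connection: it only records the SQL it receives.
class FakeConnection
  attr_reader :log

  def initialize(name)
    @name = name
    @log = []
  end

  def execute(sql)
    @log << sql
    "#{@name}: ok"
  end
end

# Sends every write to both databases but answers reads from the primary only.
class DualWriteConnection
  READ_ONLY = /\A\s*(SELECT|SHOW|EXPLAIN)\b/i

  def initialize(primary, secondary)
    @primary = primary
    @secondary = secondary
  end

  def execute(sql)
    result = @primary.execute(sql)
    unless READ_ONLY.match?(sql)
      # Mirror the write; a failure on the new server is reported
      # but does not break the live request.
      begin
        @secondary.execute(sql)
      rescue StandardError => e
        warn "secondary write failed: #{e.message}"
      end
    end
    result
  end
end

old_db = FakeConnection.new("old")
new_db = FakeConnection.new("new")
db = DualWriteConnection.new(old_db, new_db)
db.execute("INSERT INTO users (name) VALUES ('ann')")
db.execute("SELECT * FROM users")
```

Mirroring raw SQL like this does not make server-generated values (serial IDs, `now()` timestamps) agree between the two databases.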
However, as I can see, that will not solve your problem. Redirecting new data to the new DB while copying from the old one can lead to data inconsistencies.
For example, the same record can end up with different IDs in the two databases because it is fed from two data sources.
If you are upgrading the DB, I would recommend scheduling downtime and letting your users know in advance. A short downtime is far better than fixing inconsistent data down the line.
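A toy simulation makes the ID problem concrete. It assumes integer serial IDs and a copy job that preserves the old IDs; `FakeTable` is a hypothetical in-memory table, not a real database.

```ruby
# Hypothetical in-memory table with a serial-style id counter.
class FakeTable
  attr_reader :rows

  def initialize(next_id)
    @rows = {}
    @next_id = next_id
  end

  # Live traffic: the database assigns the next id itself.
  def insert(attrs)
    id = @next_id
    @next_id += 1
    insert_with_id(id, attrs)
  end

  # Bulk copy: the row keeps the id it had in the old database.
  def insert_with_id(id, attrs)
    raise KeyError, "duplicate id #{id}" if @rows.key?(id)
    @rows[id] = attrs
    id
  end
end

old_db = FakeTable.new(1)
3.times { |i| old_db.insert(name: "existing #{i}") } # ids 1..3 already exist

new_db = FakeTable.new(1) # the new server's sequence starts from scratch

# A live write is sent to both servers while the copy is still running.
old_id = old_db.insert(name: "new customer") # => 4
new_id = new_db.insert(name: "new customer") # => 1

# Later the copy job reaches old row 1 and collides with the live row.
collision =
  begin
    new_db.insert_with_id(1, old_db.rows[1])
    nil
  rescue KeyError => e
    e.message
  end
```

The same customer now has ID 4 in one database and ID 1 in the other, and the copy of the original row 1 cannot be inserted at all.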
When you schedule downtime:
Let the customers know well in advance
Keep the downtime minimal
Have a backup procedure: in the event the new site takes longer than you think, roll back to the old site.

Related

ddev and TYPO3: how to handle the DB and fileadmin for multiple developers

How do you handle user upload folders like fileadmin and the DB with ddev and TYPO3?
I would like to have the DB and media files outside of my ddev container as both can get really large over time and I don't want to sync them every time. Or do I have to?
It would be awesome to just have them on a central server where every developer has access to.
For the DB this is not a problem.
But as far as I know, mounting fileadmin outside of the ddev container is not possible.
How do you handle the DB and media files?
At the companies I've worked for, data for a development environment is either (1) rsynced from a central server or (2) a minimal data set that is added to the git repository.
In case of option 1 there's usually an automated process which pulls data from production servers and cleans it up (removing cache/logs, anonymize any sensitive data, etc). The advantages of this option are you have (mostly) real data for your development environment and there's no need to manually manage a separate data set. The disadvantages are you might not have data to test all situations, data can get large and there's a chance you might miss sensitive data which could lead to data leaks.
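The clean-up step of option 1 can be sketched as a per-table filter plus a per-row transform applied while exporting. This is a minimal sketch; the table prefixes, column names and placeholder formats are example assumptions, not TYPO3 or ddev conventions.

```ruby
# Tables that only hold caches or logs are skipped entirely
# (example prefixes; adjust them to your own schema).
SKIP_PREFIXES = %w[cache_ cf_ sys_log].freeze

def skip_table?(table)
  SKIP_PREFIXES.any? { |prefix| table.start_with?(prefix) }
end

# Personal columns get deterministic placeholders derived from the row id,
# so each anonymized value stays unique per row.
PLACEHOLDERS = {
  "email" => ->(id) { "user#{id}@example.com" },
  "name" => ->(id) { "User #{id}" },
  "phone" => ->(_id) { nil }
}.freeze

def anonymize(row)
  row.each_with_object({}) do |(key, value), clean|
    rule = PLACEHOLDERS[key]
    clean[key] = rule ? rule.call(row["id"]) : value
  end
end
```

Deriving placeholders from the id keeps unique constraints on columns like `email` satisfied, which a single fixed fake value would not.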
In case of option 2 there's usually a way to generate random data to get a more filled development environment. The advantages of this option are that you have a clean development environment, the data set is as small as it can be, and there's no chance of leaking sensitive data. The disadvantages are that you need to maintain a separate minimal data set, and problems related to specific data might be harder to debug.
Personally I think 2 is the better option. You should not need production data for development as long as you have a good way to create realistic random data. Production data might actually miss a lot of situations you do need for development. Some content elements might not always be used, things like empty news lists might not happen (often) in production, etc. I also don't want to have to download several GB of data if I have to change a small thing in a project I don't have locally yet.
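Random data for option 2 is most useful when it is reproducible, so every developer gets the same "random" set. A minimal sketch using a seeded `Random`; news items are an assumed example content type, and `fake_news_items` and its fields are hypothetical.

```ruby
FIRST_NAMES = %w[Ada Grace Linus Ken Barbara].freeze
TITLES = ["Release notes", "Team update", "Event", "Job opening"].freeze

# The same seed always yields the same records.
def fake_news_items(count, seed: 42)
  rng = Random.new(seed)
  Array.new(count) do |i|
    {
      id: i + 1,
      title: "#{TITLES.sample(random: rng)} ##{i + 1}",
      author: FIRST_NAMES.sample(random: rng),
      published: rng.rand < 0.8 # roughly 20% stay unpublished to cover edge cases
    }
  end
end
```

Calling it with `0` also gives the empty-list case that rarely shows up in production data.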

Database Changes Outside Ruby/Rails Migration

We have several technologies accessing the same database. At the moment, Ruby/Rails is used to create migrations when making changes to the database. The question is a simple one:
Is it possible for our DBAs to make changes to the database (not using Ruby migrations) without stepping on the Ruby devs toes and breaking the Ruby web application?
If so, some generic details about how to get started or pointed in the right direction would be great! Thanks.
I can tell you from experience that this is not the best idea, one that you will eventually regret and later, inevitably, reverse. But I know that it does come up. I've had to do them (against my will or in case of extreme emergencies).
Given the option, I'd push back on it in favor of any solution that brings the SQL closer to the repository and further away from a "quick fix" applied directly to the database. Why?
1) Your local/testing/staging/production databases will diverge, eventually rendering your code untestable in a reliable way
2) You won't be able to regenerate your database from "scratch" to match production
3) If the database is ever clobbered, you won't be able to re-create it in any sensible way.
DBAs generally don't care about these things until something in the code breaks and they want you to figure it out. But, for obvious reasons, that now becomes quite difficult.
One approach I have taken that seems to make everyone happy is to do the following:
1) Commit to having ALL database changes, big or small, put into a repository with the code. This means that everything that has happened to the database is all together in one place.
2) Each change, or set of changes, should be a migration. A migration can be simply running an SQL file. But, it should be run from within a migration for all the testability benefits.
So, for example, let's say you have a folder structure like:
- database_updates
-- v1
--- change_1.sql
--- change_2.sql
-- v2
--- change_3.sql
--- change_2_fix.sql
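Now, let's say you want to make a change or a set of changes via SQL. First, create a new version folder; let's call it "v1". Next, put your SQL scripts in this folder. Finally, create a migration:
def change
  # Run every SQL script in database_updates/v1, in file-name order
  Dir[Rails.root.join("database_updates", "v1", "*.sql").to_s].sort.each do |path|
    execute File.read(path)
  end
end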
(I have code that does this, happy to share the gist if you find yourself using this approach)
Since each migration runs in a transaction (on PostgreSQL, which supports transactional DDL), a failure in any of the scripts rolls all of them back.
Now, let's say you have the next set, v2. Same exact thing. And, we have a history of these "versioned" changes, so we can look at the migration history and see what's been run, etc.
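Because this approach depends on running folders and scripts in a predictable order, the ordering rules are worth pinning down. A small plain-Ruby sketch; the `database_updates/vN` layout comes from the folder example above, while `ordered_versions` and `scripts_in` are hypothetical helper names.

```ruby
# Version folders are named v1, v2, ... v10. Plain string sorting would put
# "v10" before "v2", so sort on the numeric part instead.
def ordered_versions(names)
  names.grep(/\Av\d+\z/).sort_by { |name| name[1..-1].to_i }
end

# Scripts inside one folder run in file-name order, so prefix them with a
# number (e.g. 01_..., 02_...) when the order matters.
def scripts_in(root, version)
  Dir.glob(File.join(root, version, "*.sql")).sort
end
```

With the example tree, file-name order runs change_2_fix.sql before change_3.sql inside v2, which may not be what was intended.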
As a power user note, this set up also allows for recourse if things fail; in these cases, we can opt to go back to v1:
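def up
  run_scripts("v2")
end
def down
  run_scripts("v1")
end
# Hypothetical helper: execute every .sql file in database_updates/<version>, in file-name order
def run_scripts(version)
  Dir[Rails.root.join("database_updates", version, "*.sql").to_s].sort.each { |path| execute File.read(path) }
end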
For this to work, v1 and v2 would need to be autonomous -- that is, they can destroy and rebuild entities without any dependencies. If that's not what you want, just stick with the change method.
This also allows you to track down breaking changes. Say it is reported that something stopped working with v6. You can roll your database migrations back to v5, v4, etc. (because you have one migration per folder), test to see where it broke, and correct it with v7.
Anyway, the end game of it all is that you can safely check out this project from a repository, create your database, run rake db:migrate and know that your database structure resembles exactly what is deployed elsewhere. And, worst case, if your database gets clobbered, you can just run all your scripts from v1 - vN and end up with your database back again.
For DBAs, everything remains SQL; they can just send you a file or set of files to run.
If you want to get fancy, you could even write a migration generator that knows how to handle a line like rails g migration UpdateDBVersion version:v7 to take care of the repetitive boilerplate.
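Before wiring anything into `rails g`, the generator's output can be prototyped as a plain Ruby template. A minimal sketch; `version_migration` is hypothetical, the naming follows Rails' file-to-class convention (`update_db_version_v7` becomes `UpdateDbVersionV7`), and `ActiveRecord::Migration` without a version suffix matches Rails 4.x.

```ruby
# Render the boilerplate a hypothetical "UpdateDBVersion version:vN" generator would write.
def version_migration(version, timestamp)
  class_name = "UpdateDbVersion#{version.capitalize}"
  file_name = "#{timestamp}_update_db_version_#{version}.rb"
  source = <<~RUBY
    class #{class_name} < ActiveRecord::Migration
      def change
        Dir[Rails.root.join("database_updates", "#{version}", "*.sql").to_s].sort.each do |path|
          execute File.read(path)
        end
      end
    end
  RUBY
  [file_name, source]
end
```

In practice the timestamp would come from something like `Time.now.utc.strftime("%Y%m%d%H%M%S")`; it is a parameter here so the output is predictable.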
As long as everyone relies on the same updated schema.rb or structure.sql, everyone will share the same database 'version'.
See this SO answer for more insight.
Changes to the database, tables, or indexes should be made using ActiveRecord migrations whenever possible. This specifically ensures that development and test environments remain logically in sync. Remember that developers must be capable of accurate development and testing against the same database structure as occurs in the production environment, and QA teams must be able to adequately test such changes.
However, some database features are not actually supported by ActiveRecord migrations, and may only be applied directly to the database. These features are often database-specific, such as any of the following:
Views
Triggers
Stored procedures
Indexes with function-based columns
Virtual columns
Essentially any database-specific features that don't have an ActiveRecord abstraction will be made directly to the database.
Sometimes, however, other applications require the addition of tables, columns, or indexes in order to operate properly or efficiently. These other applications may simply be used to view/report against the database, or they may be substantial business applications that have their own independent database requirements and separate development teams. Occasionally, a DBA may have to step in and create an index or provide some optimization needed to solve a real-world production performance issue.
There are simply far too many situations for shared database management to give a definitive answer. Depending on the size of the organization and the complexity of the needs for the shared management, there may be many ways to solve the problem of a shared database schema that are specific to the application or organization.
For instance, I have worked on applications that shared a database with as many as 10 other applications, each of which "owned" portions of the schema and shared other portions with the other teams, all mediated through the DBA group. In situations such as this, the organizational structure and change control process may be the only means of solving this problem.
Whichever the situation, some real-world suggestions may help avoid problems and mitigate maintenance woes:
Offer to translate SQL DDL commands into ActiveRecord migrations, where possible, so that DBAs can accomplish their needs, and the application team can still appropriately maintain the schema
Any changes made outside the ActiveRecord migration should be thoroughly tested for impact to the project in a non-production environment by the same QA resources that test the actual Rails application
Encapsulate any external changes in a .sql file and include the file as part of the project in version control
If the development team is using the same database product in development (some cannot, due to licensing or complexity), those changes should be applied to the developer database instances, as well
It's best if you can apply the changes during a migration, even just by calling the relevant CLI tools as a migration step - the exact mechanism will be database-dependent, as well
Try to avoid doing this more than is absolutely necessary, as this can significantly reduce the database independence of the application, even between versions of the same database product (limiting upgrade opportunities)
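For PostgreSQL, calling the CLI tool from a migration step might look like the following. A sketch, assuming `psql` is on the PATH and can reach the database by name; `psql_command` is a hypothetical helper, while `--set ON_ERROR_STOP=1`, `--single-transaction`, `--file` and `--dbname` are standard psql options.

```ruby
# Build the psql invocation that applies one .sql file.
# ON_ERROR_STOP makes psql exit non-zero on the first error, and
# --single-transaction wraps the whole file in one transaction.
def psql_command(sql_file, database)
  ["psql", "--set", "ON_ERROR_STOP=1", "--single-transaction",
   "--file", sql_file, "--dbname", database]
end

# Inside a migration one might run (not executed here):
#   system(*psql_command(path, "myapp_development")) or raise "psql failed for #{path}"
```

Passing the arguments as an array to `system` avoids shell quoting problems with file paths.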

Is it safe to run migrations on a live database?

I have a simple rails-backed app running 2-3 million pageviews a day off a Heroku Ronin database. The load on the database is pretty light, though, and it could handle a lot more than we're throwing at it.
Is it safe for me to run a migration to add tables to this database without going into maintenance mode? Also, would it be safe to run a migration to add a few columns to the core table responsible for almost all of the reads and writes?
Downtime is not acceptable, even for a few minutes.
If running migrations live isn't advisable, what I'll probably do is set up a new database, run the migrations on that, write a script to sync the two databases, and then point the app at the new one.
But I'd rather avoid that if possible. :)
Sounds like your migration includes:
adding new tables (perhaps indexes? If so, that could take a bit longer than you might expect)
adding new columns (default values and/or nullable?)
wrapping your changes in a transaction (?)
Suggest you gauge the impact that your changes will have on your Prod environment by:
taking a backup of Prod (with all the Prod data within)
running your change scripts against that. Time each operation
Balance the 2 points above against the typical read & write load at the time you're expecting to run this (02:00, right?).
Consider a 'soft' downtime by disabling (somehow) write operations to the tables being affected.
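Timing each operation against the restored copy can be as simple as wrapping every step with a monotonic clock. A minimal sketch; the step names and the `sleep` placeholders are assumptions standing in for your real change scripts.

```ruby
# Run each change step once and record how long it took, in seconds.
def time_each(steps)
  steps.each_with_object({}) do |(name, step), timings|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    step.call
    timings[name] = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end
end

timings = time_each({
  "create_table audits" => -> { sleep 0.01 },
  "add_column users.nickname" => -> { sleep 0.02 }
})
timings.each { |name, secs| puts format("%-30s %.3fs", name, secs) }
```

A monotonic clock is used instead of `Time.now` so that wall-clock adjustments during a long run cannot skew the measurements.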
Overall (or in general), adding n tables and new nullable columns to an existing table would/could likely be done without any downtime or performance impact.
Always measure the impact your changes will have on a copy of Prod. Measure 'responsiveness' at the time you apply your changes to this copy. Of course this means deploying another copy of your Prod app as well, but the effort would be worthwhile.
Assuming it's a pg database (which it should be for Heroku).
http://www.postgresql.org/docs/current/static/explicit-locking.html
ALTER TABLE acquires an ACCESS EXCLUSIVE lock (for most forms of the command), so the table will be locked while the statement runs.
On top of this, if you are adding tables to the application or modifying model code in any way, you will need to restart the Rails application so that it is aware of the new models.
As for pointing to a new app with a freshly modified database, how are you going to do the sync of the data and also sync the changes in data between the two databases in the time that the sync takes?
Adding tables shouldn't be a concern, as your application won't be aware of them until proper upgrades are done. As for adding columns to a core table, I'm not so sure. If you really need to prevent downtime, perhaps it's better to add a secondary table that (linked by an ID with the core table) adds your extra columns.
Just my two cents.

How to prepare for data loss in a production website?

I am building an app that is fast moving into production and I am concerned about the possibility that due to hacking, some silly personal error (like running rake db:schema:load or rake db:rollback) or other circumstance we may suffer data loss in one database table or even across the system.
While I don't find it likely that the above will happen, I would be remiss in not being prepared in case it ever does.
I am using Heroku's PG Backups (which is to be replaced with something else this month), and I also run automated daily backups to S3: http://trevorturk.com/2010/04/14/automated-heroku-backups/, successfully generating .dump files.
What is the correct way to deal with data loss on a production app?
How would I restore the .dump file in case I need to? Can I do a selective restore if a small part of the system is hit?
In case a selective restore is not possible: assume one table loses data 4 hours after the last backup. Result => would fixing the lost table require rolling back 4 hours of users' activity? Any good solution to this?
What is the best way to support users through the inconvenience if something like this happens?
A full DR (disaster recovery) solution requires the following:
Multisite. If a fire, flood, Osama Bin Laden or whathaveyou strikes the Amazon (or is it Salesforce?) data center that Heroku uses, you want to be sure that your data is safe elsewhere.
On-going replication of the data to a separate site (or sites). That means that every transaction that's written to your database on one site, is replicated within seconds to the mirror database on the other site. Most RDBMS's have mechanisms to let you do a master-slave replication like that.
The same goes for anything you put on a filesystem outside of the database, such as images, XML configuration files etc. S3 is a good solution here - they replicate everything to multiple data centers for you.
It won't hurt to create periodic (daily or so) dumps of the database and store them separately (e.g. on S3). This helps you recover from data corruption that propagates to the slave DBs.
Automate the process of data recovery. You want this to just work when you need it.
Test everything. Ideally, you want to automate the test process and run it periodically to ensure that your backups can restore. Netflix Chaos Monkey is an extreme example of this.
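One cheap check in an automated restore test is comparing per-table row counts. A sketch under the assumption that counts (e.g. from `SELECT count(*)` per table) were recorded when the backup was taken; `restore_mismatches` is hypothetical, and it only catches missing or extra rows, not corrupted values.

```ruby
# Compare per-table row counts recorded at backup time with those in the restored copy.
def restore_mismatches(expected_counts, restored_counts)
  expected_counts.each_with_object({}) do |(table, expected), diff|
    actual = restored_counts.fetch(table, 0) # a missing table counts as empty
    diff[table] = { expected: expected, actual: actual } if actual != expected
  end
end

report = restore_mismatches(
  { "users" => 120, "orders" => 450 },
  { "users" => 120, "orders" => 0 }
)
```

An empty report means the counts match; anything else should fail the scheduled restore test.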
I'm not sure how you'd implement all this on Heroku. A complete solution is still priced out of reach for most companies - we're running this across our own data centers (one in the US, one in EU) and it costs many millions. Work according to the 80-20 rule - on-going backup to a separate site, plus a well tested recovery plan (continuously test your ability to recover from backups) covers 80% of what you need.
As for supporting users, the best solution is simply to communicate timely and truthfully when trouble happens and make sure you don't lose any data. If your users are paying for your service (i.e. you're not ad-supported), then you should probably have an SLA in place.
About backups: you can never be 100 percent sure that no data will be lost. The best approach is to test restoring on another server. You need at least two types of backup:
A database backup, such as a pg_dump. A plain-text dump consists solely of SQL commands, so you can use it to recreate the whole database, a single table, or just a few rows. You lose any data added since the dump was taken.
A code backup, for example a git repository.
In addition to Hartator's answer:
use replication if your DB offers it, e.g. at least master/slave replication with one slave
do database backups on a slave DB server and store them externally (e.g. scp or rsync them out of your server)
use a good version control system for your source code, e.g. Git
use a solid deploy mechanism, such as Capistrano and write your custom tasks, so nobody needs to do DB migrations by hand
have somebody you trust check your firewall setup and the security of your system in general
DB dumps contain SQL commands to recreate all tables and all data. If you want to restore only one table, you can extract that portion from a copy of the dump file, (very carefully) edit it, and then restore from the modified dump file.
Always restore first to an independent machine and check that the data looks right. For example, you could take one slave server offline, restore there locally and check the data. It helps to have two slaves: while you restore to the second one, the rest of the system still has one master and one slave.
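For a plain-text (SQL) dump, the data for one table sits in a `COPY ... FROM stdin;` section terminated by a `\.` line, so it can be cut out programmatically instead of by hand. A minimal sketch, assuming an unquoted table name in the `public` schema (newer pg_dump versions write `public.users`, older ones just `users`); `extract_copy_block` is hypothetical. If the `.dump` is a custom-format archive (`pg_dump -Fc`), `pg_restore --data-only --table=<name>` can restore a single table without hand-editing.

```ruby
# Pull one table's data section out of a plain-text pg_dump file:
# "COPY <table> (<columns>) FROM stdin;", tab-separated rows, then "\.".
def extract_copy_block(dump_text, table)
  lines = dump_text.lines
  header = /\ACOPY (public\.)?#{Regexp.escape(table)} /
  start = lines.index { |line| line.match?(header) }
  return nil unless start

  finish = lines[start..-1].index { |line| line.chomp == "\\." }
  return nil unless finish

  lines[start, finish + 1].join
end
```

The extracted block can be reviewed, and if necessary trimmed to the affected rows, before loading it into the independent machine.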
To simulate a fairly simple "total disaster recovery" on Heroku, create another Heroku project and replicate your production application completely (except use a different custom domain name).
You can add multiple remote git targets to a single git repository so you can use your current production code base. You can push your database backups to the replicated project, and then you should be good to go.
The only step missing from this exercise versus a real disaster recovery is assigning your production domain to the replicated Heroku project.
If you can afford to run two copies of your application in parallel, you could automate this exercise and have it replicate itself on a regular basis (e.g. hourly, daily) based on your data loss tolerance.

DB management for Heroku apps

I'm fairly new to both Rails and Heroku but I'm seriously thinking of using it as a platform to deploy my Ruby/Rails applications.
I want to use all the power of Heroku, so I prefer the "embedded" PostgreSQL managed by Heroku instead of the addon for Amazon RDS for MySQL, but I'm not so confident without the possibility to access my data in a SQL client...
I know that in a well made app you have no need to access DB, but there are some situations (add rows to a config table, see data not mapped in a view, update some columns for debugging issues, performance monitoring, running queries for reporting, etc.) when this can be good...
How do you solve this problem? What's your experience with a real-life app powered by Heroku?
Thanks!
I have been using it for about a year. I love the workflow that it provides, but I find not having access to the data a real bother. Your options for working with the database are:
Taps: In theory you create your database however you want locally and use taps to copy both schema and data to Heroku. In practice, most of the time it's amazingly great. However, I am currently dealing with the cleanup after taps translated some of my columns poorly and corrupted my data.
Heroku console: Totally fine for all the usual ActiveRecord stuff, but the closest you can get to the database is ActiveRecord::Base.connection.execute "some sql". When you find yourself wondering about running ALTER TABLE commands like that, you will know you're in trouble.
They also provide a "bundle" as a method for backing up your app. This lets you download all your code plus an SQL dump of the database. The difficulty is that since there is no direct database access, there is no way of loading that same SQL dump back into the database to recover from data loss, which, to me, is the point of having those dump files to begin with. All you can use the bundle for is to create a new application (heroku bundles:animate), not restore a current app.
I would love to be wrong about any/all of these. This seems like a curious rough spot in the best thought out service that I know of. Database access normally doesn't feel like much to give up when most of what you do is made so easy.
To me database access is like a fire extinguisher. Usually not a big deal, but when it matters, it matters a lot.
I use admin_data to give me some insight as to what is going on. I've successfully used it on Heroku, as well as other hosting providers.
Firstly let me start off by saying that heroku is awesome. I've had a great experience deploying my application and integrating with their other services such as websolr.
With that said, your questions:
Getting at your data
If you want to be able to get to your data you can use taps to pull your remote database down locally. This can be useful for debugging.
Performance monitoring
Use New Relic RPM. This comes as part of Heroku; you can enable it from the add-ons menu.
Ad-hoc database queries
You could write a controller which allows you to execute arbitrary SQL and view the results, but this isn't something I'd recommend. As suggested, admin_data is a good solution for managing your data, but if you want to do anything more complicated you'll have to resort to writing the code yourself.
