Database Changes Outside Ruby/Rails Migration - ruby-on-rails

we have several technologies accessing the same database. At the moment, Ruby/Rails is used to create migrations when making changes to the database. The question is a simple one:
Is it possible for our DBAs to make changes to the database (not using Ruby migrations) without stepping on the Ruby devs toes and breaking the Ruby web application?
If so, some generic details about how to get started or pointed in the right direction would be great! Thanks.

I can tell you from experience that this is not the best idea, one that you will eventually regret and later, inevitably, reverse. But I know that it does come up. I've had to do them (against my will or in case of extreme emergencies).
Given the option, I'd push back on it if you can in favor of any solution that bring the SQL closer to the repository and further away from a "quick fix" to the database directly. Why?
1) Your local/testing/staging/production databases will diverge, eventually rendering your code untestable in a reliable way
2) You won't be able to regenerate your database from "scratch" to match production
3) If the database is ever clobbered, you won't be able to re-create it in any sensible way.
DBA's generally don't care about these things until something in the code breaks, and they want you to figure it out. But, for obvious reasons, that now becomes quite difficult.
One approach I have taken that seems to make everyone happy is to do the following:
1) Commit to having ALL database changes, big or small, put into a repository with the code. This means that everything that has happened to the database is all together in one place.
2) Each change, or set of changes, should be a migration. A migration can be simply running an SQL file. But, it should be run from within a migration for all the testability benefits.
So, for example, let's say you have a folder structure like:
- database_updates
-- v1
--- change_1.sql
--- change_2.sql
-- v2
--- change_3.sql
--- change_2_fix.sql
Now, let say you want to make a change or set of change via SQL. First, create a new version folder, let's call it "v1". Next, put your SQL scripts in this folder. Finally, create a migration:
def change
# Read all files in v1 folder, and run the SQL
end
(I have code that does this, happy to share the gist if you find yourself using this approach)
Since each migration is transactional, any of the scripts that fail will cause all of them to fail.
Now, let's say you have the next set, v2. Same exact thing. And, we have a history of these "versioned" changes, so we can look at the migration history and see what's been run, etc.
As a power user note, this set up also allows for recourse if things fail; in these cases, we can opt to go back to v1:
def up
# run v2 scripts
end
def down
# run v1 scripts
end
For this to work, v1 and v2 would need to be autonomous -- that is, they can destroy and rebuild entities without any dependencies. If that's not what you want, just stick with the change method.
This would also allow you to test for breaking changes. Let's say it is reported that something doesn't work anymore with v6. You can rollback your database migrations to v5, v4, etc (because you are doing a migration per folder) and test to see when the test broke, and correct it with v7.
Anyway, the end game of it all is that you can safely check out this project from a repository, create your database, run rake db:migrate and know that your database structure resembles exactly what is deployed elsewhere. And, worst case, if your database gets clobbered, you can just run all your scripts from v1 - vN and end up with your database back again.
For DBA's everything remains SQL for them, they can just send you a file or set of files for you to run.
If you want to get fancy, you could even write a migration generator that knows how to handle a line like rails g migration UpdateDBVersion version:v7 to take care of the repetitive boilerplate.

As long as everyone relies on the same updated schema.rb or structure.sql, everyone will share the same database 'version'.
See this SO answer for more insight.

Changes to the database, tables, or indexes should be made using ActiveRecord migrations whenever possible. This specifically ensures that development and test environments remain logically in sync. Remember that developers must be capable of accurate development and testing against the same database structure as occurs in the production environment, and QA teams must be able to adequately test such changes.
However, some database features are not actually supported by ActiveRecord migrations, and may only be applied directly to the database. These features are often database-specific, such as any of the following:
Views
Triggers
Stored procedures
Indexes with function-based columns
Virtual columns
Essentially any database-specific features that don't have an ActiveRecord abstraction will be made directly to the database.
Sometimes, however, other applications require the addition of tables, columns, or indexes in order to operate properly or efficiently. These other applications may simply be used to view/report against the database, or they may be substantial business applications that have their own independent database requirements and separate development teams. Occasionally, a DBA may have to step in and create an index or provide some optimization needed to solve a real-world production performance issue.
There are simply far too many situations for shared database management to give a definitive answer. Depending on the size of the organization and the complexity of the needs for the shared management, there may be many ways to solve the problem of a shared database schema that are specific to the application or organization.
For instance, I have worked on applications that shared a database with as many as 10 other applications, each of which "owned" portions of the schema and shared other portions with the other teams, all mediated through the DBA group. In situations such as this, the organizational structure and change control process may be the only means of solving this problem.
Whichever the situation, some real-world suggestions may help avoid problems and mitigate maintenance woes:
Offer to translate SQL DDL commands into ActiveRecord migrations, where possible, so that DBAs can accomplish their needs, and the application team can still appropriately maintain the schema
Any changes made outside the ActiveRecord migration should be thoroughly tested for impact to the project in a non-production environment by the same QA resources that test the actual Rails application
Encapsulate any external changes in a .sql file and include the file as part of the project in version control
If the development team is using the same database product in development (some cannot, due to licensing or complexity), those changes should be applied to the developer database instances, as well
It's best if you can apply the changes during a migration, even just by calling the relevant CLI tools as a migration step - the exact mechanism will be database-dependent, as well
Try to avoid doing this more than is absolutely necessary, as this can significantly reduce the database independence of the application, even between versions of the same database product (limiting upgrade opportunities)

Related

How do I easily restructure my Rails' DB not yet in production?

Problem: I'm creating a Rails app, single dev, running staged/prod servers on Heroku, not publicly released yet. Reworking my DB infrastructure, since I've done several migrations since creating tables. I know it's somewhat trivial, but I'm trying to get things cleaned up before initial launch:
Redo indexes.
Reorder/rename fields. I would prefer avoiding tables with timestamp fields randomly sandwiched in the middle and PostgreSQL doesn't allow simple field reordering (for this reason, I may standardize timestamps as the first fields moving forward, so future migrations aren't so noticeable).
Possible Solution(s): I'll need to drop my schema and reload a clean copy of it. I can:
Edit schema.rb structure for existing tables to my liking.
(?) Manually edit the [VERSION] timestamp in schema.rb.
(?) Edit latest migration file, duplicate schema.rb.
Run rails db:schema:load-esque (likely with additional db:reset-esque steps to drop the existing schema/structure first).
Delete older migration files.
Question #1: See 2.-3. Aside from the elephant in the room that this method isn't generally recommended long-term, when does rails db:schema:dump have a use case?, since it's essentially what I'm doing by hand? I don't believe it would generate models tables not generated through Rails beforehand, so that could get messy (without running rails generate model --skip-migration). Does it create a new migration, or at minimum does it update the schema.rb timestamp so as not to look backwards at prior migrations? Otherwise, I would think :dump would be unconventional to Rails' own system.
Question #2: I know it will break staged/production servers once I push the changes (again, I'll have to run step 5. on them or just replace my Heroku apps with fresh copies). However, would this method also break these too, and/or break future Rails migration steps? I'd rather make sure whatever I build can be launched cleanly without requiring additional steps by hand that I could have avoided.
As someone who as ditched old, embarrassing migrations for a flat, updated schema part way through an app development process, I only ever regretted it.
Migrations only run on deployment, so there's no real speed reason to squash them, combine them, rename them, or remove them (unless you are totally ditching whatever is in that migration).
AND, ALL your migrations only run once on your first deploy. From then on, only future migrations will be run. So the overhead is a one-time thing.
Having a few (hundred) migration files is really no big deal.
"But what about the fact that migration #3 adds a column which is later removed by migration #45?" That's how we write software: over time, with changes. It's fine.
If you need to redo indexes and rename fields, write another migration. Call it "CleanupBeforeProductionDeploy" if it'll give you that wonderful sense of cleaned up code.
Reordering columns is useless. This isn't Excel. Don't bother.
If you need to display fields in a certain order, use SQL or .map or .pluck or I'm sure a dozen other Ruby or RoR solutions, that's what they are for.
You've done good work, it sounds like your app is nearly ready to deploy. Congratulations, that's a serious milestone. So many of us start scratching something out and never push it across the finish line. Seriously, you should feel good.
Don't procrastinate with meaningless fussing that will just lead to errors.
Go push the code and be happy.
Rant over. If I haven't convinced you, here are some tips to at least keep you safe / sane.
The purpose of the schema is a shortcut for creating a new dB from scratch and just jumping to the end without running every migration.
The schema dump is used, for example, by Heroku when you are creating a backup of your production database.
(Just an aside, I use parity to get production data from my apps into my development environments so I can work with "real" data).
You could also use a schema dump from the actual database to create a new schema file if something happened to your schema file.
So, you can do this, even though you shouldn't waste your time.
Here would be my order of operations:
Keep writing migrations for all these changes you want to make. Yes, write one or more migrations to make all your changes. Seriously. Let Rails handle versioning and timestamping your schema file as you go. And, this way you can do this in stages and test things out.
For changing your table column orders I'd do the following, it's messy, but it would work:
rename_table :users, :users_disorganized
create_table :users do |t|
...
end
# write some complicated SQL to copy 'users_disorganized' data into 'users'
# safety catch in case copying things over didn't work
drop_table :users_disorganized unless User.all.size.zero?
Use SQL to map and copy the contents of users_disorganized into users.
Here's a good SO post on the topic of SQL inside migrations
SQL is really your only choice here since you don't have an ApplicationRecord Model for UserDisorganized.
Could you make a UserDisorganized model just to make copying the files over easier? Yes, then you'd have to remember delete that file after your production deploy.
Starting to see how time-intensive this is going to be?
Now repeat this process for every table where you want to reorder columns.
Once you are all done writing migrations, your pristine schema now has the latest timestamp and version and everything so don't mess with these values.
(you can change them, it'll probably be fine, but if you decide to set them too far in the future... then forget about doing that... then write a small migration just to patch a bug or add a feature... then try to run that migration... then spend 4 hours trying to figure out why nothing happens when you run rake db:migrate... all just to realize the schema timestamp is later than your migration timestamp... and so on)
Now you can gulp delete all your migrations. Yeah. You don't want them? Here's your chance to prove it. You still serious about this?
How do you then initialize your production database? Tell Heroku to run rake db:schema:load instead of rake db:migrate.
Good luck.
(and don't bother)

Making Changes Between Two Databases ASP.NET MVC

I'm working on a project in which we have two versions of an MVC App, the live, and the dev versions, I've made changes to the dev version and added tables and data, etc.
Is there any way to migrate these changes onto the live version without losing all data(i.e. just regenerating the database).
I've already tried just rebuilding the database but we lose all data that was previously stored( as obviously we are essentially deleting the old database and rebuilding it).
tl;dr
How do I migrate my dev version of an mvc app along with any new tables to the live version of an mvc app with missing models and tables.
Yes, it is possible to migrate your changes from your dev instance to your production instance; to do so you must create SQL scripts that update your production database with the changes. This may be accomplished by manually writing the scripts or by using tools to generate the scripts for you. However you go about it, you will need scripts to update your database (well, you could perform manual updates via the tooling of your database, but this is not desirable, as you want the updates to occur in a short time window, and you want this to happen reliably and repeatably).
The easiest way that I know of to do this is to use tools like SQL Compare (for schema updates) or SQL Data Compare (for data updates). These are from Redgate, but they cost a fair bit of money. They are well worth they price, and most companies I've worked with are happy to pay for licenses. However, you may not want to shell out for them personally. To use these tools, you connect them to source and destination databases, and they analyze the differences between the databases (schematically or data) and produce SQL scripts. These scripts may then be run manually or by the tools themselves.
Ideally, when you work on your application, you should be producing these scripts as you go along. That way when it comes time to update your environments, you may simply run the scripts you have. It is worth taking the time to include this in your build process, so database scripts get included in your builds. You should write your scripts so they are idempotent, meaning that you can run them multiple times and the end result will be the same (the database updated to the desired schema and data).
One may of managing this is creating a DBVersions table in your database. This table includes all your script updates. For example you could have a table like the following (this is SQL Server 2008 dialect):
CREATE TABLE [dbo].[DBVersions] (
[CaseID] [int] NOT NULL,
[DateExecutedOn] [datetime] NOT NULL,
CONSTRAINT [PK_DBVersions] PRIMARY KEY CLUSTERED (
[CaseID] ASC
)
) ON [PRIMARY]
CaseID refers to the case (or issue) number of the feature or bug that requires the SQL update. Your build process can check this table to see if the script has already been run. If not, it runs it. This is useful if you cannot write your scripts in a way that allows them to be run more than once. If all your scripts can be run an unbounded number of times, then this table is not strictly necessary, though it could still be useful to reduce the need to run a large number of script every time a deployment is done.
Here are links to the Redgate tools. There may be many other tools out there, but I've had a very good experience with these.
http://www.red-gate.com/products/sql-development/sql-compare/
http://www.red-gate.com/products/sql-development/sql-data-compare/
It depends on your deployment strategy and this is more of a workflow that your team need to embrace. If regenerating the live database from scratch it can take awhile depending how big the database size is. I don't see a need to do this in most scenarios.
You would only need to separate out database schema object and data row scripts. The live database version should have its database schema objects scripted out and stored in a repository. When a developer is working on a new functionality, he/she will need to make those changes against the database scripts in the repository. If there is a need to make changes to the database rows then the developer will also need to check in the data row scripts in the repository. On a daily deployment the live database version can be compared against what is checked in the repository and pushed to make it in sync.
On our side we use tools such as RedGate Schema Compare and Data Compare to do the database migration from the dev version to our intended target version.

Is it safe to run migrations on a live database?

I have a simple rails-backed app running 2-3 million pageviews a day off a Heroku Ronin database. The load on the database is pretty light, though, and it could handle a lot more than we're throwing at it.
Is it safe for me to run a migration to add tables to this database without going into maintenance mode? Also, would it be safe to run a migration to add a few columns to the core table responsible for almost all of the reads and writes?
Downtime is not acceptable, even for a few minutes.
If running migrations live isn't advisable, what I'll probably do is set up a new database, run the migrations on that, write a script to sync the two databases, and then point the app at the new one.
But I'd rather avoid that if possible. :)
Sounds like your migration includes:
adding new tables (perhaps indexes? If so, that could take a bit longer than you might expect)
adding new columns (default values and/or nullable?)
wrapping your changes in a transaction (?)
Suggest you gauge the impact that your changes will have on your Prod environment by:
taking a backup of Prod (with all the Prod data within)
running your change scripts against that. Time each operation
Balance the 2 points above against the typical read & write load at the time you're expecting to run this (02:00, right?).
Consider a 'soft' downtime by disabling (somehow) write operations to the tables being effected.
Overall (or in general), adding n tables and new nullable columns to an existing table would/could likely be done without any downtime or performance impact.
Always measure the impact your changes will have on a copy of Prod. Measure 'responsiveness' at the time you apply your changes to this copy. Of course this means deploying another copy of your Prod app as well, but the effort would be worthwhile.
Assuming it's a pg database (which it should be for Heroku).
http://www.postgresql.org/docs/current/static/explicit-locking.html
alter table will acquire an access exclusive lock. So, the table will be locked.
On top of this, you will be required to restart the Rails application in order for it to be aware of any new models. If you are going to be adding tables to the application or modifying model code in any way.
As for pointing to a new app with a freshly modified database, how are you going to do the sync of the data and also sync the changes in data between the two databases in the time that the sync takes?
Adding tables shouldn't be a concern, as your application won't be aware of them until proper upgrades are done. As for adding columns to a core table, I'm not so sure. If you really need to prevent downtime, perhaps it's better to add a secondary table that (linked by an ID with the core table) adds your extra columns.
Just my two cents.

How to architect Rails site that can be edited while running?

I am writing a Rails app that "scrapes/navigates" some other websites and webservices for content. I am using Mechanize and Savon to do the heavylifting.
But given the dynamic nature of the web, I'd like to make my calls to these editable by the admin users of the site - rather than requiring me to release a new version of the site.
The actual scraping thread happens async to the website, using the daemons gem.
My requirements are:
Thinking that the scraping/webservice calling code is quite simple, the easiest route is to make the whole class editable by the admins.
Keep a history of the scraping code - so that we can fairly easily revert if we introduce a problem.
Initially use the code from the file system, but as soon as thats been edited and stored somewhere, to use that code instead.
I am thinking my options are:
Store the code in the db (with a history table for the old versions)
Store the code in a private git repo somewhere and access that for the history/latest versions.
I am thinking the git route might be easiest, given its raison d'etre is to track file history...
But perhaps there is a gem/plugin that does all this for me, out of the box?
Thanks in advance for any tips/advice.
~chris
I really hope you aren't doing something like what's talked about here...
Assuming you are doing a proper mixin, there used to be a gem called "acts_as_versioned" which would do something like you want. It's been a while so I don't know if it's been turned into a plugin or if it's been abandoned. Essentially the process it uses was to provide a combination key for your versioned table.
Your database would have a structure like this:
Key column (id for the record)
Version column (id for the record's version)
All the record attributes
Let's say you had a table for your scripts, and the script you wanted has three versions. Your table would have the following records:
123, 3, '#Be good now'
123, 2, 'puts "Hi"'
123, 1, '#Do not be bad'
Getting the most recent version would be as simple as
Scripts.find :first, :conditions=>{:id=>123}, :order=>"version desc"
Rolling back would be as simple as removing the most recent version, or having another table with a pointer to the active version. It's up to you.
You are correct in that git, subversion, mercurial and company are going to be much better at this. To provide support, you just follow these steps:
Check out the script on the server (using a tag so you can manage what goes there at any time)
Set up a cron job to check out the new script periodically (like every six hours or whatever you feel comfortable with)
The daemon you have for running the script should run the new version automatically.
IF your site is already under source control, and IF you're running under mod_rails/passenger, you could follow this procedure:
edit scraping code
commit change locally
touch yourapp/tmp/restart.txt
that should give you history of the change and you shouldn't have to re-deploy.
A bit safer, but not sure if it's possible for you is on a test/developement server: make change, commit locally, test it, then on production server, git pull then touch tmp/restart.txt
I've written some big spiders and page analyzers in the past, and one of the things to keep in mind is what code is providing what service to the entire application.
Rails is providing the presentation of the data being gathered by your spidering engine. The presentation is one side of the coin, and spidering is the other, and they should be two separate code bases, tied together by some data-sharing mechanism, which, in your case, is the database. The database gives you some huge advantages as does having Rails available, when your spidering code is separate. It sounds like you have some separation already, but I'd recommend creating a wider gap. With that in mind, here's how I've done it before, and what I'd do now.
Previously, I had a separate app for my spidering that was spawning multiple spider tasks. Each task would look at a bunch of different URLs, throw their results in the database, then quit. Each time one quit the main app would spawn another spider to process more URLs. Each loop, the main app checked a YAML configuration file for run-time parameters, like how many sub-tasks it should have running, how many URLs they'd get, how long they'd wait for connections, etc. It stored the last modification date of the config file each time it loaded it so, if I made a change to the file, the app would sense it in a reasonably short time, reread the file, and adjust its behavior.
All state information about the URLs/pages/sites being scraped/spidered, was kept in the database so I could check on its progress. I could see how many had been processed or remained in the queue, the various result codes, and the content being returned. If I didn't like something I could even tweak the filters to skip junk pages, knowing the spidering tasks would be updated in a few minutes.
That system worked extremely well, spidered a major customer's series of websites without a glitch, running for several weeks as I added new sites to the list. (We were helping one of the Fortune 50 companies improve their sites, and every site had been designed and implemented by a different team, making every site completely different. My code had to be flexible and robust; I was really happy with how it worked out.)
To change it, these days I'd use a database table to hold all the configuration info. That way I could easily build an admin form, and let someone else inherit the task of adjusting the app's runtime configuration. The spider tasks would also be written so they'd pull their configuration from the database, rather than inherit it from the main app. I originally had the main app do all the administration and pass the config info to the spidering apps because I wanted to keep the number of connections to the database as low as possible. I was using Postgres and now know it could have easily handled the load, so by letting the individual tasks handle their configuration I could have made it more responsive.
By making the spidering engine separate from the presentation engine it was possible to temporarily stop one or the other without affecting the progress of the spidering job. Once I had the auto-reload of the prefs in place I don't think I had to stop the spidering engine, I just adjusted its prefs. It literally ran for weeks without stopping and we eventually pulled the plug because we had enough data for our needs.
So, I'd recommend tweaking your code so your spidering engine doesn't rely on Rails, instead it will be fired off by cron or a separate scheduling app. If you have to temporarily stop Rails your engine will run anyway. If you have to temporarily stop the engine then Rails can continue serving pages. The database sits between the two acting as the glue.
Of course, if the database goes down you're hosed all the way around, but what else is new? :-)
EDIT: Chris said:
"I see your point about the splitting the code out, though my Ruby-fu is low - not sure how far I can separate things without having to have copies of the ActiveModel/migrations stuff, plus some shared model classes."
If we look at your application as spider engine <--> | <-- database --> | <--> Rails/MVC/presentation, where the engine and Rails separately read and write to the database, and look at what each does well, that helps figure out how to break them into separate code bases.
Rails is designed to handle migrations, so let it. There's no reason to reinvent that wheel. But, how often do you do migrations, and what is effected when you do? You do them seldom once the application is stable, and, at that point you'd do them in a maintenance cycle to tweak the database. You can shut down the spidering engine and the web interface for a few minutes, migrate the database, then bring things up and you're off and running. Migrations are a necessary evil, but are hardly show-stoppers once in production. Most enterprises have "Software Sunday", or some pre-announced window of maintenance, so do the same.
ActiveRecord, modeling and associations are pretty easy to deal with too. The models are in a file that is required internally by Rails already, so the spidering engine can inherit the database know-how that way too; Multiple apps/scripts can use the same model file. You don't see the Rails books talk about it much, but ActiveRecord is actually pretty easy to use outside of Rails. Search the googles for activerecord without rails for more info.
You can pull in ActiveSupport also if you want some of its extensions to classes by doing a regular require, but the Rails "view" and "controller" logic, which normally applies to presenting the web interface, shouldn't be needed at all in the engine.
Business logic, which goes in the controllers in Rails could even be refactored into separate methods that get required by the Rails side of things and by the spidering engine. It's a different way of looking at Rails but falls in line with the "DRY" mantra - don't repeat yourself, so make things modular and require (or require_relative) bits and pieces that are the building blocks of the entire system.
If you don't want a totally separate codebase, you can take advantage of Rail's script runner, which gives a script access to the ActiveRecord::Base and ActiveRecord::Associations and ActiveSupport. Do a rails runner -h from your app's main directory, or search for "rails runner" for more info. runner is not good for a job that starts and runs many times an hour, because Rail's startup cost is high. But, if you have a long-running task, say one that runs in parallel with your rails app, then it's a great choice. I'd give it serious consideration for the spidering side of your application. Eventually you might want to break the spidering-engine out to a separate host so the presentation side has a dedicated host, so runner will help you buy time and do it in small steps.

how to "test" an externally managed db in rails

Background: We have a dependency on an externally managed database. This is a company-wide resource. We have a read-only account into it and have no control over or input into the schema or contents.
Issue: We're using ActiveRecord as our ORM into said resource; we manage the connection information separate from our central db connection information. It worked out fine. We have some characterization tests that verify that our ActiveRecords retrieve the data for a few know datapoints. However, we have no test/dev environment replacement strategy for this database. Right now all of our environments are configured to use the production database connection:
That sucks
We don't want the production password on the build server, so our build is broken
The queries to the production database server are slow and because caching is off in test/dev our homepage loads REALLY slow locally
So we need something else in test/dev mode.
Q) Why not just have another sqlite database locally that mimics the schema of the production database?
A) Because we've tried that for another connection and it's lousy for at least a couple reasons.
It's fairly complex managing the separate schema (sqlite db file) in the rake process just for testing/dev.
Testing ActiveRecords outside of a schema that's managed by some process that ensures schema consistency between environments is largely meaningless.
The database configuration doesn't feel like the right seam. The database connection aspect of this, and thus the AR, is not part of what we're developing, it's just a connection library in this case. As long as we can ensure our test/dev replacement for it acts the same as the production AR, then it doesn't matter if we use AR for this in test/dev. I hope that made sense, it's an important point.
Q) You could use SchemaDumper to grab the schema of the production database and use it to generate the test database. That way all the SQLy details would be automated and it would look more like typical rails stuff.
A) Yeah, that would be pretty hot, but SchemaDumper doesn't seem to play nicely with the production database connection. It just hangs after a while and we don't get the whole schema. Bummer. That also doesn't avoid having to manage that whole other database file and work managing said file into our rake tasks.
What I really want to do is to have production use the ARs that are tested in the characterization tests and then have another object that's a plain old PORO that reads stuff out of a yaml file (like a fixture) that replaces the object in the test/development/build environments.
Q) But Najati, isn't putting that stuff in a yaml file the same as defining the schema?
A) Well, yeah, sorta. Its just a lot more direct and easier to manage if it's in some PORO that loads some crap out of a yamlfile than if I also have to work some half-baked schema management into our build tasks; we do this currently and it's pretty lousy and, frankly, doesn't seem to be buying us much. Also the test schema and the test data fixture duplication the information: "this is what we want the test version of this data to look like" - why do we need both? I claim "So that you can use the same AR in both environments." is not a sufficient argument to justify the complexity of managing the extra sqlite db file.
Q) I feel like there's something you're not telling me.
A) I've been cheating on my Weight Watchers. Also,
In the past when I've had something like this my solution looked like this:
Characterization tests that capture the important aspects of the external service's behavior, run not with the unit test suite, but as a separate process on the build server, maybe once every 4 hours or every night or whatever.
A fake implementation that used the same set of tests to exercise it's behavior to ensure that it was providing similar functionality to the test/dev environment.
Spring (and probably dependency injection containers in general) makes this easy. You just swap out beans in your environment-specific bean config and the test env just goes on it's merry way.
Given my understanding/knowledge, Rails doesn't seem to be lending itself to this very well. I supposed I could redefine the class in my test/dev environment scripts, but that seems really shady. For one thing, I don't know if that would keep the model from being loaded at application start-up, and another, that would add yet another strange wrinkle to our Rails project, another bit of magic that would make the project harder to come up to speed on. I want something that feels like the "service replacement" strategy used in Spring that doesn't require hard-to-find/understand RoR magic.
Uhh. I'll stop there and see if that much prompts anything. Thanks for taking the time to read!
You don't actually test the database. You're testing your models that interact with the app or other 'original' code that might touch the database. If there is magic in the prod database, take it out and put it in fixtures or factories. The fixtures and factories load the test data into a test instance, for example: db_test. When the test has passed or failed the database is rolled back with transactions and your tests can (and should) run atomically. If you are trying to build an app that tests a database, that's a different story. For everyone else, use the testing design that Rails provides: fixtures or factories and a "test" rails database defined in config/database.yml. The YML file swaps out for the dependency injection functionality. It's just a hash of variables, you don't need any pojo spring tricks to swap out environments. :) When rails runs your tests with fixtures or factories, it will load only the test environment as defined in database.yml. This will also integrate nicely with rspec, guard and other tools. When I save one of my models, it creates some data in my test db, runs my test and cleans up the database all just by hitting the save button on my source file.
Integration tests should still use this same mechanism. The only thing that makes this process annoying is legacy databases and I've worked some magic there with metaprogramming to minimize the hassle.
Take a look at factory_girl for factories. And episode railscast #275: http://railscasts.com/episodes/275-how-i-test
I think you have only two options: either you duplicate the database in some way, or you separate the ORM into a thin (as thin as possible) layer and mock it out in your tests.
Besides, you may have an AR schema ready in your db/schema.rb.

Resources