How to "test" an externally managed db in Rails

Background: We have a dependency on an externally managed database. This is a company-wide resource. We have a read-only account into it and have no control over or input into the schema or contents.
Issue: We're using ActiveRecord as our ORM into said resource; we manage the connection information separately from our central db connection information. It has worked out fine. We have some characterization tests that verify that our ActiveRecords retrieve the data for a few known data points. However, we have no test/dev environment replacement strategy for this database. Right now all of our environments are configured to use the production database connection:
That sucks
We don't want the production password on the build server, so our build is broken
The queries to the production database server are slow, and because caching is off in test/dev our homepage loads REALLY slowly locally
So we need something else in test/dev mode.
Q) Why not just have another sqlite database locally that mimics the schema of the production database?
A) Because we've tried that for another connection and it's lousy for at least a couple reasons.
It's fairly complex managing the separate schema (sqlite db file) in the rake process just for testing/dev.
Testing ActiveRecords outside of a schema that's managed by some process that ensures schema consistency between environments is largely meaningless.
The database configuration doesn't feel like the right seam. The database connection aspect of this, and thus the AR, is not part of what we're developing, it's just a connection library in this case. As long as we can ensure our test/dev replacement for it acts the same as the production AR, then it doesn't matter if we use AR for this in test/dev. I hope that made sense, it's an important point.
Q) You could use SchemaDumper to grab the schema of the production database and use it to generate the test database. That way all the SQLy details would be automated and it would look more like typical rails stuff.
A) Yeah, that would be pretty hot, but SchemaDumper doesn't seem to play nicely with the production database connection. It just hangs after a while and we don't get the whole schema. Bummer. That also doesn't avoid having to manage that whole other database file, nor the work of fitting said file into our rake tasks.
What I really want to do is to have production use the ARs that are tested in the characterization tests, and then have another object, a PORO (plain old Ruby object) that reads stuff out of a YAML file (like a fixture), that replaces the AR object in the test/development/build environments.
Q) But Najati, isn't putting that stuff in a yaml file the same as defining the schema?
A) Well, yeah, sorta. It's just a lot more direct and easier to manage if it's in some PORO that loads some crap out of a YAML file than if I also have to work some half-baked schema management into our build tasks; we do this currently and it's pretty lousy and, frankly, doesn't seem to be buying us much. Also, the test schema and the test data fixture duplicate the same information: "this is what we want the test version of this data to look like" - why do we need both? I claim "So that you can use the same AR in both environments." is not a sufficient argument to justify the complexity of managing the extra sqlite db file.
Q) I feel like there's something you're not telling me.
A) I've been cheating on my Weight Watchers. Also,
In the past when I've had something like this my solution looked like this:
Characterization tests that capture the important aspects of the external service's behavior, run not with the unit test suite, but as a separate process on the build server, maybe once every 4 hours or every night or whatever.
A fake implementation that used the same set of tests to exercise its behavior, to ensure that it provided similar functionality in the test/dev environment.
Spring (and probably dependency injection containers in general) makes this easy. You just swap out beans in your environment-specific bean config and the test env goes on its merry way.
Given my understanding/knowledge, Rails doesn't seem to lend itself to this very well. I suppose I could redefine the class in my test/dev environment scripts, but that seems really shady. For one thing, I don't know if that would keep the model from being loaded at application start-up, and for another, it would add yet another strange wrinkle to our Rails project, another bit of magic that would make the project harder to come up to speed on. I want something that feels like the "service replacement" strategy used in Spring that doesn't require hard-to-find/understand RoR magic.
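To make that concrete, here's roughly the shape I'm picturing. This is purely a sketch: the class names, file paths, and YAML key are invented, and a plain environment check stands in for anything Spring-like.

# app/models/external_thing.rb - the AR used in production (name invented),
# assuming the external connection is defined as external_db in database.yml
class ExternalThing < ActiveRecord::Base
  establish_connection :external_db
end

# app/models/fake_external_thing.rb - a PORO that serves the same data from YAML
class FakeExternalThing
  def self.find_by_code(code)
    @records ||= YAML.load_file(Rails.root.join("test", "fixtures", "external_things.yml"))
    new(@records.fetch(code))
  end

  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

# config/initializers/external_thing.rb - pick the implementation per environment
EXTERNAL_THING = Rails.env.production? ? ExternalThing : FakeExternalThing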
Uhh. I'll stop there and see if that much prompts anything. Thanks for taking the time to read!

You don't actually test the database. You're testing your models and the other 'original' code that might touch the database. If there is magic in the prod database, take it out and put it in fixtures or factories. The fixtures and factories load the test data into a test instance, for example: db_test. When the test has passed or failed, the database is rolled back with transactions, and your tests can (and should) run atomically.
If you are trying to build an app that tests a database, that's a different story. For everyone else, use the testing design that Rails provides: fixtures or factories, plus a "test" Rails database defined in config/database.yml. The YAML file takes the place of the dependency-injection configuration; it's just a hash of variables, so you don't need any POJO/Spring tricks to swap out environments. :)
When Rails runs your tests with fixtures or factories, it will load only the test environment as defined in database.yml. This also integrates nicely with RSpec, Guard and other tools. When I save one of my model files, it creates some data in my test db, runs my tests and cleans up the database, all just by hitting the save button on my source file.
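For reference, a typical config/database.yml keeps each environment pointed at its own database. The adapter, names, and ENV-based password below are only illustrative; the test entry uses the db_test instance mentioned above.

development:
  adapter: postgresql
  database: myapp_development

test:
  adapter: postgresql
  database: db_test

production:
  adapter: postgresql
  database: myapp_production
  username: myapp
  password: <%= ENV["MYAPP_DB_PASSWORD"] %>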
Integration tests should still use this same mechanism. The only thing that makes this process annoying is legacy databases and I've worked some magic there with metaprogramming to minimize the hassle.
Take a look at factory_girl for factories, and RailsCasts episode #275: http://railscasts.com/episodes/275-how-i-test

I think you have only two options: either you duplicate the database in some way, or you separate the ORM into a thin (as thin as possible) layer and mock it out in your tests.
Besides, you may have an AR schema ready in your db/schema.rb.
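A rough sketch of the second option, with invented names: wrap the external AR model in a thin gateway that the rest of the app depends on, and stand in an RSpec-style double for it in tests.

# app/models/external_record.rb - AR model over the read-only external DB (name invented)
# app/models/external_lookup.rb - the thin seam the rest of the application talks to
class ExternalLookup
  def definition_for(code)
    record = ExternalRecord.find_by_code(code)
    record && record.definition
  end
end

# In a spec, stub the whole seam so nothing ever touches the external database
# (ReportBuilder is likewise hypothetical):
lookup = double("ExternalLookup", :definition_for => "stubbed definition")
report = ReportBuilder.new(lookup)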

Related

Fixing (or adding features to) a Rails server in production

I would like to know what the best practice is for taking a Ruby on Rails application that is in production and adding a feature to it or debugging a broken feature.
What I mean is, say you have a working application and you have lots of people using it. You want to add a new feature to this app. You clone your application to your local machine. Create a new feature (or w/e) branch.
Now what do you change/do so you don't destroy your database and so you are able to test and debug this application on your local machine?
Also, let's say this is an older rails application with an older ruby version.
I would also like to note that I am having trouble finding any information on this, and am willing to read books and lots of text to learn if it is a very involved task.
Although the complexity of this type of operation varies quite a bit, usually based on the complexity of the application itself, I think a few generalizations can be made.
Tests
Obviously, do not break any existing tests. Write tests for your new functionality, even if they are the first tests in the application.
Data
Ideally, you will have data to work with that very closely mirrors your production data. In some cases (CMS) this may be an actual dump of the production database and assets, restored locally. In other cases (billing portal for a hospital), you will probably need to rely on well-constructed seed data. Once your automated tests pass, you can perform manual QA against the (possibly simulated) production data.
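For the seed-data case, a small db/seeds.rb along these lines can build a realistic local dataset, loaded with rake db:seed. The models and attributes here are invented for illustration.

# db/seeds.rb - hypothetical models; adjust to your own schema
50.times do |i|
  patient = Patient.create!(name: "Test Patient #{i}")
  Invoice.create!(patient: patient, amount_cents: rand(1_000..50_000), status: %w[open paid].sample)
end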
Staging
If you do not have a staging environment that 100% mirrors your production environment, set one up now. It should be set up as close as you possibly can to your production environment, using the database guidelines from above. Merge your feature branch into staging prior to merging it into production. This will allow you to do a final QA pass in a near-production environment. Staging can be used to test not only new application features, but also new server versions, Ruby versions, etc.
CI/CD
It is becoming very common to use CI/CD to automate the testing and deployment of feature branches. This can help enforce code quality guidelines. It can also allow you to run the tests in an environment that matches production, for extra peace of mind.
Backups
Obviously, even with all of this, things can still go wrong. Keeping up-to-date backups is vital, for worst case scenarios.

Database Changes Outside Ruby/Rails Migration

We have several technologies accessing the same database. At the moment, Ruby/Rails is used to create migrations when making changes to the database. The question is a simple one:
Is it possible for our DBAs to make changes to the database (not using Ruby migrations) without stepping on the Ruby devs' toes and breaking the Ruby web application?
If so, some generic details about how to get started or pointed in the right direction would be great! Thanks.
I can tell you from experience that this is not the best idea, one that you will eventually regret and later, inevitably, reverse. But I know that it does come up. I've had to do them (against my will, or in cases of extreme emergency).
Given the option, I'd push back on it if you can, in favor of any solution that brings the SQL closer to the repository and further away from a "quick fix" applied directly to the database. Why?
1) Your local/testing/staging/production databases will diverge, eventually rendering your code untestable in a reliable way
2) You won't be able to regenerate your database from "scratch" to match production
3) If the database is ever clobbered, you won't be able to re-create it in any sensible way.
DBAs generally don't care about these things until something in the code breaks, and they want you to figure it out. But, for obvious reasons, that now becomes quite difficult.
One approach I have taken that seems to make everyone happy is to do the following:
1) Commit to having ALL database changes, big or small, put into a repository with the code. This means that everything that has happened to the database is all together in one place.
2) Each change, or set of changes, should be a migration. A migration can be simply running an SQL file. But, it should be run from within a migration for all the testability benefits.
So, for example, let's say you have a folder structure like:
- database_updates
-- v1
--- change_1.sql
--- change_2.sql
-- v2
--- change_3.sql
--- change_2_fix.sql
Now, let's say you want to make a change, or set of changes, via SQL. First, create a new version folder; let's call it "v1". Next, put your SQL scripts in this folder. Finally, create a migration:
def change
  # Read every SQL file in the v1 folder and run it against the database
  Dir.glob("#{Rails.root}/database_updates/v1/*.sql").sort.each do |file|
    execute File.read(file)
  end
end
(I have code that does this, happy to share the gist if you find yourself using this approach)
Since each migration is transactional, any of the scripts that fail will cause all of them to fail.
Now, let's say you have the next set, v2. Same exact thing. And, we have a history of these "versioned" changes, so we can look at the migration history and see what's been run, etc.
As a power user note, this set up also allows for recourse if things fail; in these cases, we can opt to go back to v1:
def up
  # Apply the v2 scripts
  Dir.glob("#{Rails.root}/database_updates/v2/*.sql").sort.each { |f| execute File.read(f) }
end

def down
  # Roll back by re-running the v1 scripts
  Dir.glob("#{Rails.root}/database_updates/v1/*.sql").sort.each { |f| execute File.read(f) }
end
For this to work, v1 and v2 would need to be autonomous -- that is, they can destroy and rebuild entities without any dependencies. If that's not what you want, just stick with the change method.
This would also allow you to test for breaking changes. Let's say it is reported that something doesn't work anymore with v6. You can rollback your database migrations to v5, v4, etc (because you are doing a migration per folder) and test to see when the test broke, and correct it with v7.
Anyway, the end game of it all is that you can safely check out this project from a repository, create your database, run rake db:migrate and know that your database structure resembles exactly what is deployed elsewhere. And, worst case, if your database gets clobbered, you can just run all your scripts from v1 - vN and end up with your database back again.
For the DBAs, everything remains SQL: they can just send you a file or set of files for you to run.
If you want to get fancy, you could even write a migration generator that knows how to handle a line like rails g migration UpdateDBVersion version:v7 to take care of the repetitive boilerplate.
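A rough sketch of what such a generator could look like. The file paths, template name, and the exact invocation (rails g sql_migration v7 rather than the command quoted above) are assumptions, not a definitive implementation.

# lib/generators/sql_migration/sql_migration_generator.rb
require "rails/generators"
require "rails/generators/migration"

class SqlMigrationGenerator < Rails::Generators::Base
  include Rails::Generators::Migration
  source_root File.expand_path("../templates", __FILE__)

  argument :version, :type => :string

  # Rails::Generators::Migration needs this to timestamp the generated file
  def self.next_migration_number(dirname)
    Time.now.utc.strftime("%Y%m%d%H%M%S")
  end

  def create_migration_file
    # templates/migration.rb.erb holds the boilerplate change method shown earlier
    migration_template "migration.rb.erb", "db/migrate/run_#{version}_scripts.rb"
  end
end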
As long as everyone relies on the same updated schema.rb or structure.sql, everyone will share the same database 'version'.
See this SO answer for more insight.
Changes to the database, tables, or indexes should be made using ActiveRecord migrations whenever possible. This specifically ensures that development and test environments remain logically in sync. Remember that developers must be able to develop and test against the same database structure that exists in the production environment, and QA teams must be able to adequately test such changes.
However, some database features are not actually supported by ActiveRecord migrations, and may only be applied directly to the database. These features are often database-specific, such as any of the following:
Views
Triggers
Stored procedures
Indexes with function-based columns
Virtual columns
Essentially, any database-specific features that don't have an ActiveRecord abstraction will have to be applied directly to the database.
Sometimes, however, other applications require the addition of tables, columns, or indexes in order to operate properly or efficiently. These other applications may simply be used to view/report against the database, or they may be substantial business applications that have their own independent database requirements and separate development teams. Occasionally, a DBA may have to step in and create an index or provide some optimization needed to solve a real-world production performance issue.
There are simply far too many situations for shared database management to give a definitive answer. Depending on the size of the organization and the complexity of the needs for the shared management, there may be many ways to solve the problem of a shared database schema that are specific to the application or organization.
For instance, I have worked on applications that shared a database with as many as 10 other applications, each of which "owned" portions of the schema and shared other portions with the other teams, all mediated through the DBA group. In situations such as this, the organizational structure and change control process may be the only means of solving this problem.
Whichever the situation, some real-world suggestions may help avoid problems and mitigate maintenance woes:
Offer to translate SQL DDL commands into ActiveRecord migrations, where possible, so that DBAs can accomplish their needs and the application team can still appropriately maintain the schema (see the sketch after this list)
Any changes made outside the ActiveRecord migration should be thoroughly tested for impact to the project in a non-production environment by the same QA resources that test the actual Rails application
Encapsulate any external changes in a .sql file and include the file as part of the project in version control
If the development team is using the same database product in development (some cannot, due to licensing or complexity), those changes should be applied to the developer database instances, as well
It's best if you can apply the changes during a migration, even just by calling the relevant CLI tools as a migration step - the exact mechanism will be database-dependent, as well
Try to avoid doing this more than is absolutely necessary, as this can significantly reduce the database independence of the application, even between versions of the same database product (limiting upgrade opportunities)
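As a sketch of the first suggestion (the table, column, and index names are invented for illustration), a DBA request like

CREATE INDEX index_orders_on_customer_id ON orders (customer_id);

could be recorded by the application team as a migration instead:

class AddCustomerIdIndexToOrders < ActiveRecord::Migration
  def change
    # Equivalent to the raw DDL above, but versioned alongside the code
    add_index :orders, :customer_id
  end
end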

How to use development database for only one model in Rspec/Cucumber?

I have a dictionary application which has a lot of word definitions in the development database. When I'm writing my Cucumber/Rspec tests I usually populate the test db with a few words that I know I'm gonna be using in the test. However, it would be great if I could access the development db for only one model (Word) to check the word definition, and use the test db for everything else. How can that be set up?
It's a hard-if-not-impossible task to achieve, for one reason: you should not do that.
Tests must be independent of your development environment. If you run the tests via a continuous integration tool (as is recommended), the test environment is built from scratch every time, with no knowledge of previous states.
If you need some records to be present for your test to run, then seed your test with the necessary records. That's the way to do it.
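For example, a minimal sketch of seeding just what the scenario needs; the Word columns (term, definition) are guesses, not taken from the question:

# spec/models/word_spec.rb
describe Word do
  before do
    Word.create!(:term => "lexicon", :definition => "the vocabulary of a person or language")
  end

  it "returns the seeded definition" do
    Word.find_by_term("lexicon").definition.should include("vocabulary")
  end
end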

Some Rails unit testing questions (using Shoulda + Factory girl)

I have a couple of complicated objects to stub out (instances of gems I use). Where can I centralize these stubs to make them available to all tests?
How can I programmatically clear the DB between tests without rake test? I want to quickly run individual tests through TextMate, but doing so will error out since it doesn't clear the DB between tests
The tests run slowly since they have to spin up a Rails instance. How can I speed up the tests, especially while writing them and wanting to quickly run changes?
1) You can either put them in test_helper.rb to make them available to all tests, or you could write your own module which contains those methods and then include that module in the tests that require those stubs (a rough sketch follows after this list).
2) You could put Model.destroy_all (or .delete_all if appropriate which would be quicker) in your test setup method to strip out those models that you are no longer interested in.
However, if you are running tests in transactions (and your database supports transactions) then you shouldn't need to clear out any data because the creation of the data and the test will run in a transaction which will then be rolled back clearing the data automatically.
3) Not so sure on this one. I had this problem a lot developing on Windows but not so much on *nix. You could look into some kind of continuous testing but there's still going to be a delay on feedback. It might be worth investigating what is causing the rails environment to be so slow starting - it might be something you can skip in your testing environment.
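As a sketch of the first point: the module, the stubbed class, and its method are invented placeholders, and Mocha-style stubbing (which commonly accompanies Shoulda) is assumed.

# test/support/complicated_gem_stubs.rb
module ComplicatedGemStubs
  # ExternalGemClient and #fetch are placeholder names for the gem you use
  def stub_external_gem
    ExternalGemClient.stubs(:fetch).returns("status" => "ok")
  end
end

# In test_helper.rb, include the module so every test can call stub_external_gem:
class ActiveSupport::TestCase
  include ComplicatedGemStubs
end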

Fixtures and Selenium and Rails (oh my?)

What data do you use with Selenium tests on Rails apps? Do you load from fixtures? Use an existing dev db? Use a separate (non-fixture) db?
I'm considering my options here. I have a Rails app with a large Selenium test suite that runs on a modified version of Selenium Grid. Part of the process, right now, is loading a large set of fixtures, once, before the test suite runs. It's a LOT of data. Most of it is reporting info exported from our production db. When I set it up originally, I exported the data to yaml from Oracle.
Now there's been a schema change in some of the reporting tables, so of course I have to regenerate the fixture data. There is so much of it that it's not worthwhile to edit the files by hand. But it seems inefficient to have to regenerate for every little schema change - not to mention that it's yet another step to remember to do. Is there a better way?
EDIT: I originally intended to load the fixtures before each test and unload them after each test, like regular Rails tests. But it takes about 15 minutes to load the fixtures due to this reporting data. There are 200+ tests, and the suite runs every 12 hours. I cannae bend spacetime captain!
EDIT 2: I also agree that having this big set of fixtures is a bad smell. I'm not sure how to pare it down, though, because the reports aggregate a lot of data and much of the value of the selenium tests is that they test the reports.
Even if it's a small set of data, though...it's still another set to keep co-ordinated with schema changes. (We have a separate, smaller set for unit, functional, and [Rails] integration tests.)
Which brings me back to my original question - are there other options besides doing it by hand, or remembering to regenerate them each time?
If you can, the best possible thing to do is have a system in which each Selenium test gets its own data state (i.e. DB tables dropped and recreated, bootstrap data re-inserted, and caches cleared). This is easier said than done and is usually only possible if the project planned for it from the start.
The next best thing is to have a consistent DB state for each test suite/run. This is not as nice, since there is now a strong chance that some tests will depend on the success of previously run tests, making it more difficult to identify true failures vs. false negatives.
The worst case, IMO, is to use a static DB in which each test run mutates the data. This almost always leads to problems and is usually a "project smell". The key to doing it the "right way" (again, IMO) is to be vigilant about any state/schema change and capture it as part of the automated test/build process.
Rails does a good job with this already with Migrations, so take advantage of them! Without knowing your situation, I'd generally question the need to run Selenium tests against a snap of the full DB. Most DBs can (or should) be distilled down to less than 1MB for automated testing, making automated schema migrations and data reset much more efficient.
The only time I've seen a "valid" reason for massive DBs for Selenium tests is when the DB itself contains large chunks of "logic data" in which the data affects the application flow (think: data-driven UI).
I think you're asking two questions here that are intertwined so if I'm to break it down:
You want to get test data into and out of your DB quickly and fixtures aren't doing it for you.
You've been burnt by a schema change and you want to make sure that whatever you do doesn't require eight iterations themed "fiddling with the test data...still" :)
You've got a couple of alternatives here, which I've hashed out below. Because you've mentioned Oracle, I'm using Oracle technologies here, but the same thing is true for other DB platforms (e.g. PostgreSQL):
Rake tasks that call PL/SQL scripts to generate the data: a nasty, horrible, evil idea, don't do it unless there's no other option. I did it on one project that needed to load in billions of rows for some infrastructure architecture tests. I still sulk about it.
Get your DB into a dump format. For speedy binary dumps check out the exp/imp and Data Pump utilities. This will allow you quick setup and teardown of your DB. Certainly on a Rails project I worked on, we used rake tasks to exp/imp a database which had around 300k records in under a minute. Also check SQL*Loader, which is the logical dump utility; as it's logical, it's slower and requires you to have control scripts to help SQL*Loader understand the dumps. However, the benefit of the logical dump is that you can run transformation scripts over it to massage the data into the latest format. Sadly, though, just like fixtures, all these tools are pretty sensitive to changes in the schema.
Use a plugin such as Machinist or Factory Girl to make the generation of the data nicer. You still incur the penalty of using ActiveRecord to set up the DB, but these fake object generators will help you stay close to your migrations and are a lot less hassle to maintain than fixtures.
Combine approaches 2 and 3. What happens here is that you make some test data with, say, Machinist. You export that test data to a dump and then reload the dump during each test run. When the schema changes, update the Machinist config and re-export (a rough sketch follows below).
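A rough sketch of approach 3 feeding approach 4, using factory_girl rather than Machinist. The factory name, attributes, and rake task are invented, and the exact syntax will vary by factory_girl version.

# spec/factories/reports.rb
FactoryGirl.define do
  factory :report do
    name      { "Quarterly revenue" }
    row_count { 10_000 }
  end
end

# lib/tasks/selenium_seed.rake - build the seed set once, then exp/imp the result
namespace :selenium do
  task :build_seed_data => :environment do
    FactoryGirl.create_list(:report, 500)
  end
end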
Hope that helps.
I'm currently on a project with an enormous Selenium test suite--actually, the one Selenium Grid was written for--and our tests use a small amount of reference data (though we don't use Rails YAML fixtures) and object factories for one-off data needed for particular tests.
Alternatively, on many of the ThoughtWorks Rails projects I've been on we've written checkin scripts that incorporate a number of pre-commit hooks--for example, running the tests before allowing a commit. One thing you might consider trying is writing (or customizing) a similar checkin script that will check for schema changes and reload the reference data as needed.
See e.g. Paul Gross's rake commit tasks on GitHub.
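A very rough sketch of such a hook: the hook path is standard git, but the rake task is only a stand-in for whatever your project actually uses to rebuild its reference data.

#!/usr/bin/env ruby
# .git/hooks/pre-commit - abort the commit if the schema changed and the reload fails
changed_files = `git diff --cached --name-only`.split("\n")
if changed_files.include?("db/schema.rb")
  system("bundle exec rake db:test:prepare") || abort("reference data reload failed; commit aborted")
end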
