Aqueduct & in-memory database - dart

I just wanted to know if the Aqueduct ORM supports a simple in-memory database for testing purposes. I'm looking for something easy and lightweight to write the backend against before actually connecting it to Postgres.

I've used a similar approach with H2 and Postgres in Java, but it is rather error-prone: while the SQL interface may be similar, you could be using a feature that is available in one but not the other. Eventually either your development is blocked, or everything seems fine until the real deployment starts hitting issues.
I've found that starting a PostgreSQL instance in Docker is much easier than I first thought, and now I use the same principle for most external dependencies: run them inside Docker. If there is interest, I can open-source a Dart package that starts the Docker container and waits until a certain string pattern appears in the output (e.g. a report of a successful start).
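Such a helper needs little more than dart:io; a rough sketch of the idea is below. The image name, environment variables, port mapping, and the "ready" message are assumptions to adjust for your own setup.

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';

/// Starts a PostgreSQL container and completes once [readyPattern]
/// appears on the container's output. Everything here (image, ports,
/// credentials, the ready message) is a placeholder, not a published API.
Future<Process> startPostgres(
    {String readyPattern =
        'database system is ready to accept connections'}) async {
  final process = await Process.start('docker', [
    'run', '--rm',
    '-e', 'POSTGRES_USER=dart',
    '-e', 'POSTGRES_PASSWORD=dart',
    '-e', 'POSTGRES_DB=dart_test',
    '-p', '5432:5432',
    'postgres',
  ]);

  final ready = Completer<void>();
  void scan(String line) {
    if (!ready.isCompleted && line.contains(readyPattern)) {
      ready.complete();
    }
  }

  // PostgreSQL logs to stderr, but watch both streams to be safe.
  process.stdout
      .transform(utf8.decoder)
      .transform(const LineSplitter())
      .listen(scan);
  process.stderr
      .transform(utf8.decoder)
      .transform(const LineSplitter())
      .listen(scan);

  await ready.future.timeout(const Duration(seconds: 30), onTimeout: () {
    process.kill();
    throw TimeoutException('PostgreSQL did not report readiness in time');
  });
  return process;
}
```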

Aqueduct was built to be tested with a locally running instance of PostgreSQL. This avoids the class of errors that occur when using a different database engine in tests vs. deployment. It is a very important feature of Aqueduct.
The tl;dr is that you can use a local instance of PostgreSQL with the same efficiency as an in-memory database and there is documentation on the one-time setup process.
The Details
Aqueduct creates an intermediate representation of your data model at startup by reflecting on your application code. This representation drives database migrations, serialization, runtime reflection, and can even be exported as JSON to create data modeling tools on top of Aqueduct.
At the beginning of each test suite, your test harness uses this representation to generate temporary tables in a local database named dart_test. Temporary tables are destroyed as soon as the database connection is lost, which you can configure to happen between tests, groups of tests, or entire test suites depending on your needs. It turns out that this is very fast - on the order of milliseconds.
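To see that behavior outside of Aqueduct, here is a small sketch using the postgres Dart package directly (the package choice, connection details, and credentials are assumptions): a TEMP table created on one connection is gone once that connection closes.

```dart
import 'package:postgres/postgres.dart';

Future<void> main() async {
  // Connection details are placeholders; match them to your local setup.
  var conn = PostgreSQLConnection('localhost', 5432, 'dart_test',
      username: 'dart', password: 'dart');
  await conn.open();

  // Temporary tables live only as long as this connection.
  await conn.execute('CREATE TEMP TABLE scratch (id INT, name TEXT)');
  await conn.execute("INSERT INTO scratch VALUES (1, 'a')");
  print(await conn.query('SELECT * FROM scratch'));
  await conn.close();

  // A fresh connection no longer sees the table.
  conn = PostgreSQLConnection('localhost', 5432, 'dart_test',
      username: 'dart', password: 'dart');
  await conn.open();
  try {
    await conn.query('SELECT * FROM scratch');
  } catch (e) {
    print('table is gone: $e'); // relation "scratch" does not exist
  }
  await conn.close();
}
```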
CI platforms like Travis CI and AppVeyor both support local PostgreSQL processes. See this script and this Travis config for an example.

Related

Migrating EF Code First in multiple instances

I have an MVC app that uses EF6 Code First. I want to deploy this app to multiple datacenters. On deployments that have migrations, I can write a script to migrate them all as simultaneously as possible, but if one datacenter is slower, then the calls could all be rejected since the schema no longer matches. A script that tried to coordinate would also make rolling upgrades impossible.
Is there a way to make EF at least attempt to run the query even though the schemas don't match? Is there a different way I can/should approach this?
UPDATE:
Let's see if I can word this better. I want to have my MVC app in multiple datacenters. Let's assume that I deploy the app to each datacenter individually.
Option 1
Deploy to DC A
Code first migration runs on centralized DB
Requests made to DC A succeed, but requests to DC B fail
Option 2
Deploy to DC A
Do not automatically run migration
Requests made to DC A fail and requests to DC B continue to succeed
How do I develop a deployment strategy that will make it so that requests to either DC will work?
BTW: I am using Azure Web Sites, if a platform-specific solution is needed.
In your post, it seemed like you were concerned with how the app would behave during the actual upgrade, not with testing. But in the comments you are asking about doing a partial deployment and then testing. So on one hand you'd want to deploy as quickly as possible to minimize downtime. On the other hand, it sounds like you want to deploy to one site, test, and have the other sites continue to function while you are verifying the first deployment?
Verifying a deployment is reasonable, but fairly complex, and I'm not sure you will find much in the way of automation for this. I think you should test thoroughly prior to production deployment, and then simply deploy as quickly as possible in production. If there were an issue you found only when deploying to production, you'd be in a bad situation, because now your site is down until you can fix it. Even if you could get the other instance to work with the new database, that is risky, as it would be modifying things against a schema it doesn't completely understand. Additionally, if you do need to roll back the DDL then you will almost certainly lose any data that was modified since the deployment. So it is really best that all instances on the old schema fail until they are upgraded, to prevent them from modifying data that is at risk of being lost.
Usually you should have done a deployment to a staging environment that is as close to your production as possible to test the database migration process. This is called pre-production testing, and sometimes involves restoring the most recent backup from production into staging to ensure new constraints/structures are valid for existing data. By deploying to this staging environment, you should have a very high level of confidence that production deployment will go successfully.
You additionally safeguard yourself against production deployment issues by taking backups prior to deployment so that you can roll back as necessary (although this is the worst-case scenario, as it might mean throwing out important data that came in between the backup/deployment and the realization that there is an issue). I imagine EF migrations use a transaction to run the DDL scripts, so they should roll back all-or-nothing if there is an issue.

Rollback informix database

I have some code that uses an Informix 11.5 database that I want to run some tests against.
If the tests fail they will often leave the database in an inconsistent state that needs to be manually resolved before the tests can be run again.
I would like to automate this so that the tests do not require manual intervention before running the tests again.
My current solution is to write some code that does the cleanup, but this means the code must be maintained whenever potential new inconsistent states can occur in new features.
The code runs a lot of stored procedures, which themselves often use transactions. As Informix does not support nested transactions I can't just wrap up all the work in one big transaction.
Is there another way to create a checkpoint which I can restore the database back to?
You could create a virtual machine with an undo disk, and after you run the tests you can close the virtual machine without saving the changes. It's as if you never ran the tests!
If this is a development only server, how about taking a Level 0 ontape system archive before the test? I think this can be done via the sysadmin functions too (not sure though), so it can be automated. After the tests you just restore the archive.
Changing database state - and resetting it back to a known state - is one of the reasons that the Unit Test community spends time and effort avoiding testing against databases. It is a tough problem.
Informix 11.50 does support savepoints; however, it does not support one BEGIN WORK after another without an intervening COMMIT or ROLLBACK.
To the extent possible, have the tests create and load a set of tables with the known data. One way of achieving that is to create a whole new database for the test. However, this is only borderline feasible if you need to test with high volumes of data.
I don't think this issue is in any way unique to Informix - it is a general problem with testing DBMS operations.

How to prepare for data loss in a production website?

I am building an app that is fast moving into production and I am concerned about the possibility that due to hacking, some silly personal error (like running rake db:schema:load or rake db:rollback) or other circumstance we may suffer data loss in one database table or even across the system.
While I don't find it likely that the above will happen, I would be remiss in not being prepared in case it ever does.
I am using Heroku's PG Backups (which is to be replaced with something else this month), and I also run automated daily backups to S3: http://trevorturk.com/2010/04/14/automated-heroku-backups/, successfully generating .dump files.
What is the correct way to deal with data loss on a production app?
How would I restore the .dump file in case I need to? Can I do a selective restore if a small part of the system is hit?
In case a selective restore is not possible: assume one table loses data 4 hours after the last backup. Result => would fixing the lost table require rolling back 4 hours of users' activity? Any good solution to this?
What is the best way to support users through the inconvenience if something like this happens?
A full DR (disaster recovery) solution requires the following:
Multisite. If a fire, flood, Osama Bin Laden or what have you strikes the Amazon (or is it Salesforce?) data center that Heroku uses, you want to be sure that your data is safe elsewhere.
Ongoing replication of the data to a separate site (or sites). That means that every transaction that's written to your database on one site is replicated within seconds to the mirror database on the other site. Most RDBMSs have mechanisms to let you do master-slave replication like that.
The same goes for anything you put on a filesystem outside of the database, such as images, XML configuration files etc. S3 is a good solution here - they replicate everything to multiple data centers for you.
It won't hurt to create periodic (daily or so) dumps of the database and store them separately (e.g. on S3). This helps you recover from data corruption that propagates to the slave DBs.
Automate the process of data recovery. You want this to just work when you need it.
Test everything. Ideally, you want to automate the test process and run it periodically to ensure that your backups can restore. Netflix Chaos Monkey is an extreme example of this.
I'm not sure how you'd implement all this on Heroku. A complete solution is still priced out of reach for most companies - we're running this across our own data centers (one in the US, one in EU) and it costs many millions. Work according to the 80-20 rule - on-going backup to a separate site, plus a well tested recovery plan (continuously test your ability to recover from backups) covers 80% of what you need.
As for supporting users, the best solution is simply to communicate timely and truthfully when trouble happens and make sure you don't lose any data. If your users are paying for your service (i.e. you're not ad-supported), then you should probably have an SLA in place.
About backups, you cannot be 100 percent sure every time that no data will be lost. The best approach is to test restores on another server. You must have at least two types of backup:
A database backup, like pg_dump. A dump is just SQL commands, so you can use it to recreate the whole database, just a table, or just a few rows. You lose the data added in the meantime.
A code backup, for example a git repository.
In addition to Hartator's answer:
use replication if your DB offers it, e.g. at least master/slave replication with one slave
do database backups on a slave DB server and store them externally (e.g. scp or rsync them out of your server) - a rough sketch of this step follows this list
use a good version control system for your source code, e.g. Git
use a solid deploy mechanism, such as Capistrano and write your custom tasks, so nobody needs to do DB migrations by hand
have somebody you trust check your firewall setup and the security of your system in general
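A minimal sketch of that dump-and-ship step, written in Dart to match the rest of this page; the database name, backup host, and paths are invented for illustration, and it assumes pg_dump and scp are on the PATH.

```dart
import 'dart:io';

/// Dumps the database (ideally on a slave) and copies the dump off the
/// server. All names here (database, host, paths) are placeholders.
Future<void> backupAndShip() async {
  final stamp = DateTime.now().toIso8601String().replaceAll(':', '-');
  final dumpFile = '/var/backups/myapp_$stamp.dump';

  // Custom-format dump (-Fc) so pg_restore can do selective restores later.
  final dump = await Process.run(
      'pg_dump', ['-Fc', '-f', dumpFile, 'myapp_production']);
  if (dump.exitCode != 0) {
    throw StateError('pg_dump failed: ${dump.stderr}');
  }

  // Ship the dump to an external machine.
  final copy = await Process.run(
      'scp', [dumpFile, 'backup@offsite.example.com:/backups/']);
  if (copy.exitCode != 0) {
    throw StateError('scp failed: ${copy.stderr}');
  }
}
```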
The DB dumps contain SQL commands to recreate all tables and all data... if you were to restore only one table, you could extract that portion from a copy of the dump file, (very carefully) edit it, and then restore with the modified dump file (for one table).
Always restore first to an independent machine and check that the data looks right. E.g. you could use one slave server, take it offline, then restore there locally and check the data. It's good if you have two slaves in your system: the remaining system then still has one master and one slave while you restore to the second slave.
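If the dump is in pg_dump's custom format rather than plain SQL (Heroku's backups typically are), pg_restore can pull out a single table without any hand-editing. A hedged sketch, again wrapped in Dart, with the database, table, and file names invented:

```dart
import 'dart:io';

/// Restores just one table from a custom-format dump into a scratch
/// database for inspection. Names are placeholders; pg_restore must be
/// on the PATH and the target database must already exist.
Future<void> restoreOneTable() async {
  final result = await Process.run('pg_restore', [
    '--dbname=myapp_scratch', // restore into a scratch DB first, not production
    '--table=orders',         // only this table
    '--no-owner',
    'latest.dump',
  ]);
  stdout.write(result.stdout);
  stderr.write(result.stderr);
  if (result.exitCode != 0) {
    throw StateError('pg_restore exited with code ${result.exitCode}');
  }
}
```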
To simulate a fairly simple "total disaster recovery" on Heroku, create another Heroku project and replicate your production application completely (except use a different custom domain name).
You can add multiple remote git targets to a single git repository so you can use your current production code base. You can push your database backups to the replicated project, and then you should be good to go.
The only step missing from this exercise versus a real disaster recovery is assigning your production domain to the replicated Heroku project.
If you can afford to run two copies of your application in parallel, you could automate this exercise and have it replicate itself on a regular basis (e.g. hourly, daily) based on your data loss tolerance.

how to "test" an externally managed db in rails

Background: We have a dependency on an externally managed database. This is a company-wide resource. We have a read-only account into it and have no control over or input into the schema or contents.
Issue: We're using ActiveRecord as our ORM into said resource; we manage the connection information separately from our central db connection information. It worked out fine. We have some characterization tests that verify that our ActiveRecords retrieve the data for a few known data points. However, we have no test/dev environment replacement strategy for this database. Right now all of our environments are configured to use the production database connection:
That sucks
We don't want the production password on the build server, so our build is broken
The queries to the production database server are slow and because caching is off in test/dev our homepage loads REALLY slow locally
So we need something else in test/dev mode.
Q) Why not just have another sqlite database locally that mimics the schema of the production database?
A) Because we've tried that for another connection and it's lousy for at least a couple reasons.
It's fairly complex managing the separate schema (sqlite db file) in the rake process just for testing/dev.
Testing ActiveRecords outside of a schema that's managed by some process that ensures schema consistency between environments is largely meaningless.
The database configuration doesn't feel like the right seam. The database connection aspect of this, and thus the AR, is not part of what we're developing; it's just a connection library in this case. As long as we can ensure our test/dev replacement for it acts the same as the production AR, then it doesn't matter if we use AR for this in test/dev. I hope that made sense; it's an important point.
Q) You could use SchemaDumper to grab the schema of the production database and use it to generate the test database. That way all the SQLy details would be automated and it would look more like typical rails stuff.
A) Yeah, that would be pretty hot, but SchemaDumper doesn't seem to play nicely with the production database connection. It just hangs after a while and we don't get the whole schema. Bummer. That also doesn't avoid having to manage that whole other database file and work it into our rake tasks.
What I really want to do is to have production use the ARs that are tested in the characterization tests, and then have a PORO that reads stuff out of a YAML file (like a fixture) and replaces the AR in the test/development/build environments.
Q) But Najati, isn't putting that stuff in a yaml file the same as defining the schema?
A) Well, yeah, sorta. It's just a lot more direct and easier to manage if it's in some PORO that loads some crap out of a YAML file than if I also have to work some half-baked schema management into our build tasks; we do this currently and it's pretty lousy and, frankly, doesn't seem to be buying us much. Also, the test schema and the test data fixture duplicate the information: "this is what we want the test version of this data to look like" - why do we need both? I claim "So that you can use the same AR in both environments." is not a sufficient argument to justify the complexity of managing the extra sqlite db file.
Q) I feel like there's something you're not telling me.
A) I've been cheating on my Weight Watchers. Also,
In the past when I've had something like this my solution looked like this:
Characterization tests that capture the important aspects of the external service's behavior, run not with the unit test suite, but as a separate process on the build server, maybe once every 4 hours or every night or whatever.
A fake implementation that used the same set of tests to exercise its behavior, to ensure that it was providing similar functionality to the test/dev environment.
Spring (and probably dependency injection containers in general) makes this easy. You just swap out beans in your environment-specific bean config and the test env just goes on its merry way.
Given my understanding/knowledge, Rails doesn't seem to lend itself to this very well. I suppose I could redefine the class in my test/dev environment scripts, but that seems really shady. For one thing, I don't know if that would keep the model from being loaded at application start-up, and for another, it would add yet another strange wrinkle to our Rails project, another bit of magic that would make the project harder to come up to speed on. I want something that feels like the "service replacement" strategy used in Spring that doesn't require hard-to-find/understand RoR magic.
Uhh. I'll stop there and see if that much prompts anything. Thanks for taking the time to read!
You don't actually test the database. You're testing your models and any other original code of yours that touches the database. If there is magic in the prod database, take it out and put it in fixtures or factories. The fixtures and factories load the test data into a test instance, for example db_test. When the test has passed or failed, the database is rolled back with transactions, so your tests can (and should) run atomically.
If you are trying to build an app that tests a database, that's a different story. For everyone else, use the testing design that Rails provides: fixtures or factories and a "test" Rails database defined in config/database.yml. The YAML file takes the place of the dependency injection functionality; it's just a hash of variables, so you don't need any POJO/Spring tricks to swap out environments. :) When Rails runs your tests with fixtures or factories, it will load only the test environment as defined in database.yml.
This also integrates nicely with rspec, guard and other tools. When I save one of my model files, the tests create some data in my test db, run, and clean up the database - all just from hitting the save button on my source file.
Integration tests should still use this same mechanism. The only thing that makes this process annoying is legacy databases and I've worked some magic there with metaprogramming to minimize the hassle.
Take a look at factory_girl for factories, and RailsCasts episode #275: http://railscasts.com/episodes/275-how-i-test
I think you have only two options: either you duplicate the database in some way, or you separate the ORM into a thin (as thin as possible) layer and mock it out in your tests.
Besides, you may have an AR schema ready in your db/schema.rb.

Fixtures and Selenium and Rails (oh my?)

What data do you use with Selenium tests on Rails apps? Do you load from fixtures? Use an existing dev db? Use a separate (non-fixture) db?
I'm considering my options here. I have a Rails app with a large Selenium test suite that runs on a modified version of Selenium Grid. Part of the process, right now, is loading a large set of fixtures, once, before the test suite runs. It's a LOT of data. Most of it is reporting info exported from our production db. When I set it up originally, I exported the data to yaml from Oracle.
Now there's been a schema change in some of the reporting tables, so of course I have to regenerate the fixture data. There is so much of it that it's not worthwhile to edit the files by hand. But it seems inefficient to have to regenerate for every little schema change - not to mention that it's yet another step to remember to do. Is there a better way?
EDIT: I originally intended to load the fixtures before each test and unload them after each test, like regular Rails tests. But it takes about 15 minutes to load the fixtures due to this reporting data. There are 200+ tests, and the suite runs every 12 hours. I cannae bend spacetime captain!
EDIT 2: I also agree that having this big set of fixtures is a bad smell. I'm not sure how to pare it down, though, because the reports aggregate a lot of data and much of the value of the selenium tests is that they test the reports.
Even if it's a small set of data, though...it's still another set to keep co-ordinated with schema changes. (We have a separate, smaller set for unit, functional, and [Rails] integration tests.)
Which brings me back to my original question - are there other options besides doing it by hand, or remembering to regenerate them each time?
If you can, the best possible thing to do is have a system in which each Selenium test gets its own data state (i.e. DB tables dropped and recreated, bootstrap data re-inserted, and caches cleared). This is easier said than done and usually is only possible if the project planned for it from the start.
The next best thing is to have a consistent DB state for each test suite/run. This is not as nice since there is now a strong chance that some tests will depend on the success of previously run tests, making it more difficult to identify true failures vs. false negatives.
The worst case, IMO, is to use a static DB in which each test run mutates the data. This almost always leads to problems and is usually a "project smell". The key to doing it the "right way" (again, IMO) is to be vigilant about any state/schema change and capture it as part of the automated test/build process.
Rails does a good job with this already with Migrations, so take advantage of them! Without knowing your situation, I'd generally question the need to run Selenium tests against a snapshot of the full DB. Most DBs can (or should) be distilled down to less than 1MB for automated testing, making automated schema migrations and data resets much more efficient.
The only time I've seen a "valid" reason for massive DBs for Selenium tests is when the DB itself contains large chunks of "logic data" in which the data affects the application flow (think: data-driven UI).
I think you're asking two questions here that are intertwined so if I'm to break it down:
You want to get test data into and out of your DB quickly and fixtures aren't doing it for you.
You've been burnt by a schema change and you want to make sure that whatever you do doesn't require eight iterations themed "fiddling with the test data...still" :)
You've got a couple of alternatives here, which I've hashed out below. Because you've mentioned Oracle I'm using Oracle technologies here, but the same thing is true for other DB platforms (e.g. PostgreSQL):
Rake tasks that call PL/SQL scripts to generate the data - a nasty, horrible, evil idea; don't do it unless there's no other option. I did it on one project that needed to load in billions of rows for some infrastructure architecture tests. I still sulk about it.
Get your DB into a dump format. For speedy binary dumps check out the exp/imp and Data Pump utilities. This will allow you quick setup and teardown of your DB. Certainly, on a Rails project I worked on, we used rake tasks to exp/imp a database with around 300k records in under a minute. Also check SQL*Loader, which is the logical dump utility; as it's logical it's slower and requires you to have control scripts to help SQL*Loader understand the dumps. However, the benefit of the logical dump is that you can run transformation scripts over it to massage the data into the latest format. Sadly, though, just like fixtures, all these tools are pretty sensitive to changes in the schema.
Use a plugin such as Machinist or Factory Girl to make the generation of the data nicer. You still incur the penalty of using ActiveRecord to set up the DB, but these fake object generators will help you stay close to your migrations and are a lot less hassle to maintain than fixtures.
Combine approaches 2 and 3. What happens here is that you make some test data with, say, Machinist. You export that test data to a dump and then reload the dump during each test run. When the schema changes, update the Machinist config and re-export.
Hope that helps.
I'm currently on a project with an enormous Selenium test suite--actually, the one Selenium Grid was written for--and our tests use a small amount of reference data (though we don't use Rails YAML fixtures) and object factories for one-off data needed for particular tests.
Alternatively, on many of the ThoughtWorks Rails projects I've been on we've written checkin scripts that incorporate a number of pre-commit hooks--for example, running the tests before allowing a commit. One thing you might consider trying is writing (or customizing) a similar checkin script that will check for schema changes and reload the reference data as needed.
See e.g. Paul Gross's rake commit tasks on Github.
