Database Testing in Rails

I'm using Rails 4 and the testing framework with which it ships.
I'm using a relational database that needs to be rigorously tested for internal consistency. In this case, it's a sports statistics database with player stats that are updated nightly.
I need to run tests like the following when new data arrives each night:
--That the sum of player stats equals the team's.
--That the sum of team stats equals the league's.
--That the sum of wins and losses equals games played.
--etc.
For now I'm copying my development database over to testing and running these alongside my unit tests in the /test/models/ directory.
This is an awkward setup: my database-testing code doesn't consist of unit tests in the proper sense, and it doesn't rely on fixtures, which is what the Rails documentation suggests this directory be used for.
My question is: In Rails, what is the best practice for database testing like that which I describe above?

This isn't testing in the classical sense, so it doesn't make sense to lump it in with your unit tests. There are at least two good options:
--Compute aggregates like team and league stats on the fly. It doesn't sound like that's your current setup, though.
--Check for consistency at the point where new data is added. If a record or value would break internal consistency, reject it instead of inserting it.
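A minimal sketch of the second option, using an ActiveRecord validation so an inconsistent row is rejected before it is persisted (the TeamStat model and its columns are assumptions for illustration, not your actual schema):

    # app/models/team_stat.rb
    class TeamStat < ActiveRecord::Base
      has_many :player_stats

      validate :wins_and_losses_sum_to_games_played
      validate :player_points_sum_to_team_points

      private

      # --That the sum of wins and losses equals games played.
      def wins_and_losses_sum_to_games_played
        return if wins.to_i + losses.to_i == games_played.to_i
        errors.add(:base, "wins + losses must equal games played")
      end

      # --That the sum of player stats equals the team's.
      def player_points_sum_to_team_points
        return if player_stats.sum(:points) == points
        errors.add(:base, "sum of player points must equal team points")
      end
    end

With validations like these, the nightly import can call save! (or valid?) on each record and refuse to persist anything that breaks the totals.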

Related

rails integration testing why fixtures rather than db

I'm new to testing in Rails, and I don't understand why or when I should use fixtures (or factories) rather than just seeding my test DB and querying it to run the tests.
In many cases, it should be faster and easier to have the same data in the dev and test environments.
For example, if I want to test an index page, should I create 100 records via a factory or should I seed the db with 100 records?
If someone could clarify this, it would be great.
Thanks!
This is actually a deeper question of how to test efficiently, and you will find a lot of different opinions.
The reason to avoid a database in your unit tests is merely speed. Database operations are slow. It might not seem slow with one test, but if you have continuous integration going (as you should), or when you make a quick change and just want to see what happens, those delays add up. So prefer mocks when you want to truly unit test code.
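A minimal sketch of that idea with Minitest (which ships with Rails); StatsSummary and its repository interface are invented for illustration:

    require "minitest/autorun"
    require "minitest/mock"

    # Hypothetical service that would normally query the database through a repository.
    class StatsSummary
      def initialize(player_repo)
        @player_repo = player_repo
      end

      def team_points(team)
        @player_repo.points_for_team(team).inject(0, :+)
      end
    end

    class StatsSummaryTest < Minitest::Test
      def test_sums_player_points_without_touching_the_database
        player_repo = Minitest::Mock.new
        # Stub the query the service would normally run against the DB.
        player_repo.expect(:points_for_team, [10, 20, 30], [:celtics])

        summary = StatsSummary.new(player_repo)
        assert_equal 60, summary.team_points(:celtics)

        player_repo.verify
      end
    end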
Then your own integration tests should hit an in-memory database rather than your real database--for the same reason, speed. These will be slower than your mocked tests, but still faster than hitting the real database. When you're developing, the build-test-deploy cycle needs to be as fast as possible. Note that some people call these unit tests as well. I wouldn't, but I guess it is just semantics.
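If you go the in-memory route in Rails, one possible sketch is to point the test environment at SQLite's in-memory database (this is an assumption about your setup, and the schema has to be reloaded into memory for each run):

    # config/database.yml
    test:
      adapter: sqlite3
      database: ":memory:"

This only works if your code avoids vendor-specific SQL, which is exactly the kind of difference the later database-backed tests are there to catch.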
These first two kinds of tests are by developers for developers.
Then the testers will hit the real database, which will be populated with test data defined by the testers and subject-matter experts. There are lots of clever ways to speed this up as well, but this will be the place where they test the integration of your code with the production-like database. If all your in-memory database tests passed and something goes wrong here, then you know it has to do with something like database configuration, vendor-specific SQL, etc. rather than something fundamentally bad. You will also get your first taste of what the performance is like.
Note that everything I've said here is a matter of debate. But hopefully it clarifies what you should consider about when to do certain things and why.

Input and test data for a SpecFlow scenario

I have started recently using SpecFlow and I have 2 basic questions I need to clarify, also to confirm I am on the right way:
As I understand it, all the input data (test parameters for the scenarios) must be provided by the tester, and the same goes for the test data (input data for the tables involved in the test scenarios).
Are there any existing tools for quickly generating test data and inserting it into the DB? I am using Entity Framework as part of the data access layer. I was thinking of a tool that reads the data from a file, or perhaps a desktop application for entering values for the tables' fields (which could then generate a file from which some other tool could read the data and create all the required objects, etc.).
I also had a look at Preparing data for a SpecFlow scenario - I was wondering whether there is already a framework that handles inserting/deleting test data to use alongside SpecFlow.
I don't think you are on the right track. SpecFlow is a BDD tool, but in some ways it only covers part of the process. Have a read of http://lizkeogh.com/2013/07/01/behavior-driven-development-shallow-and-deep/ and see if any of the scenarios sound familiar.
To move forwards I would recommend you start with http://dannorth.net/introducing-bdd/ to get a good idea of how it all began. Now let's consider your points:
The tester provides all the test data. Well, yes and no. The idea is that between yourself and the feature expert, you are able to have a conversation that provides all the examples you need to develop your feature. If you don't involve yourself in that conversation, then yes, all the data will come from the other side, but chances are it won't be as high quality as it would be if you asked the right questions and guided the conversation so the data follows a structure you can write tests against.
As an example here, when I first started with BDD I thought I could get the business experts to write the plain-text scenario files with less input from the developers, but in practice the documents tended to be less useful than when we were involved. Not because they couldn't write decent specifications, but because they couldn't refactor them to reuse bindings etc. Our skills were still needed in the process.
Why does data go into a database? A good test is isolated to the scope that it is testing. For a UI layer test this means that we don't have a database. For a business tier test we shouldn't be reliant on the database to get data either.
In practice a database is one of the most difficult things to include in your testing because once any part of the data changes you cause cascading test failures.
Instead I would recommend making your features smaller and providing the data for your test in the scenario or binding. This also makes having your conversation easier, because the fiftieth row of a test pack is not something either party is going to remember. ;-) I recommend trying to give your data identities, so "Bob" might be an individual in a test you can discuss, and both sides understand what makes him an interesting example.
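A sketch of what that can look like in a SpecFlow feature file, with the data carried in the scenario itself rather than in a database (the feature, the table columns, and "Bob" are all made up for illustration):

    Scenario: An overdue customer is flagged
      Given the following customers
        | Name  | Last payment |
        | Bob   | 90 days ago  |
        | Alice | 5 days ago   |
      When the overdue report is generated
      Then Bob appears on the report
      But Alice does not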
good luck :-)
Update: With regard to using a database during testing, my experience is that there are a lot of complexities that make it a difficult choice to work with. Consider these points:
How will you reset the state of your data between tests?
How will you reset the state if one / some tests fail?
If you are using branches or even just if two developers are making changes at the same time, how will you support multiple test datasets?
How will you handle two instances of the tests running at the same time (don't forget the build server)?
Have a look at this question SpecFlow Integration Testing with Database Patterns which includes some patterns that you can use.

Data storage solution for a Build Server

I'm drafting a proof-of-concept build server, and what got me thinking is how to store all the data it produces. For example, I'd like to store:
Unit test results: which tests were run, how much time each test took, results, stacktraces, number of assertions
Code coverage information, with line-level granularity
Various LoC metrics - per file, per file type
Code duplicates information
Additionally, these are the kinds of queries I'd like to run:
How has tests' execution time changed over time?
How has overall code coverage percentage changed over time? And what about this particular method? How has uncovered line count changed over time?
How has the LoC count for *.cs files evolved? How has the total LoC count changed?
Stuffing all this into an RDBMS doesn't sound like a particularly good idea. What storage technology fits my bill best here?
If you don't want to use an RDBMS you could definitely go with MongoDB for your requirements.
It lets you group similar documents in a collection, and the documents in a collection do not all have to share the same schema. One document can have 5 fields, another can have 10.
It can be fairly easily scaled to provide redundancy.
MongoDB also provides what they call the "aggregation framework" that allows you to generate stats/aggregations over your data. It's faster than their map/reduce solution - which can be a little slow of course.
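A sketch of what such a query might look like from the Ruby "mongo" driver (the collection and field names here are invented; adapt them to whatever documents your build server writes):

    require "mongo"

    client = Mongo::Client.new(["127.0.0.1:27017"], database: "build_metrics")

    # Average test execution time per suite per day -- the kind of
    # "how has it changed over time?" question listed above.
    pipeline = [
      { "$group" => {
          "_id"      => { "suite" => "$suite", "day" => "$run_date" },
          "avg_time" => { "$avg"  => "$duration_ms" }
        } },
      { "$sort" => { "_id.day" => 1 } }
    ]

    client[:test_runs].aggregate(pipeline).each { |doc| p doc }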
Of all the document databases out there right now, I would say it is clearly the most mature and definitely has the richest query language.

Testing Rails application with plenty of seed data

I'm maintaining a Rails 3.1 app. The app's DB has more than 50 tables, and maybe 30 of those need seed data for the app to function correctly.
The app has plenty of statistical data (as seed data), and some tables contain more than 150,000 records. I have been testing using fixtures (actually using rake tasks to create fixture files from the dev DB). Because of the huge fixture files, testing has become slower and slower. We're talking about 20+ minutes to run the whole test suite.
At the time I started writing tests, fixtures were the way to go. Currently I'm not so sure anymore. I keep reading about tools like factory_girl, capybara, rspec and spork. I've done a few tests with those and they seem nice and fun to use.
Basically I'd like to know how would you test this kind of setup?
Fixtures are way too slow. Thanks for the help!
Well, with an application as huge as yours, the test suite is bound to run for a long time. I think the greatest improvement here would be using less test data in the database. You can test associations or whatever it is you're doing that is DB-related, but when you're testing model functionality, for example, set up mock expectations on the #save method and verify that your code changed the #attributes of the model. I think that testing everything against the database is redundant. You don't have to include the Rails stack as your testing target (which you do when you save to the database), as it's very thoroughly tested already.
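A minimal RSpec sketch of that idea (Player, its active column, and #deactivate! are hypothetical and only stand in for your own models):

    # spec/models/player_spec.rb
    require "spec_helper"

    describe Player do
      it "marks a retired player inactive without a database round trip" do
        player = Player.new(active: true)

        # Expect save! to be called, but stub it so nothing is written to the DB.
        player.should_receive(:save!).and_return(true)

        player.deactivate!

        player.active.should eq(false)
      end
    end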

What's the best practice for handling mostly static data I want to use in all of my environments with Rails?

Let's say, for example, I'm managing a Rails application that has static content that's relevant in all of my environments but that I still want to be able to modify if needed. Examples: states, questions for a quiz, wine varietals, etc. There are relations between the user content and this static data, and I want to be able to modify it live if need be, so it has to be stored in the database.
I've always managed that with migrations, in order to keep my team and all of my environments in sync.
I've had people tell me dogmatically that migrations should only be for structural changes to the database. I see the point.
My counterargument is that this mostly "static" data is essential for the app to function and if I don't keep it up to date automatically (everyone's already trained to run migrations), someone's going to have failures and search around for what the problem is, before they figure out that a new mandatory field has been added to a table and that they need to import something. So I just do it in the migration. This also makes deployments much simpler and safer.
The way I've concretely been doing it is to keep my test fixture files up to date with the good data (which has the side effect of letting me write more realistic tests) and to re-import them whenever necessary. I do it with connection.execute "some SQL" rather than with the models, because I've found that Model.reset_column_information plus a bunch of Model.create calls sometimes worked if everyone updated immediately, but would eventually explode in my face when I pushed to prod, say, a few weeks later, because by then I'd have newer validations on the model that conflicted with the two-week-old migration.
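A sketch of that kind of migration, seeding reference data with raw SQL so later model validations can't break an old migration (the table, the values, and NOW() are assumptions; NOW() would need to change for SQLite):

    class SeedWineVarietals < ActiveRecord::Migration
      def up
        execute <<-SQL
          INSERT INTO wine_varietals (name, created_at, updated_at)
          VALUES ('Merlot', NOW(), NOW()), ('Syrah', NOW(), NOW());
        SQL
      end

      def down
        execute "DELETE FROM wine_varietals WHERE name IN ('Merlot', 'Syrah');"
      end
    end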
Anyway, I think this YAML + SQL process explodes a little less, but I also find it pretty kludgey. I was wondering how people manage that kind of data. Are there other tricks available right in Rails? Are there gems to help manage static data?
In an app I work with, we use a concept we call "DictionaryTerms" that works as lookup values. Every term has a category that it belongs to. In our case, they're demographic terms, including terms having to do with gender, race, and location (e.g. state), among others.
You can then use the typical CRUD actions to add/remove/edit dictionary terms. If you need to migrate terms between environments, you could write a rake task to export/import the data from one database to another via a CSV file.
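A sketch of such a rake task pair (DictionaryTerm and its category/name columns are assumptions based on the description above):

    # lib/tasks/dictionary_terms.rake
    require "csv"

    namespace :dictionary_terms do
      desc "Export dictionary terms to a CSV file"
      task export: :environment do
        CSV.open("dictionary_terms.csv", "w") do |csv|
          csv << %w[category name]
          DictionaryTerm.find_each { |term| csv << [term.category, term.name] }
        end
      end

      desc "Import dictionary terms from a CSV file"
      task import: :environment do
        CSV.foreach("dictionary_terms.csv", headers: true) do |row|
          DictionaryTerm.where(category: row["category"], name: row["name"]).first_or_create!
        end
      end
    end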
If you don't want to have to import/export, then you might want to host that data separate from the app itself, accessible via something like a JSON request, and have your app pull the terms from that request. That seems like a lot of extra work if your case is a simple one.
