Rerunning a spec x times if it fails - ruby-on-rails

We have some feature specs which fail randomly. We don't have much time to fix them and we don't really know how to yet. Because of that, we have to rerun builds on CircleCI until they are green. Is it possible to run a spec and, if it fails, rerun it a few times until it's green?

Try having a look at the following gems:
https://github.com/dblock/rspec-rerun
https://github.com/y310/rspec-retry
(taken from discussion in https://github.com/rspec/rspec-core/issues/456)
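If you do go the retry route, a minimal rspec-retry setup looks roughly like this (a sketch; the retry count and the scoping to feature specs are just examples, not something mandated by the gem):

# spec_helper.rb (or rails_helper.rb)
require 'rspec/retry'

RSpec.configure do |config|
  config.verbose_retry = true                  # print retry status
  config.display_try_failure_messages = true   # show the failure that triggered a retry

  # retry only the flaky feature specs, up to 3 tries in total
  config.around :each, type: :feature do |example|
    example.run_with_retry retry: 3
  end
end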
Personally I think having flickering tests is worse than having no tests in the first place, because they add hassle and destroy the trust in tests in general, which you need for swift refactoring.
Best would be to:
delete them, since they don't provide the value they should
take your time to rewrite them
To get the time to do so, try to convince management that the investment in fixing these issues saves a lot of developer time in the long run (best with a quick example calculation: x failures a day result in yyy extra minutes of devs waiting for the build to be green) ;)

Related

Merging results from different Cucumber HTML Reports

When running our test suite we perform a re-run which gives us 2 HTML reports at the end. What I am looking to do is have one final report so that I can then share it with stakeholders etc.
Can I merge the 2 reports so that, if a test failed in the first run but passed in the second, the report shows the test as passed?
I basically want to merge the reports to show a final outcome of the test run. Thanks
By only showing the report that passed you'd be throwing away a valuable piece of information: that there is an issue with the test suite making it flaky during execution. It can be something to do with the architecture or design of a particular test, or maybe the wait/sleep periods for some elements. Or, in some cases, the application we're testing has some sort of issue that a lot of times goes unchecked.
You should treat a failing report with as much respect as a passing one. I'd share both reports with the stakeholders, along with a short analysis of why the tests are failing in the first one(s), or why they usually fail, and a proposal/strategy to fix the failure.
Regarding the merging of the reports, it can be done. You could, via a script that takes both reports, extract the body of each and, element by element, copy the passing one if the other is failing, or copy a failing one if both are failing. But that looks like an effort to hide a possible problem rather than to fix it from the ground up.
Edit:
There is at least one lib that can help you achieve this: ReportBuilder, or the Java equivalent, ReportBuilderJava.
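For example, with the Ruby ReportBuilder gem the merge is driven off Cucumber's JSON output (run each pass with --format json --out <file>.json) rather than the HTML files themselves. A rough sketch, with option names as I recall them from the gem's README (double-check against the current docs):

require 'report_builder'

ReportBuilder.configure do |config|
  config.json_path    = 'reports/json'          # folder holding the JSON from both runs (newer versions call this input_path)
  config.report_path  = 'reports/final_report'  # merged report file, without extension
  config.report_types = [:html]
  config.report_title = 'Merged Test Results'
end

ReportBuilder.build_report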

What is a systematic approach to debug intermittently failing specs?

I have four tests in my Capybara/Rspec suite that keep failing (a real problem for CI deployment).
The worst thing is that these tests fail intermittently, and often only when the entire suite is run, making them difficult to debug.
They are all ajax requests, either submitting a remote form or clicking a remote link, followed by expect(page).to have_content 'My Flash Message'.
These tests even fail intermittently within the same test cycle. For example, I have several models that behave similarly, so I am iterating through them to test.
e.g.,
['Country', 'State', 'City'].each do |object|
  let(:target) { create object.to_sym }

  it 'runs my frustrating test' do
  end
end
Sometimes country fails, sometimes state, sometimes everything passes.
I have tried adding wait: 30 to the expect statement. I have tried adding sleep 30 before the expect statement. I'm still getting intermittent passes.
There is quite a bit of information out there describing finicky ajax tests, but I have not found much about how to debug and fix these kinds of problems.
I'm really grateful for any advice or pointers from others, before I pull all my hair out!!
UPDATE
Thank you for all these excellent responses. It's been useful to see that others have grappled with similar issues, and that I'm not alone.
So, is there a solution?
The suggestions to use debugging tools such as pry, byebug, and Poltergeist's debug feature (thanks #Jay-Ar Polidario, #TomWalpole) have been useful to confirm what I thought I already knew: namely (and as suggested by #BM5K) that the features work consistently in the browser, and the errors lie within the tests.
I experimented with adjusting timeouts and retries (#Jay-Ar Polidario, #BM5K), and while an improvement, these were still not a consistent fix. More importantly, this approach felt like patching holes rather than a proper fix, so I was not entirely comfortable with it.
Ultimately I went with a major rewrite of these tests. This has entailed breaking up multi-step features, and setting up and testing each step individually. While purists may claim this is not truly testing from the user's perspective, there is sufficient overlap between each test that I'm comfortable with the result.
In going through this process, I did notice that all of these errors were related to "clicking on things, or filling forms", as #BoraMa suggested. Though in this case the experience was reversed — we had adopted .trigger('click') syntax because capybara + poltergeist was reporting errors clicking on elements using click_link or find(object).click, and it was these tests that were problematic.
To avoid these problems I've removed JS from the tests as much as possible. i.e., testing the majority of the feature without JS enabled, and then creating very short, targeted JS specs to test specific JS responses, features or user feedback.
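For illustration, the split looks roughly like this (the route helper, labels and messages below are made up):

RSpec.feature 'Creating a country', type: :feature do
  # most of the flow is exercised with the default rack_test driver (no JS)
  scenario 'saves the record' do
    visit new_country_path
    fill_in 'Name', with: 'Narnia'
    click_button 'Create Country'
    expect(page).to have_content 'Country was successfully created'
  end

  # only the JS-dependent feedback gets a short, js-tagged spec
  scenario 'shows the flash message without a full reload', js: true do
    visit new_country_path
    fill_in 'Name', with: 'Narnia'
    click_button 'Create Country'
    expect(page).to have_content 'My Flash Message'
  end
end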
So there was not really one single fix, but a major refactoring that, in all honesty, probably needed to happen and was a valuable exercise. The tests have lost some features by breaking everything up into individual tests, but as a whole this has made them easier to read and maintain.
There are still a couple of tests that are occasionally showing red, and will need some more work. But overall a great improvement.
Thank you all for the great guidance, and reassuring me that interactions in the testing environment could be the root cause.
Let me share our story too :). Recently, we also tried to hunt down and fix the issues with our intermittently failing tests under a similar setup (Poltergeist, JS tests). The tests failed more often when the whole test suite was run than when run individually, but about one third of the time the whole suite succeeded. It was just a couple of tests from the suite, about 10, that randomly failed; the others seemed to run OK all the time.
First we made sure the tests were not failing due to DB truncation issues, leftover records, etc. We took screenshots at the moment of failure to verify that the page looked correct.
After a lot more searching we noticed that all of the remaining failing tests dealt with clicking on things or filling in forms, while jQuery animations and other dynamic operations were frequently used on those pages. This led us to this Poltergeist issue, which helped us greatly in the end. It turns out that Poltergeist, when clicking on a button or dealing with form inputs, tries to mimic a normal user as closely as possible, which can lead to problems when the inputs/links are animated.
A way to recognize that this was indeed an issue for us was that we could successfully find the element on the page but the browser was unable to click on it.
We ended up using a not very clean solution - we have rewritten some capybara helpers for clicking and interacting with forms to use find and trigger internally:
# override capybara methods as they react badly with animations
# (click/action is not registered then and test fails)
# see https://github.com/teampoltergeist/poltergeist/issues/530
def click_button(locator, *options)
  find_button(locator, *options).trigger(:click)
end

def click_link(locator, *options)
  find_link(locator, *options).trigger(:click)
end

def choose(locator, *options)
  find(:radio_button, locator, *options).trigger(:click)
end

def check(locator, *options)
  find(:checkbox, locator, *options).trigger(:click)
end
This approach may lead to some unexpected problems because now you'll be able to click on things in your tests even if they are e.g. overlapped by a modal div or when they are not fully visible on the page. But after reading carefully the comments on the github issue, we decided that this was the way to go for us.
Since then, we have only very occasional test failures which seem to be related to another Poltergeist timeouts issue. But the failures are so rare that we don't feel the urge to look further - the tests are finally reliable enough.
Intermittently failing tests are a pain to troubleshoot, but there are some things you can do to make life easier. First would be to remove any looping or shared examples. Explicitly stating each expectation should make it more clear which example combination is failing (or make it even more obvious that it is indeed random).
Over the course of several runs, track which tests are failing. Are they all in the same context group?
Are you mixing and matching javascript tests and non-javascript tests? If you are, you may be running into database issues (I've seen problems caused by switching database cleaner strategies mid context block).
Make sure you consider any parent context blocks the tests are in.
And if none of that narrows down your search, use a gem that allows you to retry failing tests.
I used rspec-retry in the past, but have found it to be unreliable lately, so I've switched to rspec-repeat. I usually leave these off in development (configured for 1 try) and run with multiple tries on CI (usually 3). That way I can get a feel for which tests are wobbly locally, but not let those tests break my build (unless they fail consistently).
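A sketch of that 1-try-locally / 3-tries-on-CI arrangement, shown with rspec-retry's run_with_retry syntax for concreteness (rspec-repeat is wired up with a similar around hook; the CI environment variable is an assumption about your CI setup):

# spec_helper.rb
require 'rspec/retry'

RSpec.configure do |config|
  config.verbose_retry = true

  config.around :each do |example|
    # 1 try locally, 3 tries when the CI env var is set
    example.run_with_retry retry: ENV['CI'] ? 3 : 1
  end
end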
TL;DR
Most of the intermittently failing tests I encounter have a lot of moving pieces (rails, capybara, database cleaner, factory girl, phantomjs, rspec just to name a few). If the code is tested AND the specs frequently pass AND the feature consistently works in the browser chances are some interaction in your testing environment is the root cause of the intermittent failures. If you can't track that down, retry the failing specs a couple of times.
If you are sure that there is no changing variable on either the server (Rails) or the client (JS) side, you may try the following and see if it works. We used this for a similar problem we had.
spec/support/wait_for_ajax.rb
# ref: https://robots.thoughtbot.com/automatically-wait-for-ajax-with-capybara
module WaitForAjax
  def wait_for_ajax
    Timeout.timeout(Capybara.default_max_wait_time) do
      loop until finished_all_ajax_requests?
    end
    sleep(1) # extra safety net; the loop above doesn't always catch everything
  end

  def finished_all_ajax_requests?
    page.evaluate_script('jQuery.active').zero?
  end
end
spec/features/YOUR_SPEC.rb
RSpec.feature 'My Feature Test', type: :feature do
  ['Country', 'State', 'City'].each do |object|
    let(:target) { create object.to_sym }

    it 'runs my frustrating test' do
      find('#my-div').click
      wait_for_ajax
    end
  end
end
rails_helper.rb
# ..
RSpec.configure do |config|
  # ..
  config.include WaitForAjax, type: :feature
  # ..
end
# ..

How can I create a golden master for mvc 4 application

I was wondering how to create a golden master approach to start creating some tests for my MVC 4 application.
"Gold master testing refers to capturing the result of a process, and
then comparing future runs against the saved “gold master” (or known
good) version to discover unexpected changes." - #brynary
It's a large application with no tests, and it would be good to start development with the golden master to ensure the changes we are making to increase test coverage (and hopefully decrease complexity) in the long run don't break the application.
I am thinking about capturing a day's worth of real-world traffic from the IIS logs and using that to create the golden master, however I am not sure of the easiest or best way to go about it. There is nothing out of the ordinary in the app: lots of controllers with postbacks, etc.
I am looking for a way to create a suitable golden master for a MVC 4 application hosted in IIS 7.5.
NOTES
To clarify something in regard to the comments: the "golden master" is a test you can run to verify the output of the application. It is like journalling your application and being able to run that journal every time you make a change, to ensure you have not broken anything.
When working with legacy code, it is almost impossible to understand it and to write code that will surely exercise all the logical paths through the code. For that kind of testing, we would need to understand the code, but we do not yet. So we need to take another approach.
Instead of trying to figure out what to test, we can test everything, a lot of times, so that we end up with a huge amount of output, about which we can almost certainly assume that it was produced by exercising all parts of our legacy code. It is recommended to run the code at least 10,000 (ten thousand) times. We will write a test to run it twice as much and save the output.
Patkos Csaba - http://code.tutsplus.com/tutorials/refactoring-legacy-code-part-1-the-golden-master--cms-20331
My question is: how do I go about doing this for an MVC application?
Regards
Basically you want to compare two large sets of results and control variations; in practice, an integration test. I believe that real traffic can't give you the control that I think you need.
Before making any change to the production code, you should do the following:
Create X number of random inputs, always using the same random seed, so you can generate the same set over and over again. You will probably want a few thousand random inputs.
Bombard the class or system under test with these random inputs.
Capture the outputs for each individual random input.
When you run it for the first time, record the outputs in a file (or database, etc). From then on, you can start changing your code, run the test and compare the execution output with the original output data you recorded. If they match, keep refactoring, otherwise, revert back your change and you should be back to green.
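A minimal sketch of that loop, written in Ruby for brevity (the shape is the same in .NET; system_under_test below is a placeholder for whatever legacy code you want to pin down):

require 'json'

SEED        = 1234
RUN_COUNT   = 10_000
GOLDEN_FILE = 'golden_master.json'

def system_under_test(input)
  input.to_s.reverse                              # stand-in for the real behaviour
end

rng     = Random.new(SEED)                        # fixed seed => identical inputs on every run
inputs  = Array.new(RUN_COUNT) { rng.rand(1_000_000) }
outputs = inputs.map { |i| system_under_test(i) }

if File.exist?(GOLDEN_FILE)
  golden = JSON.parse(File.read(GOLDEN_FILE))
  abort 'Output differs from the golden master - revert the last change.' unless golden == outputs
  puts 'Matches the golden master.'
else
  File.write(GOLDEN_FILE, JSON.generate(outputs)) # the first run records the master
  puts 'Golden master recorded.'
end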
This doesn't match your approach. Imagine a scenario in which a user purchases a certain product: you cannot determine the outcome of the transaction (insufficient credit, non-availability of the product), so you cannot trust your input.
However, what you now need is a way to replicate that data automatically, and browser automation in this case cannot help you much.
You can try a different approach, something like the Lightweight Test Automation Framework or the MvcIntegrationTestFramework, which are the most appropriate for your scenario.

How to deal with expensive fixture/factory_girl object creation in tests?

For all users in our system, we generate a private/public key pair, which often takes a second or two. That's not a deal breaker on the live site, but it makes running the tests extremely slow, and slow tests won't get run.
Our setup is Rails 3.1 with factory_girl and rspec.
I tried creating some (10 or so) ahead of time, with a method to return a random one, but this seems to be problematic: perhaps they're getting cleared out of the database and are unavailable for subsequent tests... I'm not sure.
This might be useful: https://github.com/pcreux/rspec-set - any other ideas?
You can always make a fake key pair for your tests. Pregenerating them won't work, at least not if you store them in the DB, because the DB should get cleared for every test. I suppose you could store them in a YAML file or something and read them from there...
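One way to implement the fake key pair idea is to generate a single small pair once per test run and hand it to every factory-built user; a sketch (the attribute names and the factory below are assumptions about your User model):

# spec/support/test_keys.rb
require 'openssl'

module TestKeys
  def self.key_pair
    @key_pair ||= OpenSSL::PKey::RSA.new(1024)   # small key: fast, and fine for tests
  end
end

# spec/factories/users.rb
FactoryGirl.define do
  factory :user do
    sequence(:email) { |n| "user#{n}@example.com" }
    private_key { TestKeys.key_pair.to_pem }
    public_key  { TestKeys.key_pair.public_key.to_pem }
  end
end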
https://github.com/pcreux/rspec-set was good enough for what we needed, combined with an after(:all) block to clean up the droppings it leaves in the database.

ruby cucumber testing practices

I have many Cucumber feature files, each consisting of many scenarios.
When run together, some of them fail.
When I run each test file individually, they pass.
I think my database is not being correctly cleaned after each scenario.
What is the correct process to determine what is causing this behavior ?
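If the database really is the culprit, the usual fix is a per-scenario cleanup hook; a sketch using the database_cleaner gem (the strategy typically needs to be :truncation for @javascript scenarios, which cannot see an open transaction):

# features/support/database_cleaner.rb
require 'database_cleaner'

DatabaseCleaner.strategy = :transaction

Before do
  DatabaseCleaner.start
end

After do
  DatabaseCleaner.clean
end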
By the sound of it your tests are depending upon one another. You should be trying to get each individual test to do whatever setup is required for that individual test to run.
The setup parts should be done during the "Given" part of your features.
Personally, to stop the features from becoming verbose and to keep them close to the business language they were written in, I sometimes add additional steps that are required to do the setup and call them from the steps that are in the feature file.
I hope this makes sense to you.
This happens to me for different reasons at different times.
Sometimes it's that a stub or mock invoked in one scenario screws up another, but only when they are both run (each is fine alone).
The only way I've been able to solve these is by debugging while running enough tests to get a failure. You can drop the debugger line in step_definitions, or call it as a step itself (When I call the debugger) and match that up to a step definition that just says 'debugger' as the Ruby code, as in the sketch below.
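A sketch of that throwaway step:

# features/step_definitions/debug_steps.rb
When /^I call the debugger$/ do
  debugger   # needs the debugger/byebug gem; swap in `binding.pry` if you prefer pry
end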

Resources