What is a systematic approach to debug intermittently failing specs? - ruby-on-rails

I have four tests in my Capybara/Rspec suite that keep failing (a real problem for CI deployment).
The worst thing is that these tests fail intermittently, and often only when the entire suite is run, which makes them difficult to debug.
They are all ajax requests, either submitting a remote form or clicking a remote link, followed by expect(page).to have_content 'My Flash Message'.
These tests even fail intermittently within the same test cycle. For example, I have several models that behave similarly, so I am iterating through them to test.
e.g.,
['Country', 'State', 'City'].each do |object|
  let(:target) { create object.to_sym }

  it 'runs my frustrating test' do
  end
end
Sometimes country fails, sometimes state, sometimes everything passes.
I have tried adding wait: 30 to the expect statement. I have tried adding sleep 30 before the expect statement. I'm still getting intermittent failures.
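For reference, the two workarounds looked roughly like this (a sketch; the flash text is from the example above):

# Sketch of the workarounds tried; neither made the failures go away reliably.
expect(page).to have_content('My Flash Message', wait: 30)  # per-expectation Capybara wait
sleep 30                                                     # brute-force pause before asserting
expect(page).to have_content 'My Flash Message'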
There is quite a bit of information out there describing finicky ajax tests, but I have not found much about how to debug and fix these kinds of problems.
I'm really grateful for any advice or pointers from others, before I pull all my hair out!!
UPDATE
Thank you for all these excellent responses. It's been useful to see that others have grappled with similar issues, and that I'm not alone.
So, is there a solution?
The suggestions to use debugging tools such as pry, byebug, and Poltergeist's debug feature (thanks #Jay-Ar Polidario, #TomWalpole) have been useful to confirm what I thought I already knew — namely (and as suggested by #BM5K) that the features work consistently in the browser, and the errors lie within the tests.
I experimented with adjusting timeouts and retries (#Jay-Ar Polidario, #BM5K), and while an improvement, these were still not a consistent fix. More importantly, this approach felt like patching holes rather than a proper fix, so I was not entirely comfortable with it.
Ultimately I went with a major rewrite of these tests. This has entailed breaking up multi-step features, and setting up and testing each step individually. While purists may claim this is not truly testing from the user's perspective, there is sufficient overlap between each test that I'm comfortable with the result.
In going through this process, I did notice that all of these errors were related to "clicking on things, or filling forms", as #BoraMa suggested. Though in this case the experience was reversed — we had adopted .trigger('click') syntax because capybara + poltergeist was reporting errors clicking on elements using click_link or find(object).click, and it was these tests that were problematic.
To avoid these problems I've removed JS from the tests as much as possible, i.e., testing the majority of the feature without JS enabled, and then creating very short, targeted JS specs to test specific JS responses, features or user feedback, roughly along the lines sketched below.
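As a rough illustration of the split (the model, path and field names here are made up for the example):

# Non-JS spec: exercise most of the workflow under the default rack_test driver.
scenario 'creating a country' do
  visit new_country_path
  fill_in 'Name', with: 'Freedonia'
  click_button 'Save'
  expect(Country.count).to eq 1
end

# Short, targeted JS spec: only the remote submission and its user feedback.
scenario 'remote form shows the flash message', js: true do
  visit new_country_path
  fill_in 'Name', with: 'Freedonia'
  click_button 'Save'
  expect(page).to have_content 'My Flash Message'
end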
So there is not really one single fix: a major refactoring that, in all honesty, probably needed to happen and was a valuable exercise. The tests have lost some features by breaking everything up into individual tests, but as a whole this has made them easier to read and maintain.
There are still a couple of tests that are occasionally showing red, and will need some more work. But overall a great improvement.
Thank you all for the great guidance, and reassuring me that interactions in the testing environment could be the root cause.

Let me share our story too :). Recently, we also tried to hunt down and fix intermittently failing tests under a similar setup (Poltergeist, JS tests). The tests failed more often when the whole test suite was run than when run individually, but about a third of the time the whole suite succeeded. It was just a couple of tests from the suite, about 10, that randomly failed; the others seemed to run OK all the time.
First we made sure the tests were not failing due to db truncation issues, leftover records, etc. We took screenshots at the moment of failure to verify that the page looked correct.
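A screenshot-on-failure hook can be wired up roughly like this (a sketch; the tmp/screenshots path is arbitrary, and save_screenshot assumes a driver such as Poltergeist that supports it):

# rails_helper.rb (sketch): save a screenshot whenever a JS feature spec fails
RSpec.configure do |config|
  config.after(:each, type: :feature, js: true) do |example|
    if example.exception
      path = "tmp/screenshots/#{example.full_description.parameterize}.png"
      page.save_screenshot(path, full: true) # `full:` is a Poltergeist option
    end
  end
end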
After a lot more searching we noticed that all of the remaining failing tests deal with clicking on things, or filling forms, while there were jQuery animations and other dynamic operations frequently used on the pages. This led us to this Poltergeist issue which helped us greatly in the end. It turns out that Poltergeist, when clicking on a button or dealing with form inputs, tries to maximally mimic a normal user, which can lead to problems when the inputs / links are animated.
A way to recognize that this was indeed an issue for us was that we could successfully find the element on the page but the browser was unable to click on it.
We ended up with a not-very-clean solution: we rewrote some Capybara helpers for clicking and interacting with forms to use find and trigger internally:
# Override Capybara methods as they react badly with animations
# (the click/action is not registered and the test fails).
# See https://github.com/teampoltergeist/poltergeist/issues/530
def click_button(locator, *options)
  find_button(locator, *options).trigger(:click)
end

def click_link(locator, *options)
  find_link(locator, *options).trigger(:click)
end

def choose(locator, *options)
  find(:radio_button, locator, *options).trigger(:click)
end

def check(locator, *options)
  find(:checkbox, locator, *options).trigger(:click)
end
This approach may lead to some unexpected problems because now you'll be able to click on things in your tests even if they are e.g. overlapped by a modal div or when they are not fully visible on the page. But after reading carefully the comments on the github issue, we decided that this was the way to go for us.
Since then, we have only very occasional test failures which seem to be related to another Poltergeist timeouts issue. But the failures are so rare that we don't feel the urge to look further - the tests are finally reliable enough.

Intermittently failing tests are a pain to troubleshoot, but there are some things you can do to make life easier. First would be to remove any looping or shared examples. Explicitly stating each expectation should make it more clear which example combination is failing (or make it even more obvious that it is indeed random).
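For example, the loop from the question could be unrolled into explicit examples (a sketch; the factory names are assumed):

it 'shows the flash message for a country' do
  target = create(:country)
  # ... submit the remote form for target ...
  expect(page).to have_content 'My Flash Message'
end

it 'shows the flash message for a state' do
  target = create(:state)
  # ...
  expect(page).to have_content 'My Flash Message'
end

it 'shows the flash message for a city' do
  target = create(:city)
  # ...
  expect(page).to have_content 'My Flash Message'
end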
Over the course of several runs, track which tests are failing. Are they all in the same context group?
Are you mixing and matching javascript tests and non-javascript tests? If you are, you may be running into database issues (I've seen problems caused by switching database cleaner strategies mid context block).
Make sure you consider any parent context blocks the tests are in.
And if none of that narrows down your search, use a gem that allows you to retry failing tests.
I used rspec-retry in the past, but have found it to be unreliable lately, so I've switched to rspec-repeat. I usually leave these off in development (configured for 1 try) and run with multiple tries on CI (usually 3). That way I can get a feel for which tests are wobbly locally, but not let those tests break my build (unless they fail consistently).
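A minimal setup for that, assuming the rspec-retry gem (the retry counts are just examples):

# spec_helper.rb (sketch): retry flaky JS specs on CI only
require 'rspec/retry'

RSpec.configure do |config|
  config.verbose_retry = true # print a message on each retry
  config.around(:each, :js) do |example|
    example.run_with_retry retry: ENV['CI'] ? 3 : 1
  end
end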
TL;DR
Most of the intermittently failing tests I encounter have a lot of moving pieces (rails, capybara, database cleaner, factory girl, phantomjs, rspec just to name a few). If the code is tested AND the specs frequently pass AND the feature consistently works in the browser chances are some interaction in your testing environment is the root cause of the intermittent failures. If you can't track that down, retry the failing specs a couple of times.

If you are sure that there is no changing variable on either the server (Rails) or the client (JS) side, you may try the following and see if it works. We used this for a similar problem we had.
spec/support/wait_for_ajax.rb
# ref: https://robots.thoughtbot.com/automatically-wait-for-ajax-with-capybara
module WaitForAjax
  def wait_for_ajax
    Timeout.timeout(Capybara.default_max_wait_time) do
      loop until finished_all_ajax_requests?
    end
    sleep(1) # extra safety margin, because the loop above doesn't always catch everything
  end

  def finished_all_ajax_requests?
    page.evaluate_script('jQuery.active').zero?
  end
end
spec/features/YOUR_SPEC.rb
RSpec.feature 'My Feature Test', type: :feature do
  ['Country', 'State', 'City'].each do |object|
    let(:target) { create object.to_sym }

    it 'runs my frustrating test' do
      find('#my-div').click
      wait_for_ajax
    end
  end
end
rails_helper.rb
# ..
RSpec.configure do |config|
  # ..
  config.include WaitForAjax, type: :feature
  # ..
end
# ..

Related

An Rspec helper like assert_database_unchanged?

Is there an rspec extension for postgresql that allows one to test something like this?
expect { make_bad_request }.to not_change_database
i.e. To ensure nothing was created, updated or deleted.
Sure, we can check a specific table but most often we just want to be sure that nothing changed at all, nothing sneaked in some multi-stage save.
It's not particularly easy to do with a little helper because although postgres has pg_stat_database it's not updated during the test transaction. I can still see how it's doable but it would be a bit of plumbing. Has anyone done it?
UPDATE:
I was asked to give an example of how this might be useful.
The convention with HTTP is that if a request returns an error status then no change has been made to the application state. Exceptions to that convention are rare and we like convention over configuration.
Active record helps with enforcing this with defaults about how validation works but it still leaves lots of ways to make mistakes, particularly with complex chains of events where it's most important to have atomicity.
As such, to enforce the HTTP convention with ease, you could take it even further than stated above and instead have a directive, expressed something like this:
describe 'error responses', :database_changes_disallowed do
  context 'invalid form data' do
    before do
      # ...setup belongs only here
    end

    it 'returns 422' do
      # ...
    end
  end
end
RSpec is already able to use database transactions to isolate state changes at the per-example level; this would aim to subdivide just one step further, between the before and the it.
This will work for a well designed app if you have been judicious enough to ensure that your database stores only application state and no pseudo-logging like User#last_active_at. If you haven't then you'll know immediately.
It would greatly increase test coverage against some of the worst kinds of state-corruption bugs, while needing less code and removing some testing complexity. Cases where a database change suddenly appears for a previously working test would be the result of an architectural change in an unfortunate direction, a real and unusual need to make an exception, or a serious bug.
I'll be sad if it turns out to be technically infeasible to implement but it doesn't seem a bad idea in terms of application design.
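To make the idea concrete, a crude approximation could compare per-table row counts taken after the before blocks with the counts after the example. That only catches inserts and deletes, not updates, so it falls short of the full goal; everything below is a hypothetical sketch, not an existing helper.

# rails_helper.rb (hypothetical sketch of a :database_changes_disallowed tag)
module DatabaseChangeGuard
  def current_row_counts
    conn = ActiveRecord::Base.connection
    (conn.tables - %w[schema_migrations ar_internal_metadata]).map do |table|
      [table, conn.select_value("SELECT COUNT(*) FROM #{conn.quote_table_name(table)}").to_i]
    end.to_h
  end
end

RSpec.configure do |config|
  config.include DatabaseChangeGuard, :database_changes_disallowed

  # append_before runs after other before hooks, so setup done in `before` is still allowed
  config.append_before(:each, :database_changes_disallowed) do
    @row_counts_before = current_row_counts
  end

  config.after(:each, :database_changes_disallowed) do
    expect(current_row_counts).to eq(@row_counts_before)
  end
end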
That's a tricky one, because it's not easy to tell what should not happen in your app. IMO it's better to keep the focus of your specs on what the app should do.
In other words: if you want to test that no DB changes were made, should you check that no files were written? And no requests were made? Should you test that no files permissions have been changed?
I guess you get my point.
But there might be legit reasons to do it that I don't know about. In such a case, I'd use something like db-query-matchers:
expect { your_code_here }.not_to make_database_queries(manipulative: true)
I have usually seen it used for N+1 tests (when you want to specify how many times a specific query is called), but it seems this matcher would work for you as well.
But it can be very brittle: if you add such checks to most of your tests and your app keeps evolving, you can end up with failing specs just because some action started to need a DB update. Your call.
I think you are looking for
assert_no_changes(expressions, message = nil, &block)
https://api.rubyonrails.org/v6.0.2.1/classes/ActiveSupport/Testing/Assertions.html#method-i-assert_no_changes
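For reference, usage looks roughly like this (a sketch; note this is the Minitest-style assertion from ActiveSupport, and the route/params are made up):

assert_no_changes -> { Question.count } do
  post questions_url, params: { question: { title: '' } }
end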

Sucker punch tests in rails, using connection_pool block, results in connection timeout

Thanks in advance for your kind response.
At work we are using sucker punch gem for a rails app to send emails and other stuff we want to do asynchronously.
We implemented a couple of actors with no problems and even wrote some tests for them successfully, using the recommended configuration for that matter (requiring sucker_punch/testing/inline in the specs and using truncation as database cleaning strategy).
Everything was working like a charm, until the last actor we decided to implement. It is no different from the others, but now, when running the test suite, ActiveRecord::ConnectionTimeoutError is raised.
I've searched the internet for a solution but nothing came up. Most of the answers (like this one) suggest using the ActiveRecord::Base.connection_pool.with_connection method, passing a block to it. We were already doing that.
The only thing that I can think of is that we are handling errors on the actors, rescuing exceptions, like this:
def perform
  ActiveRecord::Base.connection_pool.with_connection do
    begin
      # ... do something
    rescue SomeException => e
      # ... handle exception
    end
  end
end
But looking at the source this shouldn't be a problem, since with_connection has an ensure block to release the connection.
I will be opening an issue on sucker punch and will be updating this question if I have some news.
The release in question can wait, but this also makes me wonder if we are having this same issue in production...
Cheers,
Aldana.
EDIT
The author of the gem told me that apparently there was nothing wrong with the code, and suggested increasing the pool size. I'm going to use this approach, and if the error persists we will change some parts of the code to not use sucker punch.

Getting inconsistent "Unable to find css" errors with Rspec + Capybara + Ember

What's Happening
In our Rspec + Capybara + selenium (FF) test suite we're getting A LOT of inconsistent "Capybara::ElementNotFound" errors.
The problem is they only happen sometimes. Usually they won't happen locally; they'll happen on CircleCI, where I expect the machines are much beefier (and so faster)?
Also, the same errors usually won't happen when the spec is run in isolation, for example by running rspec with a particular line number, e.g. :42.
Bear in mind, however, that there is no consistency. The spec won't consistently fail.
Our current workaround - sleep
Currently the only thing we can do is to litter the specs with sleeps. We add them whenever we get an error like this and it fixes it. Sometimes we have to increase the sleep times, which is making our tests very slow, as you can imagine.
What about capybara's default wait time?
It doesn't seem to be kicking in, I imagine, as the test usually fails in under the allocated wait time (currently 5 seconds).
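For completeness, the wait time is just this setting (10 seconds here is an arbitrary example; older Capybara versions call it default_wait_time):

# rails_helper.rb (sketch): raise Capybara's implicit wait
Capybara.default_max_wait_time = 10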
Some examples of failure.
Here's a common failure:
visit "/#/things/#{#thing.id}"
find(".expand-thing").click
This will frequently result in:
Unable to find css ".expand-thing"
Now, putting a sleep in between those two lines fixes it. But a sleep is too brute force. I might put a second, but the code might only need half a second.
Ideally I'd like Capybara's wait time to kick in because then it only waits as long as it needs to, and no longer.
Final Note
I know that Capybara can only do the wait thing if the selector doesn't exist on the page yet. But in the example above you'll notice I'm visiting the page and then selecting, so the element is not on the page yet, and Capybara should wait.
What's going on?
Figured this out. SO, when looking for elements on a page you have a few methods available to you:
first('.some-selector')
all('.some-selector') #returns an array of course
find('.some-selector')
.first and .all are super useful as they let you pick from non-unique elements.
HOWEVER .first and .all don't seem to auto-wait for the element to be on the page.
The Fix
The fix then is to always use .find(). .find WILL honour the capybara wait time. Using .find has almost completely fixed my tests (with a few unrelated exceptions).
The gotcha of course is that you have to use more unique selectors as .find MUST only return a single element, otherwise you'll get the infamous Capybara::Ambiguous exception.
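To illustrate with the example from the question (at least on the Capybara version we're using, first/all don't wait):

visit "/#/things/#{@thing.id}"

find(".expand-thing").click    # waits up to Capybara.default_max_wait_time for the element
# first(".expand-thing").click # may fail immediately if Ember hasn't rendered the element yet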
Ember works asynchronously. This is why Ember generally recommends using QUnit: they've tied in code to allow the testing to pause/resume while waiting for asynchronous functions to return. Your best bet would be to either attempt to duplicate the pause/resume logic that's been built up for QUnit, or switch to QUnit.
There is a global promise used during testing you could hook up to: Ember.Test.lastPromise
Ember.Test.lastPromise.then(function() {
  // continue
});
Additionally visit/click return promises, you'll need some manner of telling capybara to pause testing before the call, then resume once the promise resumes.
visit('foo').then(function() {
  click('.expand-thing').then(function() {
    assert('foobar');
  });
});
Now that I've finished ranting, I'm realizing you're not technically running these tests from inside the browser; you're running them through selenium, which means the test code itself isn't in the browser (unless selenium has changed since I last used it, which is possible). Either way, you'll need to watch that last promise and wait on it before you can continue testing after an asynchronous action.

Why don't people access database in Rspec?

I often see the code which uses mock in Rspec, like this:
describe "GET show" do
it "should find and assign #question" do
question = Question.new
Question.should_receive(:find).with("123").and_return(question)
get :show, :id => 123
assigns[:question].should == question
end
end
But why don't they add a Question with id => 123 to the database, retrieve it with get, and then destroy it? Is this a best practice? If I don't follow the rule, will something bad happen?
When you write a behavioral test (or a unit test), you're trying to test only a specific part of code, and not the entire stack.
To explain this better, you are just expressing and testing that "function A should call function B with these parameters", so you are testing function A and not function B, for which you provide a mock.
This is important for a number of reasons:
You don't need a database installed on every machine that builds your code; this is important if you start using build machines (and/or continuous integration) in your company with hundreds of projects.
You get better test results, because if function B is broken, or the database is not working properly, you don't get a test failure on function A.
Your tests run faster.
It's always a pain to have to get a clean DB before each test. What if a previous run of your tests was stopped, leaving a Question with that id in the database? You'd probably get a test failure because of the duplicate id, while in reality the function works properly.
You need a proper configuration before running your test. This is not such an incredible problem, but it's much better if tests can run "out of the box", without having to configure a database connection, a folder of temporary test files, an SMTP server for testing email stuff, etc...
A test that actually exercises the entire stack is called an "end to end test" or an "integration test" (depending on what it tests). These are important as well; for example, a suite of tests without a mock database can be used to see whether a given application can run safely on a different DB than the one used during development, and eventually to fix functions that contain offending SQL statements.
Actually, many people do, including me. Generally speaking, since tests are there to check behavior, inserting database entries can feel a bit unnatural to some people.
Question.new would be enough because it still goes through the Rails validation methods anyway, so many people tend to use it, also because it is faster.
But, indeed, even if you start using factories, there will be times when you will probably be inserting data into your testing environment as well. I personally don't see anything wrong with this.
Overall, in some situations where the testing suite is really large, it can be quite an advantage not to save database entries. But if speed is not your top concern, I would say that you do not really have to worry about how the test looks, as long as it is well constructed and to the point.
BTW, you do not need to destroy test data; it's done automatically after the test ends. So, unless you are checking the actual delete methods, avoid doing that explicitly.
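That automatic cleanup comes from running each example inside a rolled-back transaction, which is the usual rspec-rails setting (a sketch; the option name can vary between rspec-rails versions):

# rails_helper.rb (sketch): wrap each example in a DB transaction that is rolled back afterwards
RSpec.configure do |config|
  config.use_transactional_fixtures = true
end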

ruby cucumber testing practices

I have many cucumber feature files, each consists of many scenarios.
When run together, some of them fail.
When I run each test file individually, they pass.
I think my database is not being cleaned correctly after each scenario.
What is the correct process to determine what is causing this behavior?
By the sound of it, your tests are depending upon one another. You should be trying to get each individual test to do whatever setup is required for that individual test to run.
The setup parts should be done during the "Given" part of your features.
Personally, to stop the features from becoming verbose and to keep them close to the business language they were written in, I sometimes add additional steps that are required to do the setup and call them from the steps that are in the feature file.
If this makes sense to you, it might look something like the sketch below.
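For example (the step text and the Question model are made up for illustration):

# features/step_definitions/setup_steps.rb (sketch)
Given /^a question titled "([^"]*)" exists$/ do |title|
  @question = Question.create!(title: title)
end

# A higher-level step can reuse it, keeping the feature file close to the business language.
Given /^the basic questions have been set up$/ do
  step 'a question titled "What is Ruby?" exists'
end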
This happens to me for different reasons and at different times.
Sometimes it's that a stub or mock invoked in one scenario screws up another, but only when they are both run (each is fine alone).
The only way I've been able to solve these is by debugging while running enough tests to get a failure. You can drop a debugger line in step_definitions or call it as a step itself (When I call the debugger) and match that up to a step definition that just runs the debugger as its Ruby code.
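Something like this, for instance (a sketch; use debugger or binding.pry depending on what you have installed):

# features/step_definitions/debug_steps.rb (sketch)
When /^I call the debugger$/ do
  debugger # or binding.pry, if you use pry
end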
