An Rspec helper like assert_database_unchanged? - ruby-on-rails

Is there an RSpec extension for PostgreSQL that allows one to test something like this?
expect { make_bad_request }.to not_change_database
i.e. To ensure nothing was created, updated or deleted.
Sure, we can check a specific table, but most often we just want to be sure that nothing changed at all, that nothing sneaked in during some multi-stage save.
It's not particularly easy to do with a little helper because, although Postgres has pg_stat_database, it isn't updated during the test transaction. I can still see how it's doable, but it would be a bit of plumbing. Has anyone done it?
UPDATE:
I was asked to give an example of how this might be useful.
The convention with HTTP is that if a request returns an error status then no change has been made to the application state. Exceptions to that convention are rare and we like convention over configuration.
Active Record helps enforce this with its validation defaults, but it still leaves lots of ways to make mistakes, particularly in complex chains of events where atomicity matters most.
As such, to enforce the HTTP convention with ease, you could take it even further than stated above and instead have a directive expressed something like this:
describe 'error responses', :database_changes_disallowed do
  context 'invalid form data' do
    before do
      # ...setup belongs only here
    end
    it 'returns 422' do
      # ...
    end
  end
end
RSpec is already able to use database transactions to isolate state changes at the per-example level; this would subdivide one step further, between the before block and the it block.
This will work for a well-designed app if you have been judicious enough to ensure that your database stores only application state and no pseudo-logging like User#last_active_at. If you haven't, you'll know immediately.
It would greatly increase test coverage against some of the worst kinds of state-corruption bugs while needing less code and removing some testing complexity. A previously passing test that suddenly starts making a database change would indicate an architectural change in an unfortunate direction, a real and unusual need to make an exception, or a serious bug.
I'll be sad if it turns out to be technically infeasible to implement but it doesn't seem a bad idea in terms of application design.

That's a tricky one, because it's not easy to tell what should not happen in your app. IMO better to keep the focus of your specs on what the app should do.
In other words: if you want to test that no DB changes were made, should you check that no files were written? And no requests were made? Should you test that no file permissions have been changed?
I guess you get my point.
But there might be legit reasons to do it that I don't know about. In such a case, I'd use something like db-query-matchers:
expect { your_code_here }.not_to make_database_queries(manipulative: true)
I've usually used it, and seen it used, for N+1 tests (when you want to specify how many times a specific query is called), but it seems this matcher would work for you as well.
But it can be very brittle: if you add such checks to most of your tests and your app is evolving, you can end up with failing specs just because some action started to need a DB update. Your call.
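If you did want something closer to the per-example directive sketched in the question, one rough approach (assuming the db-query-matchers gem is installed and configured) is to wire the matcher to a metadata tag with an around hook. Note that an around hook also wraps before blocks, so this alone doesn't separate setup writes from the example body:

# spec/support/database_changes_disallowed.rb -- sketch only; the tag and file name are arbitrary
RSpec.configure do |config|
  config.around(:each, :database_changes_disallowed) do |example|
    # Fail the example if any INSERT/UPDATE/DELETE runs while it executes.
    expect { example.run }.not_to make_database_queries(manipulative: true)
  end
end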

I think you are looking for
assert_no_changes(expressions, message = nil, &block)
https://api.rubyonrails.org/v6.0.2.1/classes/ActiveSupport/Testing/Assertions.html#method-i-assert_no_changes
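For reference, assert_no_changes is the Minitest-flavored assertion from ActiveSupport::Testing::Assertions. A minimal usage sketch, with an illustrative model and route:

# Minitest / ActiveSupport style:
assert_no_changes -> { Widget.count } do
  post widgets_path, params: { widget: { name: "" } }
end

# Rough RSpec equivalent for a single expression:
expect { post widgets_path, params: { widget: { name: "" } } }
  .not_to change { Widget.count }

Note that this still checks one expression at a time, which is exactly the per-table bookkeeping the question is trying to avoid.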

Related

How to know what exceptions to rescue?

I often find myself not knowing what exceptions to rescue when using a specific library in Ruby.
For instance, I often use HTTParty for any HTTP requests my rails/sinatra app would make. I dug around the code for HTTParty and found a file containing the defined exceptions used. Great! I'll just rescue them when making a request.
To test it out, I put in a bogus domain name for the request, but instead of the HTTParty::ResponseError exception I expected, I got a SocketError exception.
What is the best way to deal with this? I'm aware that HTTParty is a wrapper around Ruby's built-in HTTP implementation, and that's probably what threw the SocketError exception. But how would I know that normally?
I could solve this by just rescuing "Exception", but that's pretty awful practice. I'd rather be well aware of the exceptions I could be causing and dealing with those.
EDIT: I should clarify that what really prompted me to create this question was that I have no idea how I can figure out the possible exceptions that CAN be raised when calling a specific function... that is, without looking through every single function call in the stack.
In general terms (I'm not a Ruby programmer), I deal with exceptions in the following way:
Can I recover from it? If the exception can happen and I know I can recover, or perhaps retry, then I handle the exception.
Does it need to be reported? If the exception can happen but I know I can't recover or retry, then I handle it by logging it and then passing it on to the caller. I always do this at natural subsystem boundaries like major modules or services. Sometimes (dependent on the API) I might wrap the exception with a 'my module'-specific one so that the caller only has to deal with my exceptions.
Can't handle it? All exceptions that are not dealt with should be caught at the top level, (a) reported, and (b) handled so that the system remains stable and consistent. This is the one that should always be there regardless of whether the other two are done.
Of course there is another class of exception: the ones so serious that they give you no chance to deal with them. For these there is only one solution, post-mortem debugging, and I find the best thing for this is logs, logs and more logs. Having worked on many systems from small to large, I would prefer to sacrifice performance for stability and recoverability (except where performance is critical) and add copious amounts of logging, introspectively if possible.
A SocketError is totally expected if you put in a bogus domain name.
After all, trying to connect to a non-existent domain makes the connection itself fail, hence SocketError.
The best way to deal with that is to use a valid domain with a bogus path in your test, but catch SocketError in your live code.
The problem here is not that you're catching the wrong exception but that you're priming the test with bad data.
The best course of action is to understand what exceptions could happen and manage them.
When I say understand, I'm getting at: where does the URL come from? Is it entered by your user? If so, never trust it and catch everything. Does it come from your config data? Semi-trust it and log errors, unless it's mission critical that the URL is OK.
There's no right or wrong answer here but this approach will, I hope, give you a good result.
Edit: What I'm attempting to do here is advocate the mindset of a programmer who is aware of the results of their actions. We all know that trying to connect to 'thisServiceIsTotallyBogus.somethingbad.notAvalidDomain' will fail, but the mindset of a programmer should be to first validate exactly where that domain comes from. If it is entered by the user then you must assume full checks; if you know it's from a config file only accessed by yourself or your support team, you can relax a little. Sadly though, this is a bad example, as you should really always test URLs because sometimes the internet doesn't work!
Ideally, the developer documentation for anything you use should tell you what exceptions it can throw.
For libraries or gems where the source code is publicly available, you can typically find the types of exceptions in an exceptions.rb file. (Ref here). Otherwise you will have to rely on the documentation. If all else fails you can rescue StandardError, although it is a less-than-ideal practice in many cases (ref this SO answer).
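As a rough sketch of the middle ground (the wrapper method, the warning output, and the exact set of rescued classes are all illustrative choices, not HTTParty's prescribed approach), you can rescue the library's own error hierarchy plus the common network-level exceptions that the standard library lets bubble up:

require "httparty"

# Hypothetical wrapper: returns nil on failure instead of raising.
def fetch_page(url)
  HTTParty.get(url)
rescue HTTParty::Error => e
  # Errors defined by HTTParty itself.
  warn "HTTParty error: #{e.class}: #{e.message}"
  nil
rescue SocketError, Timeout::Error, Errno::ECONNREFUSED => e
  # Lower-level network errors raised by Ruby's standard library.
  warn "Network error: #{e.class}: #{e.message}"
  nil
end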

Why don't people access the database in RSpec?

I often see code which uses mocks in RSpec, like this:
describe "GET show" do
it "should find and assign #question" do
question = Question.new
Question.should_receive(:find).with("123").and_return(question)
get :show, :id => 123
assigns[:question].should == question
end
end
But why don't they add a Question with id => 123 to the database, retrieve it via get, and then destroy it? Is this a best practice? If I don't follow the rule, will something bad happen?
When you write a behavioral test (or a unit test), you're trying to test only a specific part of code, and not the entire stack.
To explain this better, you are just expressing and testing that "function A should call function B with these parameters", so you are testing function A and not function B, for which you provide a mock.
This is important for a number of reasons:
You don't need a database installed on every machine you build your code, this is important if you start using build machines (and/or continuous integration) in your company with hundreds of projects.
You get better test results, because if function B is broken, or the database is not working properly, you don't get a test failure on function A.
Your tests run faster.
It's always a pain to have a clean DB before each test. What if a previous run of your tests was stopped, leaving a Question with that id in the database? You'll probably get a test failure because of the duplicate id, while in reality the function is working properly.
You need a proper configuration before running your test. This is not such an incredible problem, but it's much better if tests can run "out of the box", without having to configure a database connection, a folder of temporary test files, an SMTP server for testing email stuff, etc...
A test that actually tests the entire stack is called "end-to-end testing" or "integration testing" (depending on what it covers). These are important as well; for example, a suite of tests without a mocked database can be used to see whether a given application can run safely on a different DB than the one used during development, and eventually to fix functions that contain offending SQL statements.
Actually, many people do, including me. Generally speaking, since tests are there to check behavior, it can feel a bit unnatural to some people to insert database entries.
Question.new is often enough because it still goes through the model's Rails methods anyway, so many people tend to use unsaved objects, also because they are faster.
But, indeed, even if you start using factories, there will be times when you will probably be inserting data into your test database as well. I personally don't see anything wrong with this.
Overall, in situations where the test suite is really large, it can be quite an advantage not to save database entries. But if speed is not your top concern, I would say that you do not really have to worry about how the test looks, as long as it is well constructed and to the point.
BTW, you do not need to destroy test data; it's cleaned up automatically after the test ends. So, unless you are checking the actual delete methods, avoid doing that explicitly.
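For illustration, here is a sketch of the same example hitting the database instead of a mock, assuming a FactoryBot factory for Question exists (the old should syntax is kept to match the question):

describe "GET show" do
  it "finds and assigns @question" do
    question = FactoryBot.create(:question)   # persisted record instead of a mock
    get :show, :id => question.id
    assigns[:question].should == question
  end
end

The record is rolled back with the rest of the example's transaction, so no explicit cleanup is needed.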

How to fail gracefully and get notified if screen scraping fails in ruby on rails

I am working on a Rails 3 project that relies heavily on screen scraping to collect data, mainly using Nokogiri. I'm aggregating essentially the same data everywhere, but I'm grabbing it from many different sources, and as time goes on I will be adding more and more. However, I am acutely aware that screen scraping can be notoriously unreliable.
As such I am interested in how other people have handled the problem of verifying the data and then also getting notified if it is failing.
My current plan is as follows.
I am going to have validations on my model for most of the fields. If they fail, I won't get bad data into my system, although logging the failure in a meaningful way is still a problem.
I was thinking of some kind of counter where after so many failures from a particular source I somehow turn it off. Not sure how to keep track of that. I guess the only way is to have a field on my Source model that counts it and can be reset.
Logging is the 800-pound gorilla I'm not sure how to deal with. I could just do standard writing to logs, but if something fails I'd like to store the entire HTML so I can figure it out. Also, I need to notify myself somehow so I can address the issues. I thought of maybe just creating a model for all this and storing it in the database. If I did this I'd probably have to store the HTML on S3 or something. I'm running this on Heroku, so that influences what I can do.
Set up begin and rescue blocks around every field. I was trying to figure out a way to code this in a nicer Ruby style so I just don't end up with a page of them. Although some fields are just a straight-up doc.at_css("#whatever"), there are quite a number that require various formatting or calculations, so I think it makes sense to rescue those so I can log what went wrong. The other option is to let the exception bubble up and catch it when I try to create the model.
Anyway I'm sure I'm not even thinking of everything but that is why I'm trying to figure out how other people have handled this problem.
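To make that concrete, the kind of per-field rescue I have in mind might look roughly like this (the helper, field names, and selectors are purely illustrative):

# Hypothetical helper: extract one field, rescue anything that goes wrong,
# and log which field failed so the rest of the record can still be built.
def extract(field_name)
  yield
rescue StandardError => e
  Rails.logger.error("Scrape failed for #{field_name}: #{e.class}: #{e.message}")
  nil
end

attributes = {
  :title => extract(:title) { doc.at_css("#title").text.strip },
  :price => extract(:price) { doc.at_css(".price").text.gsub(/[^\d.]/, "").to_f }
}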
Our team does something similar to this, so here are some ideas:
We use a really high-level begin/rescue with a transaction to make sure we don't get into weird half-loaded states:
begin
  ActiveRecord::Base.transaction do
    # ...try to load a data source...
  end
rescue
  # ...error handling...
end
Email/page yourself when certain errors occur. We use exception_notifier but if you're sitting on Heroku the Exceptional plugin also seems like a good option. I've also heard of people having success w/ hoptoad
Capturing state is VERY important for troubleshooting issues. Something that's worked quite well for us is GMail. Our loaders effectively have two phases:
capture data and send it to our gmail account
log into gmail, download latest data and parse it
The second phase is the complex one, and if it fails a developer can simply log into the gmail account and easily inspect the failed message. This process has some limitations (per email and per mailbox storage limits, two phase pipeline, etc.) and we started out doing it because we had no other option, but it's proven shockingly resilient and convenient. Keep email in mind as a cheap/easy way to store noncritical state. We didn't start out thinking of using it that way and are now really glad we do. Logging into GMail feels better than digging through log files.
Build a dashboard UI. We have a simple dashboard with a grid of sources by day. Each box is colored either red or green based on whether the load for that source on that day succeeded. You can go one step further and set up a monitor on this UI (mon.itor.us or equivalent) that alarms if some error threshold is met.

Storing Objects in a Session in Rails

I have always been taught that storing objects in a session was a bad idea. Instead IDs should be stored that retrieve the record when needed.
However, I have an application that I wonder is an exception to this rule. I'm building a flashcard application, and the words being quizzed are in a table in the database whose schema doesn't change. I want to store the words currently being quizzed in a session, so a user can finish where they started in case they move on to a separate page.
In this case, is it possible to get away with storing these words as objects in the session? If so, why? The reason I ask is because the quiz is designed to move quickly, and I'd hate to waste a database call retrieving a record that never changes in the first place. However, perhaps there are other negatives to a large session that I'm not aware of.
*For the record, I have tried caching it with the built-in memcache methods in Rails 2.3, but apparently that has a maximum size per item of 1MB.
The main reason not to store objects in the session is that if the object structure changes, you will get an exception. Consider the following:
class Foo
  attr_accessor :bar
end

class Bar
end

foo = Foo.new
foo.bar = Bar.new
put_in_session(foo)
Then, in a subsequent release of the project, you change Bar's name. You reboot the server, and try to grab foo out of the session. When it tries to deserialize, it fails to find Bar and explodes.
It might seem like it would be easy to avoid this pitfall, but in practice, I've seen it bite a number of people. This is just because serializing an object can sometimes take more along with it than is immediately apparent (this sort of thing is supposed to be transparent) and unless you have rigorous rules about this, things will tend to get flummoxed up.
The reason it's normally frowned upon is that it's extremely common for this to bite people in ActiveRecord, since it's quite common for the structure of your app to shift over time, and sessions can be deserialized a week or longer after they were originally created.
If you understand all that and are willing to put in the energy to be sure that your model does not change and is not serializing anything extra, you're probably fine. But be careful :)
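A minimal sketch of the conventional alternative is to keep only the id in the session and reload the record when needed (the model, session key, and Rails 2.3-era finder are illustrative):

# When the quiz advances, store just the identifier:
session[:current_word_id] = word.id

# In a later request, reload the record:
word = Word.find_by_id(session[:current_word_id])

That trades one cheap indexed lookup per request for a session that can never go stale or blow up on deserialization.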
Rails tends to encourage RESTful design, and using sessions isn't very RESTful. I'd probably make a Quiz resource that has a bunch of words, as well as a current_word. This way, when they come back, you'll know where they were.
Now, REST isn't everything (depending on who you talk to), but there's a pretty good case against large sessions. Remember that sessions are written to and read back from disk, and the more data you're writing, the longer it takes to read back...
Since your app is a Rails app, I would suggest either:
Using your clients' ability to cache, by caching the cards in JavaScript (you'd need a fairly Ajax-heavy app to do this; see the latest RailsCast for some interesting points on JavaScript page caching), or
Using one of the many other Rails-supported server-side caching options (i.e. MemCached) to cache this data.
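A rough sketch of the server-side option, assuming a memcached-backed Rails.cache and illustrative model and key names (Rails 2.3-era finder syntax):

# Fetch the word list from the cache, falling back to the database on a miss.
words = Rails.cache.fetch("quiz_words/#{quiz.id}", :expires_in => 1.hour) do
  Word.all(:conditions => { :quiz_id => quiz.id })
end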
A much more insidious issue you'll encounter storing objects directly in the session is when you're using CookieStore (the default in Rails 2+ I believe). It's very easy to get CookieOverflow errors which are very hard to recover from.

When I have required model relationships, how do I guard against errors?

I have an application with a lot of database relationships that depend on each other to successfully operate the application. The hinge in the application is a model called the Schedule, but the schedule will pull Blocks, an Employee, a JobTitle, and an Assignment (in addition to that, every Block will pull an assignment from the database along with it as well) to assemble an employee's schedule throughout the day.
When I built the app, I put a lot of emphasis on validations that would ensure that all of the pieces had to be in place before everything was saved to the database. This has worked out fantastically so far, and the app has been live and pounded on for almost 6 months, serving approximately 150,000 requests a month with no hiccups or errors. Until last week.
Last week, while someone was altering a schedule, it looks like the database erred, and a Schedule was saved to the database with its Assignment missing. Because the association is called in every view, whenever this schedule was loaded from the database, the application would throw a NoMethodError for calling a method on nil.
When designing an application in the way that I state, do you guard against a possible failure on the part of the database/validations? And if so, how do you programmatically defend against it? Do you check every relationship to make sure that it is not nil before sending it to the view?
I know this question is awash in generality, and if I can be more specific in what I mean, please let me know in the comments.
I would recommend adding database-enforced foreign key constraints and wrapping important groups of operations into transactions.
If there is a foreign-key between Schedule and Assignment somewhere, a database-enforced foreign key constraint would have prevented the errant insert. Additionally, if you wrap the particular action in a transaction, you can be sure that either the entire stream of inserts/updates/deletes happens or fails, reverting to a clean state.
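A sketch of both ideas, with table and model names following the question (add_foreign_key and change_column_null are built into current Rails; older versions needed raw SQL or the foreigner gem):

# Migration: let the database refuse a Schedule without a valid Assignment.
class EnforceScheduleAssignment < ActiveRecord::Migration[7.0]
  def change
    change_column_null :schedules, :assignment_id, false   # no missing assignment
    add_foreign_key :schedules, :assignments               # no dangling assignment_id
  end
end

# Application code: save the pieces atomically, so a failure rolls everything back.
ActiveRecord::Base.transaction do
  assignment.save!
  schedule.save!
end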
In addition to your validations, and adding some database constraints as mentioned in other answers, you might also run a background job that periodically sweeps the database looking for orphans.
When it finds one, it cleans it up (if possible), or deletes it, or just marks it inactive and sends you email so you can look at it later. Depending on the amount and nature of your data, once a minute, once an hour, once a day...
That way, if bad data does get in despite whatever safeguards you have in place, you'll know about it sooner rather than later.
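A rough sketch of such a sweep, assuming Schedule belongs_to :assignment and a modern Rails query interface (the decision to report rather than delete is illustrative):

# Find schedules whose assignment_id is nil or points at a deleted row.
orphans = Schedule.left_joins(:assignment).where(assignments: { id: nil })

orphans.find_each do |schedule|
  Rails.logger.warn("Orphaned schedule ##{schedule.id}: no assignment")
  # AdminMailer.orphan_report(schedule).deliver_later  # hypothetical notification
end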
I'll argue the unconventional side on this. The constraints you describe don't belong in the database; they belong in your OO code. And it's not true that "the database erred"; what is unquestionably true is that the application inserted improperly validated data.
When you start expecting the database to carry the burden of these checks, you're putting business rules into the schema. At a minimum, this makes it a lot harder to write unit tests (which is where you should probably have caught this in the first place; but now is your chance to add another test.)
Ideally, you should be able to replace the RDBMS with some other generic data store and still have all the functional logic properly active and unchanged in the appropriate other places. The UI shouldn't be talking to the DAL much less dealing with database exceptions directly.
You can add the additional database constraints if you want, but it should be strictly as a backup. As you can see, handling database structural errors gracefully (especially if the UI is involved) is a lot harder.
If it's something that must be true in order for the app to function, that's really what assert()s are for. I've barely ever used Ruby, but I imagine it must have that concept. Use them to enforce preconditions in various places throughout your code. That combined with sanitizing and validating your external (user) inputs should be enough to protect you. I think if something goes wrong after that amount of checking, your app is righteously allowed to crash (in a controlled manner, of course).
I doubt the problem you're experiencing is a bug in your database. More likely there's some edge case in your validations that you've overlooked.

Resources