I have a web scraper built to parse html from a website and I'm trying to write tests for it.
The class I'm trying to test receives a Nokogiri HTML object and extracts the required data from it. Now as usual the html can vary, sometimes elements will be missing or whatnot. I need to test these different situations.
So what I'd like to do is make a bunch of html files, each one representing a case with a particular element missing etc. For each html file, I wish to also construct an associated hash of the data I would expect the scraper to extract, assuming it is working correctly.
So I would like to write a test which will iterate over these html files and compare the data extracted by the class being tested against the expected data and report whether or not it is correct.
Any suggestions as to how to do this?
Have a look at the Artifice, fakeweb or webmock gems, which override net/http in order to supply testable results.
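For the fixture-driven part of the question, one possible shape is sketched below. It assumes each HTML fixture sits next to a JSON file of the same base name holding the expected data; the helper name mismatched_fixtures and the directory layout are made up for illustration, and the block you pass in is where the real scraper would be called.

```ruby
require "json"

# Hypothetical helper (not part of any gem): pairs every *.html fixture in
# +dir+ with a sibling *.json file holding the expected data, yields the raw
# HTML to the block, and returns the names of the fixtures whose extracted
# data differs from the expectation.
def mismatched_fixtures(dir)
  html_paths = Dir.glob(File.join(dir, "*.html")).sort

  failures = html_paths.reject do |html_path|
    expected = JSON.parse(File.read(html_path.sub(/\.html\z/, ".json")))
    actual   = yield(File.read(html_path))
    actual == expected
  end

  failures.map { |path| File.basename(path) }
end
```

In a spec the block would wrap the class under test, for example by building the Nokogiri document from the HTML string and calling the scraper on it, and the assertion would be that the returned list is empty. Note that JSON.parse produces string keys, so the scraper's hash would need string keys (or be converted) for the comparison to hold.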
I'm not clear on when (or if) I should use multiple Grails encodeAsXXX calls.
This reference says you need to encodeAsURL and then encodeAsJavaScript: http://grailsrocks.com/blog/2013/4/19/can-i-pwn-your-grails-application
It also says you need to encodeAsURL and then encodeAsHTML. I don't understand why this is necessary in the case shown but not all the time.
Are there other cases where I should be using multiple chained encoders?
If I'm rendering a URL to an HTML attribute, should I encodeAsURL then encodeAsHTML?
If I'm rendering a URL to a JavaScript variable sent as part of an HTML document (via a SCRIPT element), should I encodeAsURL, encodeAsJavaScript then encodeAsHTML?
If I'm rendering a string to a JavaScript variable sent as part of an HTML document, should I encodeAsJavaScript then encodeAsHTML?
The official docs - https://docs.grails.org/latest/guide/security.html - don't show any examples of multiple chained encoders.
I can't see how I can understand what to do here except by finding the source for all the encoders and looking at what they encode and what's valid on the receiving end - but I figure it shouldn't be that hard for a developer and there is probably something simple I'm missing or some instructions I haven't found.
FWIW, I think the encoders I'm talking about are these ones:
https://docs.spring.io/spring-framework/docs/current/javadoc-api/org/springframework/web/util/JavaScriptUtils.html#javaScriptEscape-java.lang.String-
https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html#encode(java.lang.String,%20java.lang.String)
https://docs.spring.io/spring/docs/current/javadoc-api/org/springframework/web/util/HtmlUtils.html#htmlEscape-java.lang.String-
It is certainly important to always consider XSS but in reading your question I think you are overestimating what you need to do. As long as you're using Grails 2.3 or higher and your grails.views.default.codec is set to html which it will be by default, everything rendered in your GSP with ${} will be escaped properly for you.
It is only when you are intentionally bypassing the escaping, such as if you need to get sanitized user input back into valid JavaScript within your GSP for some reason, that you would need to use the encodeAsXXX methods or similar.
I would argue (and the article makes a mention of this as well) that this should raise a smell anyway, as you probably should have that JavaScript encapsulated in a different file or TagLib where the escaping is handled.
Bottom line, use the encoding methods only if you are overriding the default HTML encoding, otherwise ${} handles it for you.
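To make the "URL in an HTML attribute" case from the question concrete, here is a small sketch of why the two encodings are separate layers. It is written in Ruby with the standard library rather than in Groovy, so it only illustrates the idea; the URL and parameter names are made up, and in Grails the corresponding calls would be encodeAsURL and encodeAsHTML.

```ruby
require "erb"
require "uri"

# Untrusted value that should become the value of the "q" query parameter.
user_input = "a b&c=<d>"

# Layer 1: URL-encode the value so it cannot break out of its parameter.
url = "/search?q=#{URI.encode_www_form_component(user_input)}&page=2"

# Layer 2: HTML-escape the finished URL for the attribute it is placed in.
link = %(<a href="#{ERB::Util.html_escape(url)}">results</a>)

puts link
# prints <a href="/search?q=a+b%26c%3D%3Cd%3E&amp;page=2">results</a>
```

The first layer applies to the parameter value only, the second to the whole attribute value. With the default html codec described above, the second layer is the part that ${} already does for you.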
I am knocking together a quick debugging view of a backend, as a small set of admin HTML pages (driven by angulardart, but not sure that is critical).
I get back from my XHR call a complex JSON object. I want to see that on the HTML page formatted nicely. It doesn't have to be a great implementation, as it's just a debug UI, but the goal is to format the object instead of having it be one long string with no newlines.
I looked at trying to pretty print JSON in dart then putting that inside <pre></pre> tags, as well as just dumping the dart Map object to a string (again, inside or not inside <pre></pre> tags), but I'm not getting to where I want.
Even searched pub for something similar, such as a syntax highlighter that would output html, but didn't find something obvious.
Any recommendations?
I think what you're looking for is:
1. Format your JSON so it's readable
2. Have syntax highlighting
For 1: this can be done with JsonEncoder and an indent, e.g. JsonEncoder.withIndent('  ').convert(object) from dart:convert.
For 2: you can use the JS library highlight.js pretty easily by appending your formatted JSON into a marked-up div (see the highlight.js docs to see what I mean).
I'm trying to test specific content inside a pdf document which is being generated via the prawn gem and attached to an ActionMailer email. I'm using RSpec and Capybara for testing.
I've managed to test the filename like this
expect(ActionMailer::Base.deliveries[0].filename).to eq("my_file.pdf")
I thought I had read somewhere that I have to test the pdf itself like this, but it doesn't work.
expect(ActionMailer::Base.deliveries[0].body.encoded).to have_content(user.first_name)
When running the test, I get the following error:
Failure/Error: expect(ActionMailer::Base.deliveries[0].body.encoded).to have_content(user.first_name)
expected to find text "John" in "JVBERi0xLjQKJf////8KMSAwIG9iago8PCAvQ3JlYXRvciA8ZmVmZjAwNTAw MDcyMDA2MTAwNzcwMDZlPgovUHJvZHVjZXIgPGZlZmYwMDUwMDA3MjAwNjEw MDc3MDA2ZT4KPj4KZW5kb2JqCjIgMCBvYmoKPDwgL1R5cGUgL0NhdGFsb2cK L1BhZ2VzIDMgMCBSCj4+CmVuZG9iagozIDAgb2JqCjw8IC9UeXBlIC9QYWdl cwovQ291bnQgMQovS2lkcyBbNSAwIFJdCj4+CmVuZG9iago0IDAgb2JqCjw8 IC9MZW5ndGggMzE2NAo+PgpzdHJlYW0KcQpxCi9UcjEgZ3MKMjA2LjAwMCA2 NTYuMDAwIDIwMC4wMDAgMTAwLjAwMCByZQpTClEKCnEKNDAwLjAwMCAwIDAg NTAuMDAwIDEwNi4wMDAgNzA2LjAwMCBjbQovSTEgRG8KUQpxCi9UcjEgZ3MK MjA2LjAwMCA2NTYuMDAwIDIwMC4wMDAgMTAwLjAwMCByZQpTClEKCkJUCjIx OC40ODI1NTg1OTM3NSA2NDIuODk2IFRkCi9GMS4wIDE4IFRmCls8NTQ+IDEx MC44Mzk4NDM3NSA8NjU2OTZjNmU2MTY4NmQ2NTYyNjU3Mzc0OGE3NDY5Njc3 NTZlNjc+XSBUSgpFVAoKCkJUCjM2IDYyMy4zMjk5OTk5OTk5OTk5IFRkCkVU CgoKQlQKMzYgNjAzLjc2Mzk5OTk5OTk5OTkgVGQKRVQKCgpCVAoyNDguODUy
This text continues for a long time.
Maybe it's just me, but this doesn't exactly look "testable" to me. Does someone know how to do this? Thanks!
Firstly, your PDF-generation code should really be separate from your mailer code, and should have its own tests separate from your mailer tests. Please do this first.
Once you've separated your PDF-generation code and tests you can generate your PDF and test its content using the pdf-inspector gem, which is helpfully maintained by the same folks who make Prawn. Then in your mailer tests you can simply check whether the file is attached, using something like this.
P.S. The reason the email content looks garbled (JVBERi0xLjQ...) is that email attachments are (usually) Base64-encoded. But even if you decoded it, you might not be able to search the PDF content for a plaintext string without a library like pdf-inspector because it might be compressed (I don't know if Prawn does this by default). But really, your PDF code and tests and your email code and tests should be completely separate.
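As a quick check of the Base64 point, the sketch below decodes only the first twelve characters of the failure output quoted in the question, using plain Ruby. The variable names are just for illustration.

```ruby
# First characters of the failure output quoted in the question.
encoded_prefix = "JVBERi0xLjQK"

# "m" is the Base64 directive understood by String#unpack1.
decoded_prefix = encoded_prefix.unpack1("m")

puts decoded_prefix.inspect
# prints "%PDF-1.4\n"
```

So the encoded body really is the PDF file, starting with its header. In a mailer spec the decoded bytes of the attachment would normally come from the mail object itself rather than from manual decoding, and the text assertions would then be made through a PDF-aware library such as pdf-inspector, as suggested above.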
I'm rendering content using Backbone in Rails. Some of the JSON properties I'm getting from the models will be HTML attributes, some of them might be used inside the JavaScript and others will be inserted between HTML elements. All of these require different escaping mechanisms. How do people deal with this?
In our project we are using doT templates, which (like most others) allow for interpolation with encoding ({{! ... }}). You could also try to encode all data and strip any possible JavaScript server side when data is saved, to be 100% sure you won't get anything malicious.
Additionally, if you are using jQuery methods, remember to use the text method to insert data rather than html, as text will automatically encode it.
And I really recommend the doT! It's super fast and we've managed to make it play really nicely with requirejs
I do have a question concerning the development of a gem for Rails 3.
I would like to insert specific HTML snippets/partials/fragments (e.g. a form or an image on a fixed position) into every HTTP get response. I am wondering what technique would be the most appropriate for this use case. I see two solutions, but I am not sure which one will be the better approach.
Rack middleware loaded by a Rails engine: I could write a rack middleware that parses the response HTML document and inserts my HTML snippets/partials before the closing body tag. This approach seems to be a little bit dirty, since the proper formatting of the response document is not guaranteed.
Inserting the snippet at the controller level: Maybe as an after_filter?! The problem here is that I somehow have to guarantee that my after_filter will be the last one in the filter chain.
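To make the first option more concrete, a rough sketch of such a middleware is shown below. The class name SnippetInjector and the fixed snippet string are made up for illustration, and it only uses the plain Rack contract (call(env) returning status, headers and body), so it says nothing about how Rails view helpers would be made available to the snippet.

```ruby
# Hypothetical Rack middleware: inserts +snippet+ just before the last
# </body> tag of HTML responses to GET requests.
class SnippetInjector
  def initialize(app, snippet)
    @app = app
    @snippet = snippet
  end

  def call(env)
    status, headers, body = @app.call(env)
    return [status, headers, body] unless injectable?(env, headers)

    # Buffer the whole response; Rack bodies only promise #each.
    html = +""
    body.each { |chunk| html << chunk }
    body.close if body.respond_to?(:close)

    # Leave the document untouched if it has no closing body tag.
    index = html.rindex(%r{</body>}i)
    html.insert(index, @snippet) if index

    # Keep Content-Length (if the app set one) in sync with the new body.
    length_key = find_key(headers, "content-length")
    headers = headers.merge(length_key => html.bytesize.to_s) if length_key

    [status, headers, [html]]
  end

  private

  def injectable?(env, headers)
    type_key = find_key(headers, "content-type")
    env["REQUEST_METHOD"] == "GET" && !type_key.nil? &&
      headers[type_key].to_s.include?("text/html")
  end

  # Header names are matched case-insensitively because their casing
  # differs between Rack versions.
  def find_key(headers, name)
    headers.keys.find { |key| key.casecmp?(name) }
  end
end
```

The sketch skips non-HTML responses and documents without a closing body tag, which addresses part of the "formatting is not guaranteed" worry, at the cost of buffering every HTML response in memory. The gem's engine would still have to add the middleware to the application's middleware stack.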
I am curious whether there are further approaches and which one you would pick. It would be great to have access to the standard Rails view helpers from within the partial/snippet I am planning to insert into the response. Once the gem is loaded, the snippet should be included in every response automatically, without requiring the user to insert a partial in the views.
Thanks in advance
Peter