Capybara - assert against string containing HTML entities? - ruby-on-rails

In a page on my Rails 6 app, I have table cells rendered from the database, and sometimes the text in them is rendered with converted fancy quotes or other HTML entities encoded by our own t method, which does this:
def t(sanitize = true)
Textile.textilize_without_paragraph_safe(self, false, sanitize)
end
It's used like this:
<span class="current_notes"><%= key.notes.t %></span>
So a stored string like Doesn't, pulled from the db, is converted by Textile to Doesn&#8217;t and rendered in the page as Doesn’t. Of course, testing the db value against the output of the page fails, because the quote has been converted.
expected to find visible css "#key_notes_135836989 span.current_notes" with text
"Doesn't get used very often." within #<Capybara::Node::Element> but there were no matches.
Also found "Doesn’t get used very often.", which matched the selector but not all filters.
It also fails if I test against the db value, encoded with t, because that contains the HTML entity, not the special character:
expected to find visible css "#key_notes_135836989 span.current_notes" with text
"Doesn&#8217;t get used very often." within #<Capybara::Node::Element> but there were no matches.
Also found "Doesn’t get used very often.", which matched the selector but not all filters.
If I assert against CGI.unescapeHTML(string.t), Capybara finds them equivalent.
assert_selector("#key_notes_#{marys_key.id} span.current_notes",
text: CGI.unescapeHTML(marys_key.notes.t))
My question is: since Capybara is always testing against rendered HTML, it seems like there must be an easier way to do this. I can't imagine using CGI.unescapeHTML every time I have some fancy text on the page (it's everywhere in this app).
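One way to avoid repeating the call is to wrap it in a small test helper. The following is only a sketch: displayed_text is a hypothetical name, the &#8217; entity is an assumed example of what t emits, and it presumes the encoded string contains entities but no markup (CGI.unescapeHTML decodes entities; it does not strip tags).

```ruby
require "cgi"

# Hypothetical test helper: decode HTML entities so the string matches
# the text a browser (and therefore Capybara) would display.
def displayed_text(html)
  CGI.unescapeHTML(html.to_s)
end

# Assumed example of an entity-encoded string, as Textile might produce it.
encoded = "Doesn&#8217;t get used very often."
puts displayed_text(encoded)
```

In a test this would read assert_selector("#key_notes_#{marys_key.id} span.current_notes", text: displayed_text(marys_key.notes.t)), which keeps the decoding in one place.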

Related

Properly Escaping a String inside a View

I've read in multiple places that as of Rails 3 you no longer have to use html_escape "some string" to actually escape a string in a view, and that simply writing <%= "some string" %> escapes the string by default. However, I cannot find this information in the docs. I read through the XSS section of the Rails security guide, which states this:
https://guides.rubyonrails.org/security.html#cross-site-scripting-xss
As a second step, it is good practice to escape all output of the application, especially when re-displaying user input, which hasn't been input-filtered (as in the search form example earlier on). Use escapeHTML() (or its alias h()) method to replace the HTML input characters &, ", <, and > by their uninterpreted representations in HTML (&amp;, &quot;, &lt;, and &gt;).
Then I see several blogs that state that it is escaped by default. For example: https://www.netsparker.com/blog/web-security/preventing-xss-ruby-on-rails-web-applications/
https://dzone.com/articles/preventing-cross-site-scripting-vulnerabilities-wh
Found it:
https://guides.rubyonrails.org/3_0_release_notes.html
"7.4.3 Other Changes
You no longer need to call h(string) to escape HTML output, it is on by default in all view templates. If you want the unescaped string, call raw(string)."
escapeHTML() (or its alias h()) are from CGI::escapeHTML, which is a Ruby API implementation. If you aren't using Rails you still have a way to escape HTML. Rails may do some automagical handling of HTML in ERB files for display, and that is what you are probably referring to with html_escape "some string" and <%= "some string" %>. I think you are possibly confusing html_escape which you might need when displaying urls and such that are stored in the DB and you want the ERB processor to not mess it up? I know sometimes, particularly in .js.erb files I need to escape some things to get the result I was expecting. This is different than sanitizing. It seems in your example they are referring to something that you might accept and then redisplay, like a search string. If you put <i>hello</i> into a search box you would want to sanitize the input before passing it to the back end, or if you are using some javascript to filter you might want to escape it both for security reasons and to let it re-display correctly in the search box after you've filtered.
Edit: I was not able to find the answer to your comment in the ri doc either. But I tried:
<%= "<b>hello</b>" %>
<%= h("<b>hello</b>") %>
And got the same result in the browser:
<b>hello</b>
<b>hello</b>
So if you are asking if it is true, then I would say yes.
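The browser shows the tags as literal text because the HTML sent to it is escaped. This can be reproduced outside Rails with the Ruby standard library; the snippet below is a minimal sketch that uses the stdlib ERB::Util.html_escape and CGI.escapeHTML rather than Rails' own view machinery.

```ruby
require "cgi"
require "erb"

input = "<b>hello</b>"

# Both stdlib methods replace the HTML special characters with entities.
escaped_with_erb = ERB::Util.html_escape(input)
escaped_with_cgi = CGI.escapeHTML(input)

puts escaped_with_erb
puts escaped_with_cgi
```

Both lines print the escaped source, &lt;b&gt;hello&lt;/b&gt;, which a browser renders as the visible text <b>hello</b>.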

In Rails, which is a better defender against XSS attacks, strip_tags or sanitize?

Assuming no tags are allowed in the user input and we want to sanitize user input before storing it in the database, in Rails, we have the options of using sanitize (whitelist an empty set of tags) and strip_tags.
Which is better against XSS attacks? If something else is even better, what is that? And why is it better?
As of Rails 3 and the fatty beatdown the Rails core dev team took when they made Rails unsafe by default, all strings are now tagged as either safe or unsafe with "unsafe" strings being the default. You only need to think about explicitly managing the "safeness" of strings in Rails when you're writing helpers that output HTML into your template.
Escaping vs Sanitizing:
In this context, escaping means replacing some of the string characters with an HTML escape sequence that removes the special meaning from the text and causes it to render as regular text. Sanitizing, on the other hand, means validating the HTML content to ensure only good HTML tags and attributes are used. Note that sanitizing is inherently less secure than escaping because of this and should only be used where rendered content must contain HTML markup. An example would be a WYSIWYG HTML editor on a textarea that manages code that is later rendered on a page.
sanitize encodes all tags and strips every attribute that is not specifically allowed (which, in your case, is all of them) from the HTML string passed to it. It also strips href and src attributes with invalid protocols to prevent any abuse of JavaScript attributes. strip_tags, on the other hand, strips all supplied tags, including comments, which sounds like exactly what you want. As long as you're whitelisting params and adding them to your DB through properly escaped queries such as:
Title.where("author = ?", author_id)
and not blindly inserting user input into your db, I would be comfortable with how you're set up.

Test if text does not contain any HTML

I want to test whether some content does not contain any HTML. What is a simple and clean way to do so?
page.find(".description").should_not have_content /\<.*\>/
Does not work properly, since it fails on &lt;strong&gt;Lorem but passes on <strong>Lorem. Probably due to the way Capybara helps its user with escaping HTML.
Solving with xpaths works, but leaves me wondering if there is not a much simpler solution.
page.should_not have_selector(:xpath, "//div[@class='description']/*")
Is there a built-in way to detect whether some text has been stripped of HTML in Capybara?
Capybara's has_content? (and thus have_content) method aims to inspect text rendered to the user, not text in html source of an element as you expect.
Thus, I think Capybara's behavior is correct:
If html source of .description is &lt;strong&gt;Lorem, user sees <strong>Lorem. Capybara searches this text for /\<.*\>/ and finds it.
If html source of .description is <strong>Lorem, user sees Lorem. Capybara searches this text for /\<.*\>/ and doesn't find it.
If you want to inspect source html of an element, you can use javascript's function innerHTML:
source_text = page.evaluate_script("document.querySelector('.description').innerHTML")
source_text.should_not =~ /\<.*\>/
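To make the difference concrete, the same pattern can be run in plain Ruby against the two source strings discussed above. This is only an illustration of what the innerHTML check sees; the variable names are made up for the example.

```ruby
TAG_PATTERN = /<.*>/

# Source with escaped entities: the user sees the literal text <strong>Lorem.
escaped_source = "&lt;strong&gt;Lorem"
# Source with real markup: the user sees Lorem in bold.
markup_source = "<strong>Lorem"

escaped_has_tag = escaped_source.match?(TAG_PATTERN)
markup_has_tag = markup_source.match?(TAG_PATTERN)

puts "escaped source contains a tag: #{escaped_has_tag}"
puts "markup source contains a tag: #{markup_has_tag}"
```

Checking the source reverses the outcome of the have_content check: only the string with real markup matches.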

multi line tag in grails or html

With a grails app and from a local database, I'm returning some text in a xml format.
I can return it well formed in a <textarea></textarea> tag with the correct indenting (tabulation, line return,...etc.)
I want to go a bit further. In the text I'm returning, there are some <img/> tags and I'd like to replace those tag by the real images themselves.
I searched around and have found no solution so far. I understand that you can't add an image to a textarea (other than as a background), and if I choose a div tag, I won't have the indenting anymore (and it will therefore be harder to read).
I was wondering if using a <g:textField/> or another tag from the grails library would do the trick. And if so, how can I append them to a page using jquery?
For example, how to append a <g:textField/> in jquery. It doesn't interpret it and I get this error
SyntaxError: missing ) after argument list [Break On This Error]...+doc).append("<input type="text" id="FTMAP_"+nb_sec+"" ...
And in my javascript file, I have
$("#FTM_"+doc).append("<g:textField id='FTMAP_"+nb_sec+"' ... />
Any possible solutions ?
EDIT
I did forget to mention that my final intentions are to be able to modify the text (tags included) and to have a nice and neat indentation so that it is the easiest possible for the end user.
You are asking a few different questions:
1. Can I use a single HTML tag to include images inside pre-formatted text.
No. You will have to parse the text and translate it into styled text yourself.
2. Is there a tag in the grails standard tags to accomplish this for me?
No.
3. How can I add grails tags from my javascript code.
Grails tags are processed on the server-side, and javascript is processed on the client. This means you cannot directly add grails tags via javascript.
There are a couple methods that can accomplish the same result, however:
You can set a javascript variable to the rendered content of a grails tag. This solution is good for data that is known at the time of the initial request.
var tagOutput = "${g.textField(/* etc */)}";
You can make an ajax request for the content to be added. Then your server-side grails code can render the tags you need. This is better for realtime data, or data that will be updated more than once on a single rendered page.

Safely rendering a user's template/view?

I have a model which has a template field. This template is HTML and has variables which get substituted. This template is then converted into a PDF using wicked_pdf.
How should I take the template which the user enters and safely do variable substitution? Allowing it to be an ERB template seems to be setting myself up for some huge security holes. What safe solutions are there?
Edit:
So, for example, I have my template class/model which has two fields, a name and an HTML field. This is a user editable class. There will be specific variables available to the HTML in the template class (Company Name, price, etc.). I am hoping to use a HTML templating system, but since this is user created content, it isn't trusted. Only variable substitution will be done, nothing more.
Rails provides helper functions, such as h, to escape values on display and prevent such behavior.
<%= h @user.name %>
h is an alias of html_escape
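For substitution only, one option is to avoid evaluating the template at all and replace placeholders with escaped values. The sketch below is an assumption-laden illustration, not an established API: the {{name}} placeholder syntax, ALLOWED_VARIABLES, and render_user_template are all hypothetical. It escapes the substituted values but does not sanitize the HTML of the template itself.

```ruby
require "cgi"

# Hypothetical whitelist of variables a user template may reference.
ALLOWED_VARIABLES = %w[company_name price].freeze

# Replace {{name}} placeholders with HTML-escaped values.
# Unknown placeholders are left untouched, and nothing in the
# template is ever evaluated as Ruby.
def render_user_template(template, values)
  template.gsub(/\{\{\s*(\w+)\s*\}\}/) do |placeholder|
    name = Regexp.last_match(1)
    if ALLOWED_VARIABLES.include?(name) && values.key?(name)
      CGI.escapeHTML(values[name].to_s)
    else
      placeholder
    end
  end
end

template = "<p>{{company_name}}: {{price}} <%= 1 + 1 %></p>"
puts render_user_template(template, "company_name" => "A & B", "price" => 10)
```

Because the template is treated as plain text, ERB tags typed by the user pass through as inert characters instead of being executed.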
