The PHP HTMLPurifier library, but for Rails? - ruby-on-rails

Anyone who's done anything much with PHP and receiving rich-text input from something like TinyMCE has (probably) used something like HTMLPurifier to keep the nasties out of the HTML you're intentionally allowing the user to submit.
For example, HTMLPurifier will take a string of (potentially malformed) HTML and strip out disallowed elements and attributes, try to fix broken HTML, and in some cases convert things like <i> to <em>.
Does anything equivalent exist for Rails (3)? What's the generally accepted way to sanitize input from rich text editors in Rails so that you can output the unescaped HTML onto a web page and know that stuff like <style> and <script> tags have been taken out of it and it's not going to break your page (or steal your cookies!)?
EDIT | Has anybody used Sanitize? Any other options, with pros and cons?

You can use the sanitize method.
sanitize(html)
There is also a Sanitize gem.
Sanitize.clean(html)
I tend to prefer the Sanitize gem because it can be used in a before_save callback in your models, instead of having to call the sanitize method in each of your views.

Related

rails 5.x: add nofollow to all links in 'sanitize'

I am working on a Rails application whose HAML templates frequently make use of a routine called sanitize. I have deduced from context that this routine sanitizes user-controlled HTML. Example:
# views/feed_items/_about.html.haml
%h3 Summary:
.description
= sanitize @feed_item.description
I want to make this routine add 'rel=nofollow' to all outbound links, in addition to what it's already doing. What is the most straightforward way to do that?
N.B. I am not having any luck finding the definition of this method, or the official configuration knobs for it. The vendor directory has two different HTML sanitizer gems in it and I can't even figure out which one is being used. This is a large, complicated web application that I did not write, and I barely understand Ruby, let alone all of Rails' extensions to it. Please assume I do not know any of the things that you think are obvious.
The sanitizer will strip out rel attributes if they exist.
I ran into a similar issue and added an additional helper method, clean_links, to the ApplicationHelper module, and called it after sanitizing the content.
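# application_helper.rb
def clean_links(html)
  # Rewrite each simple <a href="...">text</a> so that it carries rel="nofollow".
  html = html.to_str.gsub(/<a href=["'](.*?)["']>(.*?)<\/a>/mi, '<a href="\1" rel="nofollow">\2</a>')
  html.html_safe
end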
This method looks for all <a> tags and adds rel="nofollow". The html_safe call is necessary, or else the HTML will be escaped and displayed as literal text (it's already been sanitized).
This solution treats all links equally, so if you only want this for links pointing outside the domain, you'll have to update the regex accordingly.
In your view: <%= clean_links sanitize(@something) %>
So, first the content is sanitized, then the rel="nofollow" attribute is added before the link is displayed.
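If you only want this for outbound links, as mentioned above, something along these lines might work. This is a rough plain-Ruby sketch, not part of the original helper: nofollow_outbound_links and its site_host parameter are hypothetical names, and it assumes already-sanitized input with simple <a href="..."> tags. Regex-based HTML rewriting is fragile, so treat it as an illustration only.

```ruby
require "uri"

# Hypothetical helper: add rel="nofollow" only to links whose host differs
# from site_host. Relative links (no host) are left untouched.
def nofollow_outbound_links(html, site_host)
  html.gsub(/<a\s+href=(["'])(.*?)\1\s*>/i) do
    whole = Regexp.last_match(0)
    quote = Regexp.last_match(1)
    href  = Regexp.last_match(2)

    host = begin
      URI.parse(href).host
    rescue URI::InvalidURIError
      nil # unparseable URLs are treated like local links in this sketch
    end

    if host && host.downcase != site_host.downcase
      %(<a href=#{quote}#{href}#{quote} rel="nofollow">)
    else
      whole
    end
  end
end

puts nofollow_outbound_links(
  '<a href="http://other.example/x">out</a> <a href="/local">in</a>',
  "mysite.example"
)
```

In a Rails helper you would still need to call html_safe on the result, as in clean_links above.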
Actually there's a built-in way:
sanitize "your input", scrubber: Loofah::Scrubbers::NoFollow.new

In Rails, which is a better defender against XSS attacks, strip_tags or sanitize?

Assuming no tags are allowed in the user input and we want to sanitize user input before storing it in the database, in Rails, we have the options of using sanitize (whitelist an empty set of tags) and strip_tags.
Which is better against XSS attacks? If something else is even better, what is that? And why is it better?
As of Rails 3, and after the fatty beatdown the Rails core dev team took for shipping Rails unsafe by default, all strings are now tagged as either safe or unsafe, with "unsafe" being the default. You only need to think about explicitly managing the "safeness" of strings when you're writing helpers that output HTML into your template.
Escaping vs Sanitizing:
In this context, escaping means replacing certain characters in the string with HTML escape sequences, which removes their special meaning and causes them to render as regular text. Sanitizing, on the other hand, means validating the HTML content to ensure that only known-good tags and attributes are used. Note that sanitizing is inherently less secure than escaping, because some markup is deliberately let through, and it should only be used where the rendered content must contain HTML markup. An example would be a WYSIWYG HTML editor on a textarea whose output is later rendered on a page.
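To make the escaping half concrete, here is a small plain-Ruby sketch using ERB::Util.html_escape from the standard library (the same kind of escaping Rails applies to unsafe strings in views). The user_input value is made up for illustration.

```ruby
require "erb"

# Made-up hostile input: a script tag plus an event-handler attribute.
user_input = %(<script>alert(1)</script><b onclick="steal()">hi</b>)

# Escaping keeps every character but neutralises the markup, so a browser
# shows it as text instead of interpreting it.
escaped = ERB::Util.html_escape(user_input)
puts escaped
```

Nothing is removed here; the markup is only rendered inert. A sanitizer would instead have to decide which tags and attributes to keep.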
sanitize HTML-encodes all tags and strips all attributes that are not specifically allowed (which is all of them in your case). It also strips href and src attributes with invalid protocols, to prevent abuse such as javascript: URLs. strip_tags, on the other hand, strips all HTML tags, including comments, which sounds like exactly what you want. As long as you're whitelisting params and adding them to your DB properly escaped, such as:
Title.where("author = ?", author_id)
and not blindly inserting user input into your DB, I would be comfortable with how you're set up.

Rails 3: Permitting Users to Use Basic HTML Tags

Throughout my site, users can leave comments. I want them to be able to insert basic HTML in their comments, including bold, italic, and link tags. Unfortunately, Rails automatically escapes all user-generated HTML.
I can bypass this behavior by calling .html_safe, but then I leave my site vulnerable to XSS. Is there a way to permit bold, italic, and link tags, while still escaping other content?
You can use something like markdown to support formatting via alternative (not html directly) means. Markdown can be supported via a number of rubygems, including Redcarpet, markitup, etc. Markdown creates an alternative syntax for bold/italics etc (like bbcode).
https://github.com/jwigal/markitup_rails
You can also use a whitelisting sanitizer like Loofah - https://github.com/flavorjones/loofah/. Loofah is a higher-end solution, supporting any HTML tags you want. The users submit HTML, then Loofah reads it and builds an HTML node tree using Nokogiri. It then traverses the tree, making sure all tag nodes use whitelisted HTML tags, so you can allow any mix of tags you want, including <a>, <img>, <table> etc. It is highly configurable.
Loofah also checks the attributes (depending on configuration), to make sure nothing is hidden in forbidden attributes like onclick="".

trouble using tinymce using ruby on rails

I am having trouble using the TinyMCE editor with Rails 3. I want to show text in bold, and tags are not working: when I wrap something in p tags it should start a new paragraph, but instead the text stays on the same line and the p tags are displayed literally on the page.
The usual suspect when it comes to rails 3 printing raw html output to the site, is that someone forgot to call html_safe on whatever text should be printed.
So if you have a @my_model_instance.description that you edit with TinyMCE, you might want to make the view look like @my_model_instance.description.html_safe, or, as suggested in the comment on the documentation, raw(@my_model_instance.description).
If the text is coming from user input, however, you might want to be a bit cautious, since it might be possible for users to input all sorts of nasty injection hacks this way.

rails 3 internationalization / localization - embeddings links in translated text

I need to embed links in my translated texts. I followed this post, but it doesn't seem to work in rails 3 anymore as the html tags don't get rendered properly.
Does anyone know how to get this done in Rails 3?
Update:
Apparently, the HTML tags can be rendered unescaped by using the html_safe method. But does anyone know if there's another way to solve this problem without using html_safe?
I would like to avoid marking my strings as HTML-safe if possible, because I've encountered a situation where I have to pass the contents of a text field into my translation, and I don't want any user-entered strings to be output unescaped.
Change {{url}} to %{url} and you should be good to go.
Update
OK, thanks, that's important information about what "doesn't work" means :) So, you need to call the html_safe method on your call to link_to, e.g.
link_to(t("log_in_href"), login_path).html_safe
This tells Rails to render the HTML as-is instead of escaping it.
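For the concern raised in the question about user-entered text, one option is to escape only the user-supplied value before interpolating it. Below is a minimal plain-Ruby sketch of that idea. Kernel#format stands in for the I18n %{...} interpolation, and TEMPLATE, render_message and the sample values are hypothetical names made up for illustration, not part of Rails.

```ruby
require "erb"

# Stand-in for a locale entry; in Rails this would live in a locale file.
TEMPLATE = "Welcome, %{name}! Please %{link}."

# Hypothetical helper: user_name is untrusted and gets escaped;
# link_html is markup we built ourselves and is inserted as-is.
def render_message(user_name, link_html)
  safe_name = ERB::Util.html_escape(user_name)
  format(TEMPLATE, name: safe_name, link: link_html)
end

puts render_message("<b>Eve</b>", '<a href="/login">log in</a>')
```

Only after the untrusted parts have been escaped like this would it be reasonable to mark the combined string as HTML-safe in a view.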
