rails 5.x: add nofollow to all links in 'sanitize' - ruby-on-rails

I am working on a Rails application whose HAML templates frequently make use of a routine called sanitize. I have deduced from context that this routine sanitizes user-controlled HTML. Example:
# views/feed_items/_about.html.haml
%h3 Summary:
.description
= sanitize #feed_item.description
I want to make this routine add 'rel=nofollow' to all outbound links, in addition to what it's already doing. What is the most straightforward way to do that?
N.B. I am not having any luck finding the definition of this method, or the official configuration knobs for it. The vendor directory has two different HTML sanitizer gems in it and I can't even figure out which one is being used. This is a large, complicated web application that I did not write, and I barely understand Ruby, let alone all of Rails' extensions to it. Please assume I do not know any of the things that you think are obvious.

The sanitizer will strip out the rel tags if they exist.
I ran into a similar issue and added an additional helper method - clean_links to the ApplicationHelper module, and called it after sanitizing the content.
# application_helper.rb
def clean_links html
html.gsub!(/\\2')
html.html_safe
end
This method looks for all <a> tags, and adds rel="nofollow". The html_safe method is necessary or else the HTML will be displayed as a string (it's already been sanitized).
This solution treats all links equally, so if you only want this for links pointing outside the domain, you'll have to update the REGEX accordingly.
In your view: <%= clean_links sanitize(#something) %>
So, first the content is sanitized, then you add the rel="nofollow" tag before displaying the link.

Actually there's a built-in way:
sanitize "your input", scrubber: Loofah::Scrubbers::NoFollow.new

Related

Are there any better alternatives to Sanitize for a Ruby app?

I love Sanitize. It's an amazing utility. The only issue I have w/ it is the fact that it takes forever to prepare a development environment w/ it because it uses Nokogiri, which is a pain for compile time. Are there any programs that do what Sanitize does (if nothing else than mildly what it does) w/out using Nokogiri? This would help exponentially!
Rails has its own SanitizeHelper.
According to http://api.rubyonrails.org/classes/ActionView/Helpers/SanitizeHelper.html, it will
This sanitize helper will html encode all tags and strip all attributes that aren’t specifically allowed.
It also strips href/src tags with invalid protocols, like javascript: especially. It does its best to counter any tricks that hackers may use, like throwing in unicode/ascii/hex values to get past the javascript: filters. Check out the extensive test suite.
You can use it in a view like so
<%= sanitize #article.body %>
You can visit the link to see more customizing options like:
Custom Use (only the mentioned tags and attributes are allowed, nothing else)
<%= sanitize #article.body, tags: %w(table tr td), attributes: %w(id class style) %>

The PHP HTMLPurifier library, but for Rails?

Anyone who's done anything much with PHP and receiving rich-text input from something like TinyMCE has (probably) used something like HTMLPurifier to keep the nasties out of the HTML you're intentionally allowing the user to submit.
For example, HTMLPurifier will take a string of (potentially malformed) HTML and strip out disallowed elements and attributes, try to fix broken HTML, and in some cases convert things like <i> to <em>.
Does anything equivalent exist for Rails (3)? What's the generally accepted way to sanitize input from rich text editors in Rails so that you can output the unescaped HTML onto a web page and know that stuff like <style> and <script> tags have been taken out of it and it's not going to break your page (or steal your cookies!)?
EDIT | Anybody used Sanitize? Any other options with pro's & con's?
You can use the sanitize method.
sanitize(html)
There is also a Sanitize gem.
Sanitize.clean(html)
I tend to prefer the Sanitize gem because it can be used as a before_save filter in your models instead of having to use the sanitize method in each of your views.

Markdown to text/plain and text/html for multipart email

I’m looking for a solution to send DRY multipart emails in Rails. With DRY I mean that the content for the mail is only defined once.
I’ve thought about some possible solutions but haven’t found any existing implementations.
The solutions I’ve thought about are:
load the text from I18n and apply Markdown for the html mail and apply Markdown with a special output type for the text mail where
links are put in parenthesis after the link text
bold, italic and other formatting that doesn't make sense are removed
ordered and unordered lists are maintained
generate only the html mail and convert that to text according to the above conditions
Is there any available solution out there? Which one is probably the better way to do it?
In Chapter 4 of Crafting Rails Applications, Jóse Valim walks you through how to make a "merb" handler that uses markdown with interspersed erb and can compile to text and html. Then you make a mailer generator that generates a single merb template for each of your mail actions.
You can read an excerpt from that chapter on the page I linked you to. I highly recommend buying the book.
If you're interested in using my sorry version of what he describes in that book, you can slap this in your Gemfile:
gem 'handlers', :git => "git://github.com/chadoh/handlers.git"
Be warned that I barely know what I'm doing, that I'm not versioning that gem, and that I probably won't really even maintain it. Frankly, I wish I could find someone else who was doing a better job, but I've been unsuccessful in doing so. If you want to fork my project and be the person doing that better job, go for it!
This is a PITA, but is the only way to DRY mail such that you can support both HTML (multipart) & plaintext:
Put the html email copy in a partial file in your ActionMailer view directory with the following extension: _action.html.erb
Replace "action" with whatever action name you are using.
Then create 2 more files in the same directory:
action.text.html.erb and
action.text.plain.erb
In the text.html partial:
<%= render "action.html", :locals => {:html => true} %>
In the text.plain partial:
<% content = render "action.html", :locals => {:html => false} %>
<%= strip_tags(content) %>
That works for me, though it certainly makes me want to pay the monthly service for madmimi
Use the maildown gem.
This gems does the heavy lifting of allowing you to use email.md.erb instead of email.html.erb and email.text.erb. Write it once in a sane format and have it automatically display in HTML and in Plain Text. Win.
There are some intricacies here that you'll want to look at based on your use-case, but here's some of what we did to get it working well:
Create a maildown.rb initializer to setup some sane defaults:
Maildown.allow_indentation = true # Prevents code blocks from forming when using indentiation in markdown emails.
Maildown::MarkdownEngine.set_text do |text|
text.gsub( /{:.*}\n?/, "" ) # Removes Kramdown annotations that apply classes, etc. with `{: .class }`.
This allows you to use indents in your blocks, etc. But also precludes the ability to add indents in your Plain Text. It also removes Kramdown-specific annotation from Plain Text.
Then just replace your HTML and Plain Text files with a single .md.erb file and test it out to make sure it looks good in both versions.
Note, until you remove the .html.erb and .text.erb files, it will show those first before looking for a .md.erb file. This may actually be a nice feature if you ever needed to write separate formats for a specific email (maybe a marketing one that requires more complex formatting than Markdown can provide) without having to specify anything anywhere.
Works a treat.

Escaping HTML in Rails

What is the recommended way to escape HTML to prevent XSS vulnerabilities in Rails apps?
Should you allow the user to put any text into the database but escape it when displaying it? Should you add before_save filters to escape the input?
There are three basic approaches to this problem.
use h() in your views. The downside here is that if you forget, you get pwnd.
Use a plugin that escapes content when it is saved. My plugin xss_terminate does this. Then you don't have to use h() in your views (mostly). There are others that work on the controller level. The downsides here are (a) if there's a bug in the escaping code, you could get XSS in your database; and (b) There are corner cases where you'll still want to use h().
Use a plugin that escapes content when it is displayed. CrossSiteSniper is probably the best known of these. This aliases your attributes so that when you call foo.name it escapes the content. There's a way around it if you need the content unescaped. I like this plugin but I'm not wild about letting XSS into my database in the first place...
Then there are some hybrid approaches.
There's no reason why you can't use xss_terminate and CrossSiteSniper at the same time.
There's also a ERb implementation called Erubis that can be configured so that any call like <%= foo.name %> is escaped -- the equivalent of <%= h(foo.name) %>. Unfortunately, Erubis always seems to lag behind Rails and so using it can slow you down.
If you want to read more, I wrote a blog post (which Xavor kindly linked to) about using xss_terminate.
The h is an alias for html_escape, which is a utility method for escaping all HTML tag characters:
html_escape('<script src=http://ha.ckers.org/xss.js></script>')
# => <script src=http://ha.ckers.org/xss.js></script>
If you need more control, go with the sanitize method, which can be used as a white-list of tags and attributes to allow:
sanitize(#article.body, :tags => %w(table tr td), :attributes => %w(id class style))
I would allow the user to input anything, store it as-is in the database, and escape when displaying it. That way you don't lose any information entered. You can always tweak the escaping logic later...
Use the h method in your view template. Say you have a post object with a comment property:
<div class="comment">
<%= h post.comment %>
</div>
Or with this plugin - no need for h 8)
http://railspikes.com/2008/1/28/auto-escaping-html-with-rails
I've just released a plugin called ActsAsSanitiled using the Sanitize gem which can guarantee well-formedness as well being very configurable to what kind of HTML is allowed, all without munging user input or requiring anything to be remembered at the template level.

Why shouldn't Helpers have html in them?

I have heard that it's best not to actually have any html in your helpers; my question is, Why not? And furthermore, if you were trying to generate an html list or something like that, how can I avoid actual tags?
Thanks!
-fREW
My advice - if it's small pieces of HTML (a couple of tags) don't worry about it. More than that - think about partials (as pulling strings of html together in a helper is a pain that's what the views are good at).
I regularly include HTML in my helpers (either directly or through calls to Rails methods like link_to). My world has not come crashing down around me. In fact I'd to so far as to say my code is very clean, maintainable and understandable because of it.
Only last night I wrote a link_to_user helper to spits out html with normal link to the user along with the user's icon next to it. I could have done it in a partial, but I think link_to_user is a much cleaner way to handle it.
I don't see that there's anything wrong with it. The majority of the rails helpers generate HTML code (which is their purpose) - to me this implies that's what you're supposed to do yourself.
There is however the ever-present issue of code readability. If you have a helper which just builds a big string of raw HTML, then it's going to be hard to understand. While it's fine to generate HTML in helpers, you should do it using things like content_tag, and render :partial rather than just return %Q(<a href="#{something}">#{text}>)
This isn't a full answer to your question, but you can create html in your tags via the content_tag method. My guess as to why would be cleanliness of code.
Also, content_tag allows you to nest tags in blocks. Check out this blog post on content_tag.
On Rails 3 you can use *html_safe* String method to make your helper methods return html tags that won't be escaped.
As mentioned before, helpers are generally thought to be used as business logic, for doing something that drives view code, but is not view code itself. The most conventional place to put things that generate snippets of view code is a partial. Partials can call a helper if needed, but for the sake of keeping things separated, it's best to keep business in the helper and view in the partial.
Also, bear in mind this is all convention, not hard and fast rules. If there's a good reason to break the convention, do what works best.
I put html into partials usually.
Think about semantics. If you put html in a string, you lose the semantic aspect of it: it becomes a string instead of markup. Very different. For example, you cannot validate a string, but you can validate markup.
The reason I wanna put html in a helper instead of partial (and how I found this thread) is terseness. I would like to be able to write =hr instead of =render 'hr'.
To answer the question I didn't ask ;-) : to un-escape HTML in a helper, try this
def hr
raw '<hr />'
end

Resources