How best to sanitize rich html with rails? - ruby-on-rails

I'm looking for advice on how to clean submitted HTML in a web app so it can be redisplayed later without styles or unclosed tags wrecking the layout of the app.
In my app, rich HTML is submitted by users through the YUI Rich Text Editor, which by default runs a few regexps to clean the input, and I'm also calling filter_MSWord to catch any crap pasted in from Office.
On the back end, I'm running ruby-tidy to sanitize the HTML before it is displayed as comments, but on occasion badly pasted HTML still affects the layout of the app. How can I safeguard against this?
FWIW, here are the sanitizer settings I'm using:
module HTMLSanitizer
  def tidy_html(input)
    Tidy.open(:show_warnings => false) do |tidy|
      # don't output body and html tags
      tidy.options.show_body_only = true
      # output HTML (the corresponding Tidy option for XHTML is output_xhtml)
      tidy.options.output_html = true
      # don't write newlines all over the place
      tidy.options.wrap = 0
      # use utf8 to play nice with rails
      tidy.options.char_encoding = 'utf8'
      # the cleaned markup is the value of the block
      tidy.clean(input)
    end
  end
end
What else are my options here?

I personally use the sanitize gem.
require 'sanitize'
op = Sanitize.clean("<html><body>wow!</body></hhhh>") # Notice the incorrect HTML. It still outputs "wow!"

I use the sanitize helper available from ActionView
Module ActionView::Helpers::SanitizeHelper

Related

rails 5.x: add nofollow to all links in 'sanitize'

I am working on a Rails application whose HAML templates frequently make use of a routine called sanitize. I have deduced from context that this routine sanitizes user-controlled HTML. Example:
# views/feed_items/_about.html.haml
%h3 Summary:
.description
  = sanitize @feed_item.description
I want to make this routine add 'rel=nofollow' to all outbound links, in addition to what it's already doing. What is the most straightforward way to do that?
N.B. I am not having any luck finding the definition of this method, or the official configuration knobs for it. The vendor directory has two different HTML sanitizer gems in it and I can't even figure out which one is being used. This is a large, complicated web application that I did not write, and I barely understand Ruby, let alone all of Rails' extensions to it. Please assume I do not know any of the things that you think are obvious.
The sanitizer will strip out rel attributes if they exist.
I ran into a similar issue and added an additional helper method - clean_links to the ApplicationHelper module, and called it after sanitizing the content.
# application_helper.rb
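def clean_links(html)
  # NOTE: the original pattern was lost when this page was extracted.
  # This is a reconstruction that adds rel="nofollow" to every opening <a> tag.
  html = html.to_str.gsub(/<a\s([^>]*)>/i, '<a \1 rel="nofollow">')
  html.html_safe
end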
This method looks for all <a> tags, and adds rel="nofollow". The html_safe method is necessary or else the HTML will be displayed as a string (it's already been sanitized).
This solution treats all links equally, so if you only want this for links pointing outside the domain, you'll have to update the REGEX accordingly.
In your view: <%= clean_links sanitize(@something) %>
So, first the content is sanitized, then rel="nofollow" is added to each link before it is displayed.
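If only outbound links should get the attribute, the regex approach can be narrowed by checking each link's host. The following is a minimal sketch in plain Ruby and not part of the original answer: the helper name nofollow_external_links and the own_host parameter are made up for illustration, and it assumes the input has already been sanitized and that href values are quoted.

```ruby
require 'uri'

# Hypothetical helper: adds rel="nofollow" only to <a> tags whose href
# points at a host other than own_host. Relative links are left alone.
def nofollow_external_links(html, own_host)
  html.gsub(/<a\s+([^>]*?)href=(["'])(.*?)\2([^>]*)>/i) do
    tag  = Regexp.last_match(0) # the whole opening <a ...> tag
    href = Regexp.last_match(3) # the quoted href value

    host = begin
      URI.parse(href).host
    rescue URI::InvalidURIError
      nil # unparseable URLs are treated like relative links
    end

    external = host && host.downcase != own_host.downcase
    if external && tag !~ /\brel=/i
      tag.sub(/>\z/, ' rel="nofollow">')
    else
      tag
    end
  end
end
```

In a view this could be called as something like nofollow_external_links(sanitize(@something).to_str, request.host).html_safe. A regex remains a blunt tool for HTML, so the sketch is only reasonable on markup that has already been through a sanitizer.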
Actually there's a built-in way:
sanitize "your input", scrubber: Loofah::Scrubbers::NoFollow.new

Pass multiline html string(markdown text) from rails to javascript

I have Markdown text saved in the database and I want to show it as HTML to the user. I am using markdown.js as the processor, and I pass the big multiline string from Rails to JavaScript by rendering a js.erb file from the controller.
But since it is multiline, the JavaScript becomes invalid. Is there any Rails function that will take the whole string and assign it as a single-line string to a JavaScript variable? I cannot use html_safe either, as some things might be escaped. What is the best way to handle Markdown?
sample markdown
![enter image description here](https://encrypted-tbn2.gstatic.com/images?q=tbn:ANd9GcRiOb7-0qeyx73XuXNqzLpxgXTlf5UMrMnF5zm-UKn3wLaXCW0UUw "enter image title here")
Hello
If you render ERB server-side anyway, you will probably be better off rendering the Markdown server-side as well. You can use Redcarpet for that.
Add gem redcarpet to your Gemfile.
Run bundle install
Use it:
text = "my _markdown_ *variable*"
markdown = Redcarpet::Markdown.new(Redcarpet::Render::HTML)
markdown.render(text)
It is a good idea to save the rendered HTML in the database, so you don't spend CPU time re-rendering the same text every time you show it to a client. You can add something like this to your model:
class Article < ActiveRecord::Base
  # let's say the model has a 'source' attribute with Markdown
  # and we want to put the resulting HTML into the 'html' attribute
  before_save :markdown

  def markdown
    self.html = Redcarpet::Markdown.new(Redcarpet::Render::HTML).render(source)
  end
end
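If the Markdown still has to be handed to markdown.js on the client, the multiline problem from the question can be sidestepped by serializing the string as JSON, since a JSON string is also a single-line JavaScript string literal in current browsers. This is a small sketch using only Ruby's standard json library; the sample text and the variable names are made up for illustration.

```ruby
require 'json'

# Hypothetical Markdown source, as it might come out of the database.
markdown_source = "# Title\n\nHello \"world\"\n![image](https://example.com/a.png)"

# String#to_json emits a double-quoted literal in which newlines and
# quotes are escaped, so the result always fits on one line.
js_literal = markdown_source.to_json

# The line a js.erb template would need to emit.
puts "var source = #{js_literal};"
```

In a js.erb template the same idea would look roughly like var source = <%= raw @article.source.to_json %>; where @article is a hypothetical instance variable. The string arrives in JavaScript unchanged, so markdown.js sees the original Markdown.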

Rails 3.1 CKEditor gem raw text

I'm using the ckeditor gem, and when I use the "paste as plain text" dialog it wraps the text in quotes as well as <p> and <br> tags. Is there any way I can tell CKEditor not to use any markup whatsoever when using that option?
What I am trying to accomplish is to have snippets of code within an article. Those snippets are processed using the markdown gem as well as pygments.rb. The following is what I'm trying to end up with
```ruby
puts "hello world"
class Hello
end
```
and this is what I'm getting
<p>
```ruby<br />
puts "hello world"
class hello<br />
end<br /></p>
This might be what you're looking for:
http://ckeditor.com/addon/codemirror
I really don't know whether CKEditor has that option.
If your problem is that content from the WYSIWYG editor shows up in your Rails view with the HTML tags visible, you may have to mark it as HTML-safe in the view. Rails escapes output by default for security reasons, so if you don't do that you will see the tags as text.
There are several ways to mark a string as HTML-safe.
Here is a discussion about which one to use:
raw vs. html_safe vs. h to unescape html
Hope that solves your problem.
CodeMirror might be your best bet for this. It's like CKEditor, but for code.
http://codemirror.net/
You could even write your own mode for it, which, if I understand what you're trying to do, might end up being required.
Have not found anything better than going with Markdown. Just like it is done here on SO

Disable XSS and HTML Sanitization in Rails 3

I'm having an issue where, when the contents of my rich text editor are saved to the database using ActiveRecord, the HTML tags are stripped from the content (I think it fires html_safe on it). I tried overriding the html_safe method on the content string, but nothing works.
content = "<p>hello</p>"
@article.content = content
puts @article.content # "<p>hello</p>"
@article.save
puts @article.content # "<>hello</>"
How can you override the html stripping capabilities in activerecord for a particular column?
As frank blizzard already said in his answer, you make yourself vulnerable to XSS attacks.
But if you trust your authors and know that this column is safe to display, you can do something like this in your Article model
class Article < ActiveRecord::Base
  def content
    # read the raw column value and mark it as safe for output
    read_attribute(:content).html_safe
  end
end
You can use the raw(string) method, but it would make you vulnerable to XSS attacks.
Another option would be taking a deeper look into markdown.
Turns out this problem had nothing to do with Rails or the XSS stripping. My code was modifying a string and then saving the result elsewhere, which caused the original input to be changed as well. I solved the problem by using string.dup to copy the original string so that it wasn't affected.
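The effect described above is easy to reproduce in plain Ruby, because assigning a string to another variable does not copy it. This is only an illustration: strip_tag_names! is a made-up stand-in for whatever code was modifying the string in place.

```ruby
# Hypothetical stand-in for code that modifies a string in place.
def strip_tag_names!(text)
  text.gsub!(/<(\/?)\w+>/, '<\1>')
  text
end

# Without dup: both variables point at the same String object,
# so the in-place change shows up in the original as well.
content = String.new("<p>hello</p>")
stored  = content
strip_tag_names!(stored)

# With dup: the helper works on a copy and the original is untouched.
original = String.new("<p>hello</p>")
copy     = strip_tag_names!(original.dup)
```

After this runs, content has lost its tag names even though only stored was passed to the helper, while original still holds the markup it started with.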
There should be an option for this.
I encourage you to take a look at the docs of the rich text editor that you are using.

The PHP HTMLPurifier library, but for Rails?

Anyone who's done anything much with PHP and receiving rich-text input from something like TinyMCE has (probably) used something like HTMLPurifier to keep the nasties out of the HTML you're intentionally allowing the user to submit.
For example, HTMLPurifier will take a string of (potentially malformed) HTML and strip out disallowed elements and attributes, try to fix broken HTML, and in some cases convert things like <i> to <em>.
Does anything equivalent exist for Rails (3)? What's the generally accepted way to sanitize input from rich text editors in Rails so that you can output the unescaped HTML onto a web page and know that stuff like <style> and <script> tags have been taken out of it and it's not going to break your page (or steal your cookies!)?
EDIT | Has anybody used Sanitize? Any other options, with pros and cons?
You can use the sanitize method.
sanitize(html)
There is also a Sanitize gem.
Sanitize.clean(html)
I tend to prefer the Sanitize gem because it can be used as a before_save filter in your models instead of having to use the sanitize method in each of your views.
