How do I use sanitize to add links in plaintext? - ruby-on-rails

So I want to be able to add links in the body of a post (and not show it as plaintext). However, I do not want to allow any other HTML tags. Right now I have:
sanitize #post.body, tags: %w(a), attributes: %w(href)
but this does not seem to work.
I've also tried
simple_format(#post.body).gsub(URI.regexp, '\0').html_safe
but that allows other HTML tags, which I do not want.
Any ideas how to fix this? Thanks!

Ruby/Rails will not just identify a link in a string because it has http or www somwhere in it. Assuming you are getting the body of #post via a form, you need to wrap the input in some kind of WYSIWYG editor such as tinymce. Then, if the WYSIWYG editor saves the serialized input:
click to see this link about google.ca
to the database, you can whitelist the <a> tag and href attribute so it actually generates a link

Ended up using the auto_link gem: auto_link(#comment.body)

Related

rails 5.x: add nofollow to all links in 'sanitize'

I am working on a Rails application whose HAML templates frequently make use of a routine called sanitize. I have deduced from context that this routine sanitizes user-controlled HTML. Example:
# views/feed_items/_about.html.haml
%h3 Summary:
.description
= sanitize #feed_item.description
I want to make this routine add 'rel=nofollow' to all outbound links, in addition to what it's already doing. What is the most straightforward way to do that?
N.B. I am not having any luck finding the definition of this method, or the official configuration knobs for it. The vendor directory has two different HTML sanitizer gems in it and I can't even figure out which one is being used. This is a large, complicated web application that I did not write, and I barely understand Ruby, let alone all of Rails' extensions to it. Please assume I do not know any of the things that you think are obvious.
The sanitizer will strip out the rel tags if they exist.
I ran into a similar issue and added an additional helper method - clean_links to the ApplicationHelper module, and called it after sanitizing the content.
# application_helper.rb
def clean_links html
html.gsub!(/\\2')
html.html_safe
end
This method looks for all <a> tags, and adds rel="nofollow". The html_safe method is necessary or else the HTML will be displayed as a string (it's already been sanitized).
This solution treats all links equally, so if you only want this for links pointing outside the domain, you'll have to update the REGEX accordingly.
In your view: <%= clean_links sanitize(#something) %>
So, first the content is sanitized, then you add the rel="nofollow" tag before displaying the link.
Actually there's a built-in way:
sanitize "your input", scrubber: Loofah::Scrubbers::NoFollow.new

Rails 4: Disable automatic CSS sanitizing when using built-in HTML sanitize

I'm building an application that has a HTML GUI interface to create, move and edit boxes (div) inside a container div. These boxes get assigned inline styles when editing, these inline styles are saved to the database and are output in the views:
<%= sanitize raw(#slide.content) %>
I want to sanitize the HTML itself, to avoid someone hacking in, for instance, a script tag, through sending that by editing what's sent to the server when the boxes are saved.
Rails 4 has a helper method sanitize available through the class ActionView::Helpers::SanitizeHelper. When I use this with a test content value that contains a malicious <script> tag, the script gets removed just fine. But sanitizing the content also strips CSS properties inside the style tag that are necessary for the boxes, like top, left, position, etc.
In the linked documentation, it's stated that sanitize will automatically use the function sanitize_css when it comes across a style attribute:
sanitize_css(style)
Sanitizes a block of CSS code. Used by sanitize when it comes across a style attribute.
I do not want this behaviour of sanitize. How can I disable sanitize using sanitize_css, to sanitize the HTML, but not the CSS?
You can allow any attributes and tags you need, so rails will skip them.
sanitize raw(#slide.content), tags: %w(table tr td ul li), attributes: %w(style href title)
Speaking about CSS rules themselves, it's a bit harder to allow additional rules, but still possible. You can monkey patch the HTML::WhiteListSanitizer class (https://github.com/rails/rails/blob/c71c8a962353642ee44b5cc6ed68dc18322eea72/actionpack/lib/action_view/vendor/html-scanner/html/sanitizer.rb). There are several attributes that can help.
In your config/application.rb file:
config.action_view.sanitized_allowed_tags = nil
config.action_view.sanitized_allowed_attributes = nil
safe lists found here: loofah html5 safelist

The PHP HTMLPurifier library, but for Rails?

Anyone who's done anything much with PHP and receiving rich-text input from something like TinyMCE has (probably) used something like HTMLPurifier to keep the nasties out of the HTML you're intentionally allowing the user to submit.
For example, HTMLPurifier will take a string of (potentially malformed) HTML and strip out disallowed elements and attributes, try to fix broken HTML, and in some cases convert things like <i> to <em>.
Does anything equivalent exist for Rails (3)? What's the generally accepted way to sanitize input from rich text editors in Rails so that you can output the unescaped HTML onto a web page and know that stuff like <style> and <script> tags have been taken out of it and it's not going to break your page (or steal your cookies!)?
EDIT | Anybody used Sanitize? Any other options with pro's & con's?
You can use the sanitize method.
sanitize(html)
There is also a Sanitize gem.
Sanitize.clean(html)
I tend to prefer the Sanitize gem because it can be used as a before_save filter in your models instead of having to use the sanitize method in each of your views.

Generating a link with Markdown (BlueCloth) that opens in a new window

I'd like to have a link generated with BlueCloth that opens in a new window. All I could find was the ordinary [Google](http://www.google.com/) syntax but nothing with a new window.
Ideas?
Regards
Tom
Here is a complete reference for markdown: http://daringfireball.net/projects/markdown/syntax
And since there is no mention of how to set the target attribute, I would believe it is not directly possible, but the reference also says:
For any markup that is not covered by
Markdown’s syntax, you simply use HTML
itself. There’s no need to preface it
or delimit it to indicate that you’re
switching from Markdown to HTML; you
just use the tags.
Source: http://daringfireball.net/projects/markdown/syntax#html
So I would suggest you have to use the html syntax for links like this
update
if you wrap the markdown generated content in a div with a specific id like this:
and you use jQuery, you can add the following javascript:
$('#some_id a').attr('target','_blank');
Or you can save the BlueCloth output in a variable before outputting.
markdown_generated_string.gsub!(/<a\s+/i,'<a target="_blank" ')

how to show contents which include html tag?

I am using FckEditor in Create.aspx page in asp.net mvc application.
Since I need to show rich text in web pages, I used ValidateInput(false) attribute top of action method in controller class.
And I used Html.Encode(Model.Message) in Details.aspx to protect user's attack.
But, I had result what I did not want as following :
<p> Hello </p>
I wanted following result not above :
Hello
How can I show the text what user input?
Thanks in advance
The short answer is that HTMLEncode is making your markup show like that. If you don't HTMLEncode, it will do what you want.
You need to think about whether or not you need full control of markup, who is entering the markup, and if an alternative like BBCode is an option.
If your users using the editor are all sure to be 'safe' users, then XSS isn't likely to be as much a concern. However, if you are using this on a comment field, then BBCode, or something like SO itself uses is more appropriate.
You wont be able to use a WYSIWYG editor and do HTMLEncode though... (without BBCode, or some other token system)
It seems the user entered "<p> Hello </p>" (due to pressing Enter?) into the edit control, and it is displaying correct in the HTML as you have done an Html.Encode. E.g. the paragrahs are not rendered, they are outputted as "<p>..</p>" as the string is HTML encoded into something like "<p> Hello <p>".
If you do not want tags, I would suggest searching the text string for tags (things with <...>) and removing them from the inputted text. Do this before HTML.Encode.
...or am I missing something?
You can use HttpServerUtility.HtmlEncode(String)

Resources