Rails sanitizing user input - ruby-on-rails

For user-entered data I am taking the approach of sanitizing it before saving, to strip out any HTML or anything malicious (i.e. tags).
I have a before_validation callback:
before_validation :sanitize_fields
def sanitize_fields
  full_sanitizer = Rails::Html::FullSanitizer.new
  white_list = Rails::Html::WhiteListSanitizer.new
  # Only text allowed
  self.fname = full_sanitizer.sanitize(self.fname)
  self.lname = full_sanitizer.sanitize(self.lname)
  self.company = full_sanitizer.sanitize(self.company)
  # Some HTML Allowed
  self.description = white_list.sanitize(self.description)
end
The problem I am encountering is that when saving something like "Smith & Company" as the name, it is stored in the DB as Smith &amp; Company. Not an issue per se, but it then also displays as Smith &amp; Company in the edit view of the form, which is confusing to the end user.
Is there a better way than the approach I am taking? This "smells" wrong to me.
Thanks!

If you are confident the data is sanitized, you can declare it html_safe in the views to avoid it showing up as &amp;; it will render exactly as provided.
This of course begs the question: rather than jump through hoops to pre-sanitize and then tell the view that it has been sanitized, why not just allow the view to sanitize strings like it does by default? If you render the string "<tag>some_stuff</tag>" in a view, it will escape it for you. Are you concerned about the unsanitized string appearing elsewhere other than in a view that you control?
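To illustrate escaping on output rather than on input, here is a minimal plain-Ruby sketch. It uses ERB::Util.html_escape from the standard library, which is similar in effect to what a Rails view does by default for <%= %>; the variable names are only for illustration.

```ruby
require "erb"

# Raw values exactly as the user typed them; nothing is altered before saving.
company = "Smith & Company"
comment = "<tag>some_stuff</tag>"

# Escaping happens only when the value is written into HTML.
escaped_company = ERB::Util.html_escape(company)
escaped_comment = ERB::Util.html_escape(comment)

puts escaped_company  # Smith &amp; Company
puts escaped_comment  # &lt;tag&gt;some_stuff&lt;/tag&gt;
```

The browser turns &amp; back into & when it displays the page, so the user sees "Smith & Company" in both the show and edit views while the database keeps the original text.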

The reason it smells wrong is because it is.
With the possible exception of pre-rendering large text blocks (Markdown, etc.) into HTML, I would avoid sanitizing your model data this way. Following Rails best practices will protect you from SQL injection, and text output in views is rendered in a safe way by default.
If you need to allow users to input html, sanitize it on output (in your view) not on input.
Separation of concerns is one reason, but the biggest is that what you are trying to do is simply not idiomatic Rails. If you choose to continue down that path you will be fighting the framework constantly.

Related

Rails SQL Injection: How vulnerable is this code?

I'm trying to understand SQL Injection. It seems like people can get pretty creative. Which gets me wondering about my search-based rails webapp I'm making.
Suppose I just fed user-entered information directly into the "where" statement of my SQL query. How much damage could be done to my database by allowing this?
def self.search(search)
  if search
    includes(:hobbies, :addresses).where(search)
  else
    self.all
  end
end
So basically, whatever the user types into the search bar on the home page gets fed straight into that 'where' statement.
An example of a valid 'search' would be:
"hobby LIKE ? OR (gender LIKE ? AND hobby LIKE ?)", "golf", "male", "polo"
Does the fact that it's limited to the context of a 'where' statement provide any sort of defense? Could they still somehow perform delete or create operations?
EDIT:
When I look at this tutorial, I don't see a straightforward way to perform a deletion or creation action out of the where clause. If my database contains no information that I'm not willing to display from a valid search result, and there's no such thing as user accounts or admin privileges, what's really the danger here?
I took this from another post here: Best way to go about sanitizing user input in rails
TL;DR
Regarding user input and queries: Make sure to always use the active record query methods (such as .where), and avoid passing parameters using string interpolation; pass them as hash parameter values, or as parameterized statements.
Regarding rendering potentially unsafe user-generated html / javascript content: As of Rails 3, html/javascript text is automatically escaped so that it appears as plain text on the page rather than being interpreted as html/javascript, so you don't need to explicitly sanitize it (or use <%= h(potentially_unsafe_user_generated_content) %>).
If I understand you correctly, you don't need to worry about sanitizing data in this manner, as long as you use the active record query methods correctly. For example:
Let's say our parameter map looks like this, as a result of a malicious user entering the following string into the user_name field:
:user_name => "(select user_name from users limit 1)"
The bad way (don't do this):
User.where("user_name = #{params[:user_name]}") # string interpolation is bad here
The resulting query would look like:
SELECT users.* FROM users WHERE (user_name = (select user_name from users limit 1))
Direct string interpolation in this manner will place the literal contents of the parameter value with key :user_name into the query without sanitization. As you probably know, the malicious user's input is treated as plain 'ol SQL, and the danger is pretty clear.
The good way (Do this):
User.where(user_name: params[:user_name]) # hash parameters
OR
User.where("user_name = ?", params[:user_name]) # parameterized statement
The resulting query would look like:
SELECT users.* FROM users WHERE user_name = '(select user_name from users limit 1)'
So as you can see, Rails in fact sanitizes it for you, so long as you pass the parameter in as a hash, or method parameter (depending on which query method you're using).
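The difference between the two styles can be sketched without a database. The quote and bind helpers below are hypothetical stand-ins written for this illustration, not ActiveRecord's implementation; they assume the common SQL convention of escaping a single quote by doubling it.

```ruby
# Toy stand-in for the quoting a database adapter applies to bound values.
# Assumption: single quotes are escaped by doubling them.
def quote(value)
  "'" + value.to_s.gsub("'", "''") + "'"
end

# Hypothetical helper: replaces each "?" with the next quoted value, in order.
def bind(template, *values)
  remaining = values.dup
  template.gsub("?") { quote(remaining.shift) }
end

malicious = "(select user_name from users limit 1)"

interpolated = "user_name = #{malicious}"        # input becomes part of the SQL
bound        = bind("user_name = ?", malicious)  # input stays a string literal

puts interpolated
puts bound
puts bind("user_name = ?", "x' OR '1'='1")
```

In the interpolated version the user's text is parsed as SQL; in the bound version it can only ever be compared as a string, whatever it contains.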
The case for sanitization of data on creating new model records doesn't really apply, as the new or create methods are expecting a hash of values. Even if you attempt to inject unsafe SQL code into the hash, the values of the hash are treated as plain strings, for example:
User.create(:user_name=>"bobby tables); drop table users;")
Results in the query:
INSERT INTO users (user_name) VALUES ('bobby tables); drop table users;')
So, same situation as above.
I hope that helps. Let me know if I've missed or misunderstood anything.
Edit: Regarding escaping html and javascript, the short version is that ERB "escapes" your string content for you so that it is treated as plain text. You can have it treated like html if you really want, by doing your_string_content.html_safe.
However, simply doing something like <%= your_string_content %> is perfectly safe. The content is treated as a string on the page. In fact, if you examine the DOM using Chrome Developer Tools or Firebug, you should in fact see quotes around that string.

Complex search screens in Rails 3

I need to implement some search functionality within a Rails application. Most of the stuff I have found is generally aimed at simple plain-text search. I am trying to implement something much more specific. The sort of functionality I am looking to create is this (from a C application):
http://andyc.ac/query.gif
The form just submits the data entered by the user. So I need to translate strings like "3..7" into SQL conditions for the where method e.g.
TestLine.where( "test_int >= ? and test_int <= ?", MinInt, MaxInt )
It seems like this is something that already exists somewhere. The exact format expected is not too important, as the users are not shared between the Rails and C applications. How would this be done?
FWIW the specific functionality you describe is actually supported directly. Well.. almost. From the docs:
A range may be used in the hash to use the SQL BETWEEN operator:
Student.where(:grade => 9..12)
Of course then it's a matter of translating the user's string input to a Range, which isn't very complex, e.g.:
def str_to_range(str)
  str =~ /(\d+)\.\.(\d+)/
  Range.new(*$~.captures.map(&:to_i))
end
It would probably make the most sense in a scope on your model. (Of course a shortcut would be to simply eval '9..12' but evaling input from the end user is a really, really bad idea.)
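A slightly more defensive variant of the same idea, as a sketch: it anchors the pattern to the whole string and returns nil for anything that is not of the form "low..high", so unexpected input never reaches the query. The method name parse_int_range is made up for this example.

```ruby
# Parses strings such as "3..7" into an Integer Range.
# Returns nil when the input does not match or the bounds are reversed.
def parse_int_range(str)
  match = /\A\s*(\d+)\s*\.\.\s*(\d+)\s*\z/.match(str.to_s)
  return nil unless match

  low, high = match.captures.map(&:to_i)
  low <= high ? (low..high) : nil
end

p parse_int_range("3..7")           # 3..7
p parse_int_range(" 10 .. 12 ")     # 10..12
p parse_int_range("7..3")           # nil
p parse_int_range("1..2; DROP x")   # nil
```

The resulting Range can then be passed in the hash form shown above, e.g. where(:test_int => range), and skipped when the result is nil.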
Take a look at Thinking Sphinx (http://freelancing-god.github.com/ts/en/). It might make your task a lot easier. Its search options are described here:
http://freelancing-god.github.com/ts/en/searching.html#basic

Which characters in a search query does Google ignore (versus treating them as spaces?)

I want to give my pages human-readable slugs, but Rails' built-in parameterize method isn't SEO-optimized. For example, if I have a post called "Notorious B.I.G. is the best", parameterize will give me this path:
/posts/notorious-b-i-g-is-the-best
which is suboptimal, since Google construes the query "Notorious B.I.G." as "Notorious BIG" instead of "Notorious B I G" (i.e., the dots are removed rather than treated as spaces).
Likewise, "Tom's fave pizza" is converted to "tom-s-fave-pizza", when it should be "toms-fave-pizza" (since Google ignores apostrophes as well).
To create a better parameterize, I need to know which characters Google removes from queries (so I can remove them from my URLs) and which characters Google treats as spaces (so I can convert them to dashes in my URLs).
Better still, does such a parameterize method exist?
Besides stringex, that is, which I think tries to be too clever. Two representative problem cases:
[Dev]> "Notorious B.I.G. is the best".to_url
=> "notorious-b-dot-i-g-is-the-best"
[Dev]> "No, Curren$y is the best".to_url
=> "no-curren$y-is-the-best"
I would try using a gem that has been designed for generating slugs. They often make good design decisions and they have a way of updating the code for changing best practices. This document represents Google's best practices on URL design.
Here is a list of the best gems for solving this problem. They are sorted by rank which is computed based on development activity and how many people "watch" changes to the gems source code.
The top one right now is friendly_id, and it looks like it will generate good slugs for SEO. Here is a link to the features of the gem. It is also configurable, and it looks like a good fit for your needs.
Google appears to have good results for both the "b-i-g" and "big" in the url slugs.
For the rails side of things, yes a parameterize method exists.
"Notorious B.I.G. is the best".parameterize
=> "notorious-b-i-g-is-the-best"
I think you can create the URLs yourself... something like
class Album < ActiveRecord::Base
  before_create :set_permalink
  def set_permalink
    self.permalink = name.parameterize
  end
  # Rails calls to_param (singular) when it builds a URL for a record
  def to_param
    "#{id}-#{permalink}"
  end
end
This will create a url structure of:
/albums/3453-notorious-b-i-g-is-the-best
You can remove the id section in to_param if you want to.
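If you do want the behaviour described in the question, a custom slug method is short to write. The sketch below assumes, as the question does, that dots and apostrophes should be dropped and every other run of non-alphanumeric characters should become a dash; whether that matches how Google actually treats those characters is not something this code establishes. The name seo_slug is made up for the example.

```ruby
# Builds a URL slug that drops dots and apostrophes instead of
# turning them into dashes (an assumption taken from the question).
def seo_slug(title)
  title.downcase
       .gsub(/['.]/, "")           # remove characters assumed to be ignored
       .gsub(/[^a-z0-9]+/, "-")    # everything else becomes a separator
       .gsub(/\A-+|-+\z/, "")      # trim leading and trailing dashes
end

puts seo_slug("Notorious B.I.G. is the best")  # notorious-big-is-the-best
puts seo_slug("Tom's fave pizza")              # toms-fave-pizza
```

It could be used in place of name.parameterize in a callback like set_permalink above. Note that it only handles ASCII letters and digits; accented characters would be replaced by dashes.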
Use the title tag and description meta tag to tell google what the page is called: these carry more weight than the url. So, leave your url as /posts/notorious-b-i-g-is-the-best but put "Notorious B.I.G. is the best" in your title tag.

Rails way to offer modified attributes

The case is simple: I have markdown in my database, and want it parsed on output(*).
@post.body is mapped to the posts.body column in the database. Simple, default ActiveRecord ORM. That column stores the markdown text a user inserts.
Now, I see four ways to offer the markdown rendered version to my views:
First, in app/models/post.rb:
# ...
def body
  markdown = RDiscount.new(body)
  markdown.to_html
end
Allowing me to simply call @post.body and get an already rendered version. I do see lots of potential problems with that, e.g. on edit the text field being pre-filled with the rendered HTML instead of the markdown code.
Second option would be a new attribute in the form of a method
In app/models/post.rb:
# ...
def body_markdownified
  markdown = RDiscount.new(body)
  markdown.to_html
end
Seems cleanest to me.
Or, third in a helper in app/helpers/application_helper.rb
def markdownify(string)
  markdown = RDiscount.new(string)
  markdown.to_html
end
Which is used in the view: instead of <%= body %>, write <%= markdownify(body) %>.
The fourth way, would be to parse this in the PostsController.
def index
  @posts = Post.find(:all)
  @rendered_posts = []
  @posts.each do |p|
    p.body = RDiscount.new(p.body).to_html
    @rendered_posts << p
  end
end
I am not too familiar with Rails 3 proper method and attribute architecture. How should I go with this? Is there a fifth option? Should I be aware of gotchas, pitfalls or performance issues with one or another of these options?
(*) In future, potentially updated with a database caching layer, or even special columns for rendered versions. But that is beyond the point, merely pointing out, so to avoid discussion on filter-on-output versus filter-on-input :).
The first option you've described won't work as-is. It will cause an infinite loop because when you call RDiscount.new(body) it will use the body method you've just defined to pass into RDiscount (which in turn will call itself again, and again, and so on). If you want to do it this way, you'd need to use RDiscount.new(read_attribute('body')) instead.
Apart from this fact, I think the first option would be confusing for someone new looking at your app, as it would not be instantly clear when they see @post.body in your view that this is in fact a modified version of the body.
Personally, I'd go for the second or third options. If you're going to provide it from the model, having a method which describes what it's doing to the body will make it very obvious to anyone else what is going on. If the html version of body will only ever be used in views or mailers (which would be logical), I'd argue that it makes more sense to have the logic in a helper as it seems like the more logical place to have a method that outputs html.
Do not put it in the controller as in your fourth idea, it's really not the right place for it.
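The second option, and the recursion problem with the first, can be sketched in plain Ruby. This is a minimal illustration only: the Post class below imitates read_attribute with a hash, and render_markdown is a toy stand-in for RDiscount that handles nothing but **bold**.

```ruby
class Post
  def initialize(body)
    @attributes = { "body" => body }
  end

  # Imitates ActiveRecord's read_attribute for this sketch.
  def read_attribute(name)
    @attributes[name.to_s]
  end

  # The plain reader keeps returning the stored markdown, so edit
  # forms are still pre-filled with what the user typed.
  def body
    read_attribute("body")
  end

  # A separately named method makes it obvious that the result is rendered.
  def body_markdownified
    render_markdown(body)
  end

  private

  # Toy renderer standing in for RDiscount; supports only **bold**.
  def render_markdown(text)
    text.gsub(/\*\*(.+?)\*\*/, '<strong>\1</strong>')
  end
end

post = Post.new("normal **bold**")
puts post.body                # normal **bold**
puts post.body_markdownified  # normal <strong>bold</strong>
```

Because body_markdownified calls the untouched body reader, there is no method calling itself, which is the trap in the first option.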
Yet another way would be extending the String class with a to_markdown method. This has the benefit of working on any string anywhere in your application.
class String
  def to_markdown
    RDiscount.new(self).to_html
  end
end
@post.body.to_markdown
normal bold italic
If you were using HAML, for example in app/views/posts/show.html.haml
:markdown
  #{@post.body}
http://haml-lang.com/docs/yardoc/file.HAML_REFERENCE.html#markdown-filter
How about a reader for body that accepts a parse_with parameter?
def body(parse_with=nil)
  b = read_attribute('body')
  case parse_with
  when :markdown then RDiscount.new(b).to_html
  when :escape then CGI.escape(b)
  else b
  end
end
This way, a regular call to body will function as it used to, and you can pass a parameter to specify what to render with:
@post.body
normal **bold** *italic*
@post.body(:markdown)
normal bold italic

Remove all html tags from attributes in rails

I have a Project model and it has some text attributes, one is summary. I have some projects that have html tags in the summary and I want to convert that to plain text. I have this method that has a regex that will remove all html tags.
def strip_html_comments_on_data
  self.attributes.each{|key,value| value.to_s.gsub!(/(<[^>]+>|&nbsp;|\r|\n)/,"")}
end
I also have a before_save filter
before_save :strip_html_comments_on_data
The problem is that the html tags are still there after saving the project. What am I missing?
And, is there a really easy way to have that method called in all the models?
Thanks,
Nicolás Hock Isaza
untested
include ActionView::Helpers::SanitizeHelper
def foo
  sanitized_output = sanitize(html_input)
end
where html_input is a string containing HTML tags.
EDIT
You can strip all tags by passing :tags=>[] as an option:
plain_text = sanitize(html_input, :tags=>[])
Although reading the docs I see there is a better method:
plain_text = strip_tags(html_input)
Then make it into a before filter per smotchkiss and you're good to go.
It would be better not to include view helpers in your model. Just use:
HTML::FullSanitizer.new.sanitize(text)
Just use the strip_tags() text helper as mentioned by zetetic
First, the issue here is that Array#each returns the input array regardless of the block contents. A couple of people just went over Array#each with me in a question I asked: "Return hash with modified values in Ruby".
Second, aside from Array#each not really doing what you want it to here, I don't think you should be doing this anyway. Why would you need to run this method over ALL the model's attributes?
Finally, why not keep the HTML input from the users and just use the standard h() helper when outputting it?
# this will output as plain text
<%=h string_with_html %>
This is useful because you can view the database and see the unmodified data exactly as it was entered by the user (if needed). If you really must convert to plain text before saving the value, @zetetic's solution gets you started.
class Comment < ActiveRecord::Base
  include ActionView::Helpers::SanitizeHelper
  before_save :sanitize_html
  protected
  def sanitize_html
    self.text = sanitize(text)
  end
end
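On the point above about each discarding whatever the block computes: one way to avoid relying on in-place mutation is to build the cleaned values and assign them back explicitly. The sketch below uses a plain hash in place of a model's attributes, and the tag pattern is the rough regex from the question, not a robust HTML parser; strip_tags_from is a name invented for this example.

```ruby
# Rough tag pattern taken from the question; a real sanitizer is safer.
TAG_PATTERN = /<[^>]+>/

# Returns a new hash with tags removed from String values only.
# Non-string values (numbers, dates, nil) are passed through untouched.
def strip_tags_from(attributes)
  attributes.each_with_object({}) do |(key, value), cleaned|
    cleaned[key] = value.is_a?(String) ? value.gsub(TAG_PATTERN, "") : value
  end
end

original = { "summary" => "<p>Hello <b>world</b></p>", "budget" => 3 }
cleaned  = strip_tags_from(original)

p cleaned   # {"summary"=>"Hello world", "budget"=>3} (exact formatting varies by Ruby version)
p original  # unchanged
```

In a model callback, the cleaned values would then be written back one attribute at a time, for example with self[key] = value, so that the change is visible to the save.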
Reference Rails' sanitizer directly without using includes.
def text
  ActionView::Base.full_sanitizer.sanitize(html).html_safe
end
NOTE: I appended .html_safe to make HTML entities like &nbsp; render correctly. Don't use this if there is a potential for malicious JavaScript injection.
If you want to remove &nbsp; along with html tags, nokogiri can be used:
include ActionView::Helpers::SanitizeHelper
def foo
  sanitized_output = strip_tags(html_input)
  Nokogiri::HTML.fragment(sanitized_output)
end

Resources