I feel like I've seen this in many Rails apps, and it never made sense to me (but maybe that's because it's well past midnight and my brain is mush).
When I edit (for example) a user at /admin/users/20/edit, and I get a validation error, and the controller code is something like this:
def update
  if @user.update(user_params)
    redirect_to(some_path, notice: 'Record updated')
  else
    render("edit") # <<<<<<<<<<<<<<<<<<<
  end
end
instead of going to /admin/users/20/edit, it shows /admin/users/20 in the browser.
This seems all well and good, but without /edit it is not a valid GET URI, so other code which is consuming the HTTP_REFERER, and which (naturally) expects it to be a valid GET URI will take the user to an error page.
In my case there is an internal gem which handles impersonation of users by admin users. Ending an impersonation takes the admin user back to the referrer, and if the referrer has been modified by a validation error, they get an error.
I could:
- modify the gem to handle this case (a hassle, but perhaps necessary), or
- add a route to make this URL valid for editing without /edit (which seems like it shouldn't be necessary, and a bit kludgey),
but I want to know if there is a reason this is happening. Is this in fact standard Rails behavior or have I overlooked something? If this is standard, is there a good widely-accepted fix? If it is not standard Rails behavior, where should I look for the culprit?
It's pretty normal behaviour, because when you update a user you do a PUT or PATCH to /admin/users/20. So if there is a validation error, you are rendering the edit template and the URL stays the same (/admin/users/20).
You could do a redirect instead of a render, but in that case you lose the information about the validation errors, or you would have to send it along with the redirect.
This is a common Rails beginner hangup, and the key here is really understanding HTTP verbs, the concept of idempotency, and Rails-flavored REST.
The /new and /edit actions in Rails respond to GET requests. They are idempotent - the page will look the same to any visitor and if you reload the page you'll get the exact same page. They only serve to display a form in classical applications.
Updating resources is done with PATCH /users/1 (or PUT in legacy Rails apps). These verbs modify a resource, so unlike GET they are not "safe" (PUT is technically still idempotent, since replaying a full replacement yields the same state, but neither verb is safe to repeat blindly). Unlike many other frameworks, Rails uses the HTTP method to distinguish between different actions instead of using, for example, POST /users/1/update or POST /users/1/delete.
What you're doing when you call render("edit") is not redirecting the user back to the form. You're rendering a view and displaying the result of performing a non-idempotent action. This is not something that can be linked to, as the result depends on the input passed in the request body; neither can you reload the page without resending the exact same request, and resending it is not guaranteed to give the same result. Some browsers do not allow this at all, and almost all will warn you.
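The distinction can be sketched in plain Ruby, with a hash standing in for the database (`show` and `update` here are illustrative stand-ins for controller actions, not Rails API):

```ruby
# In-memory stand-in for the users table.
USERS = { 1 => { name: "Alice" } }

# GET /users/1 - safe and idempotent: repeating it never changes state,
# so the URL can be bookmarked, cached, reloaded and crawled freely.
def show(id)
  USERS[id]
end

# PATCH /users/1 - changes state; the response depends on the request
# body, so the result can't be bookmarked or blindly replayed.
def update(id, attrs)
  USERS[id].merge!(attrs)
end

show(1)                 # => { name: "Alice" }
update(1, name: "Bob")  # state changed
show(1)                 # => { name: "Bob" }
```

Reloading after a GET just calls `show` again; reloading after a form submission would have to resend the PATCH body, which is exactly why browsers warn you.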
This seems all well and good, but without /edit it is not a valid GET URI, so other code which is consuming the HTTP_REFERER, and which (naturally) expects it to be a valid GET URI will take the user to an error page.
This is an XY problem. The result of editing a record is not idempotent and thus cannot be linked to. Using HTTP_REFERER is in itself also problematic, as it's not guaranteed to be sent by the client.
While you can create a scheme to redirect back with the user input stuffed into the query string or the session this is the wrong answer to the wrong question.
where should I look for the culprit?
Whatever gem you're using might not be a good solution for the original problem - or even good at all. Impersonating a user might be a lot more fuss than just creating a separate endpoint for admins to edit users directly.
It certainly sounds like a very brittle solution.
Related
I was reading this stackoverflow comment which describes how rails authenticity tokens work: Understanding the Rails Authenticity Token
and the highest rated response begins with this:
"When the user views a form to create, update, or destroy a resource, the Rails app creates a random authenticity_token, stores this token in the session, and places it in a hidden field in the form. When the user submits the form, Rails looks for the authenticity_token, compares it to the one stored in the session, and if they match the request is allowed to continue."
This makes sense to me as an abstract concept, but I wanted to understand how it works concretely, so I am crystal clear about what is happening. So I created a new Rails app, scaffolded a User with just one field (name), and dropped a binding.pry at the top of Users#create, which should run directly after the user submits the form. The pry session began right after I entered a new user name and hit submit.

I then inspected the source of my application in the browser and found that the csrf-token content value in my Rails-generated csrf meta tags does not match the hidden authenticity token value within my form, and neither one matches the value I get if, during the same pry session, I examine session.to_hash and inspect the "_csrf_token" value.
I then tried setting up a pry in both the Users#new and Users#create actions, noted the "_csrf_token" value on the session, and compared it to the values of the form fields and meta tags once I quit and the app moved on to the pry in Users#create. Nothing matched; it seems like none of these three values match at all.
Yet protect_from_forgery with: :exception is set in my application controller, and from what I read in the top-rated response I was expecting to see matching values...somewhere. It seems like nothing matches, so I currently have no concept of what Rails is matching against what in order to allow the form submission to proceed and the data to be saved.
The author of the top rated response also says that an authenticity_token is stored on the session but all I see is a _csrf_token (are these the same?) and a session_id and, as I said, they don't match anything. I see no match whatsoever.
If Rails is matching something to the value in my form field, it doesn't seem to be the value of '_csrf_token', unless it's converting it to something else behind the scenes and then matching that value to the value in the hidden form field. I don't feel like I understand what is going on.
Rails is built on top of the HTTP protocol.
HTTP is stateless, which means that each request has to be treated as unique (all the supporting data has to be built each time).
To do this, Rails has the session, which is a series of "cookies" stored on the browser's system:
HTTP is a stateless protocol. Sessions make it stateful.
These sessions are used to keep small snippets of data which are used by Rails to rebuild your user's "environment" with each request. Devise stores user_id in the session, and Rails keeps a ton of other data in there, too.
--
So as opposed to - for example - a game, where you have a continual flow of data via a stateful protocol such as TCP (once the connection is established, it stays established), Rails basically reconnects with each new request.
Part of this process is form submission.
To prevent forged form submissions arising from this (e.g. bots/spammers firing requests directly at your endpoints), Rails uses CSRF protection to "match" the authenticity token in the user's session with the one displayed on the web page.

This basically says that the user submitting the form is the one who originally requested it, not some bot which got the form through a proxy or some shady software.
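As for why the three values you compared never match: since Rails 4.2 the token embedded in the page is masked with a fresh one-time pad on every render (a mitigation for the BREACH attack), so the form token, the meta tag token, and the session's "_csrf_token" are never byte-equal, even though they all decode to the same underlying token. A simplified standalone sketch of that masking (the real code lives in ActionController::RequestForgeryProtection):

```ruby
require "securerandom"
require "base64"

# The raw token kept in the session (Rails stores it Base64-encoded
# under "_csrf_token").
real_token = SecureRandom.random_bytes(32)

# XOR two equal-length binary strings byte by byte.
def xor(a, b)
  a.bytes.zip(b.bytes).map { |x, y| x ^ y }.pack("C*")
end

# Masking: each render gets pad + (pad XOR token), Base64-encoded,
# so every form and meta tag shows a different string.
def mask(token)
  pad = SecureRandom.random_bytes(token.bytesize)
  Base64.strict_encode64(pad + xor(pad, token))
end

# Unmasking: split off the pad and XOR it back out to recover the
# session token for comparison.
def unmask(masked)
  raw  = Base64.strict_decode64(masked)
  half = raw.bytesize / 2
  xor(raw[0, half], raw[half, half])
end

form_token = mask(real_token)
meta_token = mask(real_token)

form_token == meta_token          # => false: different one-time pads
unmask(form_token) == real_token  # => true
unmask(meta_token) == real_token  # => true
```

So the values in your pry session were all "correct": Rails unmasks the submitted token before comparing it against the session's token, which is why nothing matches when you compare the strings directly.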
I find a common issue in my RESTful Rails apps controllers that respond to multiple formats (HTML, XML, etc). The issue is that, for any given method (INDEX, CREATE, NEW, EDIT, SHOW, UPDATE, or DESTROY) I want to restrict access to admin users for 1 format, but not others. Of course I already have a "before_filter :admin_required" for this, but it is useless unless all formats for a given method adhere to the same permissions (which, many times, is not the case). I end up just having to open up the entire method and then add a "head :bad_request unless current_user.is_admin" to any of the formats that need protecting. This works, but for some reason feels wrong to me. It seems like I should be able to add a format parameter on the before_filter somehow, so as to keep things tidy. How do you guys do it and why?
UPDATED QUESTION:
I think people are not fully understanding my situation, so let me try to re-explain. First of all, know that this already works for me and is secure; I have no problems with it. Basically, I have decided that HTML pages will only be for admins to create/update/edit/delete objects. Normal users will ONLY interact with the app via XML through a Flash interface. This means there are essentially two different paths of execution (each with its own distinct code/logic) for each action, and when the request comes in, the format dictates which path is taken.

There are checks in each path to make sure that no malicious requests are allowed, and a head :bad_request is returned in those cases. There is no way to "craft an XML request outside of Flash" and somehow make the app do something it otherwise shouldn't. The app couldn't care less whether the XML request came from Flash or not. It does not matter one bit. The only thing that matters is whether the request is valid based on the credentials of the user and the attributes posted - not where it came from.

Anyway, this all works great. The only downside is that many of my actions that would normally just have a "before_filter :admin_required" can't use it anymore. They need to be opened up to everyone, and then I have to manually do a "head :bad_request unless current_user.is_admin" on certain action/format combinations that require it. I was just hoping for more fine-grained control over the filters in the controllers, something like "before_filter :admin_required, :format => html".
I'm not sure if I fully understand you, but you can access the format parameter in your before_filter, e.g.:

before_filter :admin_required

...

private

def admin_required
  return if params[:format].nil?
  case params[:format].to_sym
  when :xml
    head(:bad_request) unless current_user.is_admin
  end
end
As @jdl mentioned, it sounds like it might be a security hole to do this. Usually you would render the same object(s), just with different formats, which means an attacker could just look at a different format to see all the information.
I can see where it might make sense if you want everybody to see the html, but only admins can see xml, but then users could just screen scrape the page to get equivalent information.
In your case, an attacker could use network traces to watch the xml requests that the flash application sends, and then craft their own requests based off that. I think you would have a hard time determining if the request really came from flash, or if it had been spoofed. This may not be a problem for you, because of course your html pages are always protected, but I think you would have to assume that anybody can see the XML versions.
I think I found a better way to handle this. First, get Rails 3. Then you can restrict by format on the routes. Couple this with namespaces, and you can achieve what I was trying to do in a cleaner way.
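For reference, the route-level restriction looks something like this (an illustrative sketch, not my actual routes - the controller names are placeholders):

```ruby
# config/routes.rb - Rails 3 sketch: admin CRUD answers HTML only,
# while the public Flash interface only gets the XML endpoints.
namespace :admin do
  resources :users, :constraints => { :format => :html }
end

resources :users, :only => [:index, :show, :create],
                  :constraints => { :format => :xml }
```

Requests for a format a route doesn't allow never reach the controller, so the before_filter no longer has to inspect params[:format].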
I am new to RESTful architecture, or at least new to using it properly; I have only had real experience with SOAP. I am having a problem wrapping my head around some things. I know there are other questions that are similar, but none that I have found answer my question satisfactorily.
I am just starting this app so I want to get it started the right way and what I am looking at now is a user registration screen. I have two validation calls that occur before the registration form is even submitted. First I have a validation call that checks to make sure the email entered by the user is unique and second I have a validation call that checks to make sure an access code we provide to the customer exists in the database.
I currently have it structured as a POST (which I believe should be a GET) and I have an action argument that defines what I am wanting to do. So for the email I have an argument string such as
action=validateemail&value=email@email.com
and it is calling the User action of my MembershipController. I am entirely sure this is wrong as I should only be using the verbs GET, POST, PUT, and DELETE yet I am defining my own verb using the action argument.
Honestly, I don't know how to do this. I believe the User should be my resource but possibly for the email validation Email should be my resource. I guess what I am asking is how would you do what I am trying to do? I know some of you might just say do all the validation upon the submit, but I would prefer to do it both ways really. I would like the asynchronous validation as well as the validation I will perform when the user submits.
We do something similar, and our resource is called "Account". For the validation I would do a GET for the Account specified and validate the HTTP return code. I would expect a 404 - Not Found to let me know the proposed account doesn't exist. If they passed in mangled data, a 400 - Bad Request would tell you something was wrong. To create the Account, a POST of the same resource would do. To do something like change a password, a PUT might be appropriate. I think that if you are already making a trip to the server, you might as well return the account (200 - Ok on the GET) if it exists, to save yourself the second trip.
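As a sketch, the server side of that check reduces to mapping a lookup to a status code. The store and the validation rule here are stand-ins, not a real API:

```ruby
# Hypothetical in-memory store standing in for the accounts table.
ACCOUNTS = { "alice@example.com" => { id: 1 } }

# GET /accounts/:email - safe, so it can be called repeatedly from
# asynchronous validation. Returns [status, body]:
#   400 for mangled input, 404 when the account doesn't exist,
#   200 plus the record when it does (saving a second round trip).
def get_account(email)
  return [400, nil] unless email.to_s.match?(/\A[^@\s]+@[^@\s]+\z/)
  account = ACCOUNTS[email]
  account ? [200, account] : [404, nil]
end

get_account("alice@example.com")  # => [200, { id: 1 }]
get_account("bob@example.com")    # => [404, nil]
get_account("not-an-email")       # => [400, nil]
```

Note that for the email-uniqueness check the interesting answer is inverted: a 404 means the address is free to register.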
I have been working through Microsoft's ASP.NET MVC tutorials, ending up at this page
http://www.asp.net/learn/mvc/tutorial-32-cs.aspx
The following statement is made towards the bottom of this page:
In general, you don’t want to perform an HTTP GET operation when invoking an action that modifies the state of your web application. When performing a delete, you want to perform an HTTP POST, or better yet, an HTTP DELETE operation.
Is this true? Can anyone offer a more detailed explanation for the rationale behind this statement?
Edit
Wikipedia states the following:
Some methods (for example, HEAD, GET, OPTIONS and TRACE) are defined as safe, which means they are intended only for information retrieval and should not change the state of the server.
By contrast, methods such as POST, PUT and DELETE are intended for actions which may cause side effects either on the server
Jon Skeet's answer is the canonical answer. But: Suppose you have a link:
href="/myApp/DeleteImportantData.aspx?UserID=27"
and the google-bot comes along and indexes your page? What happens then?
GET is conventionally free of side-effects - in other words, it doesn't change the state. That means the results can be cached, bookmarks can be made safely etc.
From the HTTP 1.1 RFC 2616:

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.
Apart from purist issues around being idempotent, there is a practical side: spiders/bots/crawlers etc will follow hyperlinks. If you have your "delete" action as a hyperlink that does a GET, then google can merrily delete all your data. See "The Spider of Doom".
With POSTs, this isn't a risk.
Another example..
http://example.com/admin/articles/delete/2
This will delete the article if you are logged in and have the right privileges. If your site accepts comments, for example, and a user submits that link as an image, like so:
<img src="http://example.com/admin/articles/delete/2" alt="This will delete your article."/>
Then when you yourself, as the admin user, come to browse through the comments on your site, the browser will attempt to fetch that image by sending off a request to that URL. But because you are logged in whilst the browser is doing this, the article will get deleted.

You may not even notice without looking at the source code, as most browsers won't show anything if they can't find an image.
Hope that makes sense.
Please see my answer here. It applies equally to this question.
Prefetch: A lot of web browsers will use prefetching, which means a page may be loaded before you ever click on the link, in anticipation that you will click it later.

Bots: There are several bots that scan and index the internet for information. They will only issue GET requests. You don't want to delete something from a GET request for this reason.

Caching: GET HTTP requests are not supposed to change state, and they should be idempotent. Idempotent means that issuing a request once, or issuing it multiple times, gives the same result; i.e. there are no side effects. For this reason GET HTTP requests are tightly tied to caching.

The HTTP standard says so: The HTTP standard says what each HTTP method is for. Several programs are built to use the HTTP standard, and they assume that you will use it the way you are supposed to. So you will get undefined behavior from a slew of random programs if you don't follow it.
In addition to spiders, and requests having to be idempotent, there's also a security issue with GET requests. Someone can easily send your users an e-mail with
<img src="http://yoursite/Delete/Me" />
in the text, and the browser will happily go along and try to access the resource. Using POST isn't a cure for such things (as you can put together a form POST in JavaScript pretty easily), but it's a good start.
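The "good start" becomes effective when the POST must carry a per-session token that the attacker's page cannot read. A minimal sketch of the server-side check (the real Rails version additionally masks the token per request; the session hash here is a stand-in):

```ruby
require "securerandom"

# Constant-time comparison, so the check doesn't leak how many leading
# bytes of the token a guess got right.
def secure_compare(a, b)
  return false unless a.bytesize == b.bytesize
  diff = 0
  a.bytes.zip(b.bytes) { |x, y| diff |= x ^ y }
  diff.zero?
end

# Token issued when the form is rendered, stored in the session. A
# forged cross-site request can't include it, so it fails the check.
session = { csrf_token: SecureRandom.base64(32) }

def csrf_valid?(session, form_token)
  !form_token.nil? && secure_compare(session[:csrf_token], form_token)
end

csrf_valid?(session, session[:csrf_token])     # => true
csrf_valid?(session, SecureRandom.base64(32))  # => false
csrf_valid?(session, nil)                      # => false
```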
About this topic (HTTP methods usage), I recommend reading this blog post: http://blog.codevader.com/2008/11/02/why-learning-http-does-matter/
This is actually the opposite problem: why not use POST when no data is changed.
Apart from all the excellent reasons mentioned on here, GET requests could be logged by the recipient server, such as in the access.log. If you send across sensitive data such as passwords in the request, they'll get logged as plaintext.
Even if they are hashed/salted for secure DB storage, a breach (or someone looking over the IT guy's shoulder) could reveal them. Such data should go in the POST body.
Let's say we have an internet banking application and we visit the transfer page. The logged in user chooses to transfer $10 to another account.
Clicking on the submit button redirects (as a GET request) to https://my.bank.com/users/transfer?amount=10&destination=23lk3j2kj31lk2j3k2j
But the internet connection is slow and/or the server(s) is(are) busy so after hitting the submit button the new page is loading slow.
The user gets frustrated and starts hitting F5 (refresh page) furiously. Guess what will happen? More than one transfer will occur possibly emptying the user's account.
Now if the request is made as a POST (or anything other than GET), then on the first F5 (refresh) the browser will gently ask "are you sure you want to do that? It can have side effects [ bla bla bla ] ..."
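Beyond using POST, the double-submit itself can be neutralized server-side with a one-time request identifier, a pattern often called an idempotency key. A sketch under illustrative names (the store and endpoint are stand-ins):

```ruby
require "securerandom"

# Hypothetical transfer endpoint: the form carries an id generated when
# the page was rendered; replaying the same id is a no-op.
PROCESSED = {}
BALANCES  = { "alice" => 100 }

def transfer(request_id, from, amount)
  # A repeat submit with the same id returns the original result
  # instead of moving money a second time.
  return PROCESSED[request_id] if PROCESSED.key?(request_id)
  BALANCES[from] -= amount
  PROCESSED[request_id] = { status: :ok, balance: BALANCES[from] }
end

id = SecureRandom.uuid
transfer(id, "alice", 10)  # => { status: :ok, balance: 90 }
transfer(id, "alice", 10)  # => { status: :ok, balance: 90 } (replay, no second debit)
```

With this in place, even a furious F5 only ever debits the account once per rendered form.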
Another issue with GET is that the command goes to the browser's address bar. So if you refresh the page, you issue the command again, be it "delete last stuff", "submit the order" or similar.
In ASP.NET MVC it seems to be common practice not to use GET requests for calls to a controller that modify the model. For example, deleting a customer should not be possible by clicking a simple HTML link.
The only reason for this rule I am aware of is to safeguard against web crawlers, which might inadvertently alter the database. GET requests are commonly regarded as safe, whereas POST requests are not.
Does this mean that this rule does not apply to non-public portions of a website (Example: Your password-protected user administration area)? Or is there any other reason not to use destructive GET requests?
This is generally part of HTTP. From the HTTP 1.1 RFC 2616:

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.
In other words, it's not enforced, but it's really bad form for a GET request to have side-effects. Imagine if a user bookmarks a URL which updates something, for example - they probably wouldn't expect that to happen.
Another good reason is accelerator plug-ins for browsers. These attempt to speed up page loads by pre-fetching links on the current page. Imagine if you had a bunch of GET requests to delete all the objects in a list, the plug-in would delete them!
The short of it is that you can't predict what a browser will do with GET requests; if it looks like a plain old hyperlink, then it's fair game for a browser to go fetch it.
Yes.
It's not just about web crawlers, it's about CSRF - Cross-Site Request Forgery.
So imagine that someone is logged into your web site, and browses to www.hax0rs.com
In the source for hax0rs.com is the following tag
<img src="http://mysite.com/members/statusChange?status=I%20am%20looking%20for%20a%20gimp%20mask" height="0" width="0">
Because your user is logged in, and because the request is going to your site, the authentication cookie goes with it. And bang, suddenly your user's status has changed.
What fun :)
But I suppose you can still do some sort of "non-retrieval" actions on GET requests. For example, updating a "LastVisit" record, which can be considered non-destructive and relatively safe.