In Rails, Is it possible to prevent HTTP requests that come from the Browser's address bar ? And only allow navigation through links made within the app ?
I really looking for preventing a user to simply type his destination in the browser and only uses the links provided.
I know It maybe sounds silly. But I'm kind of trying to give a different UX than any regular website.
Is this approach possible? And If yes, How?
And what is the possible disadvantages or deficits that may cause?
For completeness, Yes.
The previous responder is right, this sounds like a bad idea, but it is possible. I imagine it similar to how authentication work. Set a secret value in the session on the first page, ask for it it on the second page the user reaches, if they don't match user didn't use your navigation. Refresh as quickly as needed (every page, for example).
Drawbacks? It's weird, that's not how webpages work. A clicked link (or GET request) is not different than a URL typed in the browser. What do you mean by "different UX than any regular website", the more details we have the easier we can help.
No*.
You can make it harder for a user to guess the correct URL by using obfuscation or use sessions to make the application stateful. But technically a GET request sent by clicking a link or by typing the url in the browser are identical to the server. The former is a form of security by obscurity.
The whole basically violates the core tenants of what a RESTful application does. In REST a resource should be omnipotent - requesting the same resource should provide the same response no matter how the user got there.
If you find that an action should not be able to be performed by typing the address into the browser you are most likely using the HTTP verbs wrong (using GET where you should be using POST, PUT or DELETE) or have a poor authorization system.
Related
I am working with a Domestic Violence support organisation to build a website and have been asked to provide a "Quick Exit" function.
The purpose is to enable the user to exit the site quickly without closing the browser. I have seen such buttons on similar sites and the normal scenario is that they simply cause a Google search page to be shown. (easy but doesn't hide history)
I am looking for ideas to improve on this function to hide/disguise the history stored in the browser as this is currently a fairly significant flaw with the Quick Exit buttons I've seen to date.
I had a concept but I am looking for input on either fleshing out my concept, or other alternative directions to consider.
My concept was to have two domains: let's call them dv-site.com and decoy-site.com. The former being the source of domestic violence support information and the latter being some random content, could be anything, lets just say weather information for the sake of the conversation.
If a user navigates directly to dv-site.com the server redirects to decoy-site.com but also attaches some session specific, or perhaps single use query string or similar.
decoy-site.com validates the query string and, if valid, loads dv-site.com within an iframe or something like that so from the users perspective they are just looking at dv-site.com, though the domain recorded in history is decoy-site.com.
Links within the iframe loaded site would similarly be redirected with the same or a new query string.
If a user was to click on the browser history and go directly to decoy-site.com it would not be able to validate the query string and would just load the decoy site like a normal site. i.e. just showing weather information that exist on that site.
Domestic violence is a serious systemic issue and I would love some input from anyone who has more technical knowledge than I do on fleshing out this concept.
Other aspects I am unsure of how to tackle;
ensuring that dv-site.com can get crawled and ranked by search engines, even though users are all redirected, as it is imperative that it appears in search results so it can be found
technical aspects of a redirect that does not appear in history.
I'm unsure if it's possible to do this without all content and engagement being attributed to the decoy-site..
For the redirect, I believe that HTTP redirects do not get stored in history. You can use a 302 redirect for that. HTTP has a set-cookie header that lets you record a cookie - coupled with the headers here, you can give the decoy site access without recording it in history. Then, delete the cookie.
As far as pagerank goes, you could add a line to robots.txt as described here (the last point) to force the bot to scrape using a query parameter. Then in the backend, return the dv site only if that parameter is passed, otherwise redirect. If the googlebot removes query params when publishing, it will work out. Otherwise, it might fail.
Best of luck.
For one of my project's (weird) requirements, I want to use cookie less sessions. At the same time, "session.use_trans_sid" can not be turned on :(
Does anybody please let me know if is there any other way out ??
Thanks
Manish
Make a custom session manager that identifies the user based on, for example, IP address and user agent and other identifying factors (as IP+UA might not and probably will not be unique). Another (ugly) solution is to just implement the use_trans_sid functionality yourself by adding a session identifier GET parameter to every link by hand (if it's a small site) or with a hidden form (that's non-standard).
If you really want sessions without cookies, you can always put the SID in all your URLs manually. People used to do this quite a bit. :-)
The only other option is to keep the session data on the client and pass it back and forth to and from the server with each request, although technically that would be a sessionless architecture.
That means that for GETs each link has to be rewritten to include all the session variables, and for POSTs they have to be included as hidden fields.
I don't know much about SEO and how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET-MVC) which has areas that displays information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user may not wish to have displayed from search engine results, I'd like to prevent any spiders from accessing the search results page. Are there any special actions I should take to ensure that the search result directory isn't crawled? Also, would a spider even crawl a page that's dynamically generated and would any actions preventing certain directories being search mess up my search engine rankings?
edit: I should add, I'm reading up on robots.txt protocol, but it relies on co-operation from the web crawler. However, I'd also like to prevent any data-mining users who will ignore the robots.txt file.
I appreciate any help!
You can prevent some malicious clients from hitting your server too heavily by implementing throttling on the server. "Sorry, your IP has made too many requests to this server in the past few minutes. Please try again later." In practice, though, assume that you can't stop a truly malicious user from bypassing any throttling mechanisms that you put in place.
Given that, here's the more important question:
Are you comfortable with the information that you're making available for all the world to see? Are your users comfortable with this?
If the answer to those questions is no, then you should be ensuring that only authorized users are able to see the sensitive information. If the information isn't particularly sensitive but you don't want clients crawling it, throttling is probably a good alternative. Is it even likely that you're going to be crawled anyway? If not, robots.txt should be just fine.
It seems like you have 2 issues.
Firstly a concern about certain data appearing in search results. The second about malicious or unscrupulous user harvesting user related data.
The first issue will be covered by appropriate use of a robots.txt file as all the big search engines honour this.
The second issue seems more to do with data privacy. The first question which immediately springs to mind is: If there is user information which people may not want displayed, why are you making it available at all?
What is the privacy policy for such data?
Do users have the ability to control what information is made available?
If the information is potentially sensitive but important to the system could it be restricted so it is only available to logged in users?
Check out the Robots exclusion standard. It's a text file that you put on your site that tells a bot what it can and can't index. You will also want to address what happens if a bot doesn't honour the robots.txt file.
robots.txt file as mentioned. If that is not enough then you can:
Block unknown useragents - hard to maintain, easy for a bot to forge a browser's (although most legitimate bots wont)
Block unknown IP addresses - not useful for a public site
Require logins
Throttle user connections - tricky to tune, you will still be disclosing information.
Perhaps by using a combination. Either way it is a trade off, if the public can browse to it, so can a bot. Be sure you don't block & alienate people in your attempts to block bots.
a few options:
force the user to login to view the content
add a CAPTCHA page before the content
embed content in Flash
load dynamically with JavaScript
I have been working through Microsoft's ASP.NET MVC tutorials, ending up at this page
http://www.asp.net/learn/mvc/tutorial-32-cs.aspx
The following statement is made towards the bottom of this page:
In general, you don’t want to perform an HTTP GET operation when invoking an action that modifies the state of your web application. When performing a delete, you want to perform an HTTP POST, or better yet, an HTTP DELETE operation.
Is this true? Can anyone offer a more detailed explanation for the rationale behind this statement?
Edit
Wikipedia states the following:
Some methods (for example, HEAD, GET, OPTIONS and TRACE) are defined as safe, which means they are intended only for information retrieval and should not change the state of the server.
By contrast, methods such as POST, PUT and DELETE are intended for actions which may cause side effects either on the server
Jon Skeet's answer is the canonical answer. But: Suppose you have a link:
href = "\myApp\DeleteImportantData.aspx?UserID=27"
and the google-bot comes along and indexes your page? What happens then?
GET is conventionally free of side-effects - in other words, it doesn't change the state. That means the results can be cached, bookmarks can be made safely etc.
From the HTTP 1.1 RFC 2616
Implementors should be aware that the
software represents the user in their
interactions over the Internet, and
should be careful to allow the user to
be aware of any actions they might
take which may have an unexpected
significance to themselves or others.
In particular, the convention has been
established that the GET and HEAD
methods SHOULD NOT have the
significance of taking an action other
than retrieval. These methods ought to
be considered "safe". This allows user
agents to represent other methods,
such as POST, PUT and DELETE, in a
special way, so that the user is made
aware of the fact that a possibly
unsafe action is being requested.
Naturally, it is not possible to
ensure that the server does not
generate side-effects as a result of
performing a GET request; in fact,
some dynamic resources consider that a
feature. The important distinction
here is that the user did not request
the side-effects, so therefore cannot
be held accountable for them.
Apart from purist issues around being idempotent, there is a practical side: spiders/bots/crawlers etc will follow hyperlinks. If you have your "delete" action as a hyperlink that does a GET, then google can merrily delete all your data. See "The Spider of Doom".
With posts, this isn't a risk.
Another example..
http://example.com/admin/articles/delete/2
This will delete the article if you are logged in and have the right privileges. If your site accepts comments for example and a user submits that link as an image; like so:
<img src="http://example.com/admin/articles/delete/2" alt="This will delete your article."/>
Then when you yourself as the admin user come to browse through the comments on your site the browser will attempt to fetch that image by sending off a request to that URL. But because you are logged in whilst the browser is doing this the article will get deleted.
You may not even notice, without looking at the source code as most browsers wont show anything if it can't find an image.
Hope that makes sense.
Please see my answer here. It applies equally to this question.
Prefetch: A lot of web browsers will use prefetching. Which means
that it will load a page before you
click on the link. Anticipating that
you will click on that link later.
Bots: There are several bots that scan and index the internet for
information. They will only issue GET
requests. You don't want to delete
something from a GET request for this
reason.
Caching: GET HTTP requests are not supposed to change state and they should be idempotent. Idempotent means that
issuing a request once, or issuing it
multiple times gives the same result.
I.e. there are no side effects. For
this reason GET HTTP requests are
tightly tied to caching.
HTTP standard says so: The HTTP standard says what each HTTP method is
for. Several programs are built to
use the HTTP standard, and they assume
that you will use it the way you are
supposed to. So you will have
undefined behavior from a slew of
random programs if you don't follow.
In addition to spiders and requests having to be idempotent there's also a security issue with get requests. Someone can easily send your users an e-mail with
<img src="http://yoursite/Delete/Me" />
in the text and the browser will happily go along and try and access the resource. Using POST isn't a cure for such things (as you can put together a form post in javascript pretty easily) but it's a good start.
About this topic (HTTP methods usage), I recommend reading this blog post: http://blog.codevader.com/2008/11/02/why-learning-http-does-matter/
This is actually the opposite problem: why do not use POST when no data is changed.
Apart from all the excellent reasons mentioned on here, GET requests could be logged by the recipient server, such as in the access.log. If you send across sensitive data such as passwords in the request, they'll get logged as plaintext.
Even if they are hashed/salted for secure DB storage, a breach (or someone looking over the IT guy's shoulder) could reveal them. Such data should go in the POST body.
Let's say we have an internet banking application and we visit the transfer page. The logged in user chooses to transfer $10 to another account.
Clicking on the submit button redirects (as a GET request) to https://my.bank.com/users/transfer?amount=10&destination=23lk3j2kj31lk2j3k2j
But the internet connection is slow and/or the server(s) is(are) busy so after hitting the submit button the new page is loading slow.
The user gets frustrated and starts hitting F5 (refresh page) furiously. Guess what will happen? More than one transfer will occur possibly emptying the user's account.
Now if the request is made as POST (or anything else than GET) the first F5 (refresh page) the user will make the browser will gently ask "are you sure you want to do that? It can have side effects [ bla bla bla ] ... "
Another issue with GET is that the command goes to the browser's address bar. So if you refresh the page, you issue the command again, be it "delete last stuff", "submit the order" or similar.
In ASP.NET MVC it seems to be common practice not to use GET requests for calls to a controller that modify the model. For example, deleting a customer should not be possible by clicking a simple HTML link.
The only reason for this rule I am aware of is not safeguard against web-crawlers which might indavertently alter the database. GET requests are commonly regarded as safe, whereas POST requests are not.
Does this mean that this rule does not apply to non-public portions of a website (Example: Your password-protected user administration area)? Or is there any other reason not to use destructive GET requests?
This is generally part of HTTP. From the HTTP 1.1 RFC 2616
Implementors should be aware that the
software represents the user in their
interactions over the Internet, and
should be careful to allow the user to
be aware of any actions they might
take which may have an unexpected
significance to themselves or others.
In particular, the convention has been
established that the GET and HEAD
methods SHOULD NOT have the
significance of taking an action other
than retrieval. These methods ought to
be considered "safe". This allows user
agents to represent other methods,
such as POST, PUT and DELETE, in a
special way, so that the user is made
aware of the fact that a possibly
unsafe action is being requested.
Naturally, it is not possible to
ensure that the server does not
generate side-effects as a result of
performing a GET request; in fact,
some dynamic resources consider that a
feature. The important distinction
here is that the user did not request
the side-effects, so therefore cannot
be held accountable for them.
In other words, it's not enforced, but it's really bad form for a GET request to have side-effects. Imagine if a user bookmarks a URL which does updates something, for example - they probably wouldn't expect that to happen.
Another good reason is accelerator plug-ins for browsers. These attempt to speed up page loads by pre-fetching links on the current page. Imagine if you had a bunch of GET requests to delete all the objects in a list, the plug-in would delete them!
The short of it is that you can't predict what a browser will do with GET requests, if it looks like a plain-old hyperlink then its fair game for a browser to go fetch it.
Yes.
It's not just about web crawlers, it's about CRSF - Cross Site Request Forgery.
So imagine that someone is logged into your web site, and browses to www.hax0rs.com
In the source for hax0rs.com is the following tag
<img src="http://mysite.com/members/statusChange?status=I%20am%20looking%20for%20a%20gimp%20mask" height="0" width="0">
Because your user is logged in, and because the request is going to your site, the authentication cookie goes with it. And bang, suddenly your user's status has changed.
What fun :)
But I suppose you can still do some sort of "non-retrieval" actions on GET requests. For example updating the "LastVisit" records which can be consider undestructive and relatively safe.