Are unnecessary slashes in a URL bad? - url

I noticed that https://stackoverflow.com//////////questions/4659504/ is a valid URL. However https://www.google.com//////////analytics/settings is not. Are there differences inherent in web server technologies that explain this? Should a url with unnecessary slashes be interpreted correctly or should it return an error?

First of all, adding a slash changes the semantics of a URL path like any other character does. So by definition /foo/bar and /foo//bar are not equivalent just as /foo/bar and /foo/bar/ are not equivalent.
But since the URL path is often mapped directly onto the file system, web servers often remove empty path segments (Apache does that) so that /foo//bar and /foo/bar are handled equivalently. This is not behavior you should expect, though; it is done as error correction.

They are both valid URLs.
However, Google's server can't handle the second one.
There is no specific reason to either handle or reject URLs with duplicate slashes; you should spend more time on more important things.

What do you consider "interpreted correctly"? HTTP only really specifies how the stuff in front of the slash after the server name gets interpreted. The rest is entirely up to the web server. It parses what you give it after that point (in whatever manner it likes) and presents you with whatever HTML it feels like providing for that text.

Every application processes requests differently. If you set up your app to collapse consecutive slashes before routing the request, you shouldn't have any problems.
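As a rough illustration of collapsing consecutive slashes before routing, here is a minimal Python sketch. The function name collapse_slashes is made up for this example, and it only touches the path component, leaving the query string and fragment alone. Whether collapsing is appropriate at all depends on the application.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url: str) -> str:
    """Return the URL with runs of slashes in the path reduced to one slash.

    Hypothetical helper for illustration; not part of any framework.
    """
    parts = urlsplit(url)
    # Only the path is rewritten; the "//" after the scheme is not part of it.
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(collapse_slashes("https://stackoverflow.com//////////questions/4659504/"))
```

A web application would typically run something like this on the incoming path and then either route on the result or redirect to it.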

Related

A forward slash from https:// is being removed when sending a /oauth/authorize request to a Rails app from a chrome extension?

I am making a launchWebAuthFlow authorization code request from a Chrome extension to a Rails app hosted on Heroku. Doorkeeper is an OAuth wrapper for Rails, and that is what is processing my request. More specifically Doorkeeper::AuthorizationsController#new is processing the request as HTML (why HTML?).
The forward slash (/) is missing from both the URL encoded redirect_uri and the redirect_uri shown in the rails params. The url is correct on the chrome extension side of things (unless the launchWebAuthFlow built in function is doing something to it), so I think something is happening on the server.
It works in development so I don't think anything is wrong on the extension. The app is hosted on Heroku.
Any idea of what could be going wrong here?
According to this link, Apache denies all URLs with %2F in the path part, for security reasons: scripts can't normally (i.e. without rewriting) tell the difference between %2F and / due to the PATH_INFO environment variable being automatically URL-decoded.
You can turn this feature off using the AllowEncodedSlashes directive, but note that other web servers will still disallow it (with no option to turn that off), and that other characters may also be taboo (eg. %5C), and that %00 in particular will always be blocked by both Apache and IIS. So if your application relied on being able to have %2F or other characters in a path part you'd be limiting your compatibility/deployment options.
You should use rawurlencode(), not urlencode() for escaping path parts. urlencode() is misnamed, it is actually for application/x-www-form-urlencoded data such as in the query string or the body of a POST request, and not for other parts of the URL.
The difference is that + doesn't mean space in path parts. rawurlencode() will correctly produce %20 instead, which will work both in form-encoded data and other parts of the URL.
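The same distinction exists outside PHP. As a sketch, Python's standard library has urllib.parse.quote, which percent-encodes spaces as %20 (comparable to rawurlencode()), and urllib.parse.quote_plus, which uses + for spaces (comparable to urlencode()). The sample string below is made up for illustration.

```python
from urllib.parse import quote, quote_plus

# A hypothetical value that contains both a space and a slash.
segment = "my file/name"

# For a path segment: spaces become %20; safe="" also encodes "/" as %2F.
path_encoded = quote(segment, safe="")

# For form-encoded data (query string or POST body): spaces become "+".
form_encoded = quote_plus(segment)

if __name__ == "__main__":
    print(path_encoded)
    print(form_encoded)
```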
Hope this helps!

performance rewrite rules vs routes.maproute

I'm helping a client with a web application upgrade. This includes a task that needs to route hundreds of outdated bookmarks to new URLs.
In reviewing the following links it seems clear cut that I should be updating the routing table and not putting in rewrite rules in web.config to deal with the outdated bookmarks:
When to use routes vs. rewrite rules?
http://www.iis.net/learn/extensions/url-rewrite-module/iis-url-rewriting-and-aspnet-routing
Out of curiosity, would it be a material performance hit to have 100 - 250 rewrite rules in web.config as opposed to entries within routes.maproute that directly handle the mapping?
Either way, all of the rules will need to be executed before any of the actual routes are hit. So, the amount of performance that is used for either approach would be similar.
I suspect that the IIS rewrite module will be slightly faster because it happens before .NET even becomes involved in the request. However, the actual performance will depend on whether you use partial URL matches (fastest) vs case-sensitive complete URL matches (fast) vs case-insensitive complete URL matches (not-so-fast) vs using regular expressions (slow). Note that not all of these options are available in IIS rewrite.
Also, from a maintenance standpoint it makes much more sense to use IIS rewrite than mapping routes for obsolete URLs. Then you can keep these old URLs out of your application's configuration.
The only exception is if you want to handle the user edge cases where the browser doesn't respect an HTTP 301, and you want to make a user-friendly redirect page that ensures the user will know about the updated URL and update their bookmarks. The IIS rewrite module just sends a 301 response and assumes that the client will respect it (which isn't always the case).
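To make the difference between match types concrete, here is a language-neutral sketch in Python (not IIS or ASP.NET configuration). Exact matches are resolved with one dictionary lookup, while regular-expression rules are tried one at a time. All paths, table names, and the function name are hypothetical.

```python
import re
from typing import Optional

# Hypothetical exact-match table for old bookmarks (keys stored lowercase).
EXACT_REDIRECTS = {
    "/old/about.aspx": "/about",
    "/old/contact.aspx": "/contact",
}

# Hypothetical pattern rules, tried in order only when no exact match exists.
PATTERN_REDIRECTS = [
    (re.compile(r"^/products/item(\d+)\.aspx$"), r"/products/\1"),
]


def resolve_redirect(path: str) -> Optional[str]:
    """Return the new URL for an outdated path, or None if no rule applies."""
    key = path.lower()  # case-insensitive matching via normalization
    target = EXACT_REDIRECTS.get(key)
    if target is not None:
        return target
    for pattern, replacement in PATTERN_REDIRECTS:
        if pattern.match(key):
            return pattern.sub(replacement, key)
    return None


if __name__ == "__main__":
    print(resolve_redirect("/Old/About.aspx"))
    print(resolve_redirect("/products/item42.aspx"))
```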

Nginx - What precautions need to be taken when I turn underscores_in_headers on?

I'm writing a Rails application and passing in a custom access token through the HTTP headers. To accommodate this I need to turn on underscores_in_headers in nginx.conf for my code to run. (See Rails Not able to access headers after moving to Digital Ocean)
Because this option is off by default, I assume I take on some security risk by turning it on. However, I have been unable to find an explanation of what these risks or concerns are. What are these risks and how do I account for them within my code?
Thanks!
According to the Nginx Pitfalls...
This is done in order to prevent ambiguities when mapping headers to CGI variables, as both dashes and underscores are mapped to underscores during that process.
So it looks like a question of avoiding collisions between variable names. FWIW, the applicable RFC 7230, sec 3.2.6 specifically allows underscores and RFC 3875, sec. 4.1.18 states that:
The HTTP header field name is converted to upper case, has all occurrences of "-" replaced with "_" and has "HTTP_" prepended to give the meta-variable name.
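The conversion rule quoted above can be sketched in a few lines of Python. The function name cgi_meta_variable is made up for this illustration; the point is only that two different header names end up as the same meta-variable.

```python
def cgi_meta_variable(header_name: str) -> str:
    """Apply the RFC 3875 naming rule: upper-case, "-" -> "_", prefix "HTTP_"."""
    return "HTTP_" + header_name.upper().replace("-", "_")


if __name__ == "__main__":
    # Both spellings collapse to the same name, which is the ambiguity
    # that ignoring headers with underscores is meant to avoid.
    print(cgi_meta_variable("User-Agent"))
    print(cgi_meta_variable("User_Agent"))
```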
The security problem, then, is related to this conversion process of "-" to "_" and how receiving applications then access the resulting variable. For instance, the "User-Agent" header would be mapped to "HTTP_USER_AGENT" by the server, and then in PHP (for example) the CGI environment var is accessed as:
$_SERVER['HTTP_USER_AGENT']
In rails:
request.env['HTTP_USER_AGENT']
So what happens if the client sends "User_Agent" instead of "User-Agent"? The underscore would be left in place and then "HTTP_USER_AGENT" will have been explicitly set by a client script (normally, it's set by the browser). The following post from 2007 discusses the potential to exploit this process:
Exploiting reflected XSS vulnerabilities, where user input must come through HTTP Request Headers
That post suggests there is a problem if the server app "insecurely prints" the header value (to the client browser) and in the example it would presumably execute a javascript alert popup. It's just an example though.
The question is, does the problem still exist? Well, yes. See the following post that discusses the Shellshock vulnerability where the same idea is used to exploit the BASH shell:
Inside Shellshock: How hackers are using it to exploit systems
Therefore, if you intend to parse any header with an older version of BASH, you need to be aware of the vulnerability presented by Shellshock. At the end of the day, you should always take care to sanitize any data value that has been sent to your application outside of your control.

Is there a script or other method for obtaining the correct variation of a URL for a web page?

I'm assuming there is a single correct variation of a URL for every page. Please correct me if I'm wrong.
Given an equivalent URL as input, I need to get the correct form of that URL. For example, most browsers accept slight variations from the exact URL but then correct it to take you to the right page. (Or perhaps this is done at the DNS level?)
The task I'm working on is getting the correct MD5 hash of a URL that will be recognized by an API service that returns information about a URL. For example, if I hash 'http://stackoverflow.com', I get an empty response. In order to get a valid response I need to hash 'https://stackoverflow.com/', (with a trailing slash).
EDIT: The API service I'm using is the Delicious API. In case that resonates with anyone's experience.
I'm assuming there is a single correct variation of a URL for every page. Please correct me if I'm wrong.
There is only a single "correct" one if the author decides that there should be. In that case they will likely use a combination of canonical links and HTTP redirects to push people in that direction.
For example, most browsers accept slight variations from the exact URL but then correct it to take you to the right page?
Host names are case insensitive, and the root doesn't need a slash (so http://example.com and http://EXAMPLE.cOM/ are identical).
Beyond that, the rest of the URL (except for a fragment identifier if there is one) is handled entirely by the HTTP server. It might treat it as case-sensitive, it might not. It might require things in a certain order, it might not.
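The two equivalences mentioned above (case-insensitive host names and the optional root slash) are the only ones that can be applied safely without asking the server. Here is a minimal Python sketch that applies just those two and then computes an MD5 hash. The function names are hypothetical, and this is not a description of what the Delicious API expects. Moving from http to https, as in the question's example, would require following the site's redirects and is not attempted here.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lower-case the scheme and host and give an empty path a root slash.

    Assumes the URL has no userinfo part (user:password@), which would be
    case-sensitive and must not be lower-cased.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, parts.fragment)
    )


def url_md5(url: str) -> str:
    """Return the hex MD5 digest of the normalized URL."""
    return hashlib.md5(normalize_url(url).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(normalize_url("http://EXAMPLE.cOM"))
    print(url_md5("http://EXAMPLE.cOM") == url_md5("http://example.com/"))
```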

How do you see the client-side URL in ColdFusion?

Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
However, "cgi.script_name" automatically includes the default cfm file for that folder. For example, that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm".
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO purposes: it's often preferable to 301 redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content. Since this is mostly for the benefit of spiders, JavaScript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite. The question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-first}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it is possible using CF only, but you can do the trick using the web server's URL rewriting, if you're using it, of course.
For Apache it can look like this. Say we're using the following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$ index.cfm?page=$1&noindex=yes [L]
Now when we access the URL http://website.com/page/10/ the CGI scope shows:
QUERY_STRING page=10&noindex=yes
See the idea? I think the same thing is possible when using IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewriting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have ColdFusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (Assuming you're on IIS) - http://www.helicontech.com/isapi_rewrite/
It will insert a variable x-rewrite-url into the GetHttpRequestData() result structure which will either have / or /index.cfm depending on which URL was visited.
Martin
