I'm helping a client with a web application upgrade, which includes a task to route hundreds of outdated bookmarks to new URLs.
In reviewing the following links it seems clear cut that I should be updating the routing table and not putting in rewrite rules in web.config to deal with the outdated bookmarks:
When to use routes vs. rewrite rules?
http://www.iis.net/learn/extensions/url-rewrite-module/iis-url-rewriting-and-aspnet-routing
Out of curiosity, would it be a material performance hit to have 100 - 250 rewrite rules in web.config as opposed to entries within routes.MapRoute that directly handle the mapping?
Either way, all of the rules will need to be executed before any of the actual routes are hit. So, the amount of performance that is used for either approach would be similar.
I suspect that the IIS rewrite module will be slightly faster because it happens before .NET even becomes involved in the request. However, the actual performance will depend on whether you use partial URL matches (fastest) vs case-sensitive complete URL matches (fast) vs case-insensitive complete URL matches (not-so-fast) vs using regular expressions (slow). Note that not all of these options are available in IIS rewrite.
Also, from a maintenance standpoint it makes much more sense to use IIS rewrite than mapping routes for obsolete URLs. Then you can keep these old URLs out of your application's configuration.
The only exception is if you want to handle the user edge cases where the browser doesn't respect an HTTP 301, and you want to make a user-friendly redirect page that ensures the user will know about the updated URL and update their bookmarks. The IIS rewrite module just sends a 301 response and assumes that the client will respect it (which isn't always the case).
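To illustrate the matching-cost point made in this answer, here is a minimal Python sketch (not IIS or ASP.NET code) that contrasts an exact-match lookup table with a sequential list of regular-expression rules. The paths and the names `EXACT_REDIRECTS`, `REGEX_RULES` and `resolve` are hypothetical and only serve the illustration.

```python
import re

# Hypothetical old-bookmark -> new-URL table (keys stored in lower case).
# An exact match is a single hash lookup, however many entries there are.
EXACT_REDIRECTS = {
    "/products/list.aspx": "/catalog",
    "/about/team.aspx": "/company/team",
}

# Hypothetical pattern rules. These are tried one by one, so the cost
# grows with the number of rules that have to be evaluated.
REGEX_RULES = [
    (re.compile(r"^/news/(\d{4})/(\d+)\.aspx$", re.IGNORECASE), r"/blog/\1/\2"),
]


def resolve(path):
    """Return the new URL for an outdated path, or None if nothing matches."""
    target = EXACT_REDIRECTS.get(path.lower())  # case-insensitive exact match
    if target is not None:
        return target
    for pattern, replacement in REGEX_RULES:
        if pattern.match(path):
            return pattern.sub(replacement, path)
    return None
```

With a few hundred fixed bookmarks, most entries would probably fit the exact-match table, leaving only a handful of pattern rules to be evaluated in sequence. The URL Rewrite module's rewrite maps play a comparable lookup-table role, although how they perform in a given setup would need to be measured.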
Related
I'm working on a project to require HTTPS everywhere among a suite of MVC and WebAPI applications. I'm trying to understand the trade-offs between clicking the "Require SSL" checkbox in IIS and using a URL Rewrite module vs. using a RequireHttpsAttribute in my global filters and modifying my web.config.
I've found the following guides detailing each approach:
https://webmasters.stackexchange.com/questions/28057/iis-7-require-ssl-automatically-redirect-to-https
http://tech.trailmax.info/2014/02/implemnting-https-everywhere-in-asp-net-mvc-application/
Explaining the mechanisms in full would be lengthy, so I will just list the most significant differences in behaviour:
1. "Require SSL" in IIS: The name basically explains what it does. It is "Require", not "Enforce", which means that if people try to access your website content through http, the server will just respond with a 403 error. That is usually not the desired behaviour, but it may help in certain situations.
2. URL Rewrite module: The module itself can do quite a few different things, but I assume you are just going to do the regular https redirect. This means that if a user tries to hit ANY content of the site through http, the server will do a 301 or 302 redirect to the https version of the same URL. This is usually a good option since it doesn't affect the usability of the website.
3. Global RequireHttpsAttribute action filter: This does something similar to option number 2: it will do a 302 redirect for any http request that hits an ACTION. The main difference is that it only applies to the actions in your controllers, which means that if someone tries to get an image or CSS file through http on your website, this option will let the request through without any enforcement. This leaves you the ability to serve static content through http, which can be useful in some specific circumstances.
One extra thing worth mentioning: 301 and 302 redirects do not work well with HTTP POST, so if your user tries to POST through http, the request body will get lost (thanks to @ChrisPratt for the info).
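The redirect behaviour described above, including the POST caveat, can be sketched in Python as follows. This is an illustration of the decision logic only, not what IIS or RequireHttpsAttribute actually execute; the function name `https_redirect` and the `preserve_method` flag are hypothetical. The one fact added here is that status codes 307 and 308 ask the client to repeat the request with the same method and body, whereas most clients turn a redirected POST into a GET on 301 or 302.

```python
def https_redirect(method, url, preserve_method=False):
    """Return (status, location) for a plain-http request, or None if no redirect is needed.

    Assumes `url` is an absolute http:// or https:// URL.
    """
    if not url.startswith("http://"):
        return None  # already https (or not something this sketch handles)
    location = "https://" + url[len("http://"):]
    if method.upper() in ("GET", "HEAD"):
        return (301, location)
    # Non-GET request: with 302 most clients re-issue it as a GET and drop the body.
    # 307 asks the client to repeat the same method and body against the new URL.
    return (307 if preserve_method else 302, location)
```

Even with 307, the first POST has already travelled over plain http, so forms should point at https URLs in the first place.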
Typically the folks managing the infrastructure are responsible for making sure things are on https. Typically they aren't very good at this, so that is where the RequireHttpsAttribute kicks in, as it can enforce https requests at the code level, thereby enforcing HTTPS-only access.
In practice it isn't so great as many production setups -- including stackoverflow.com's -- see https terminated in an edge device before being unwrapped and handed to the back-end apps as http and the require https attribute isn't quite nuanced enough to understand this distinction.
The best bet in general is to configure the edge device providing the public http interface to take HTTPS and only HTTPS. Then set up secondary virtual sites [or whatever is vendor appropriate] to redirect all traffic to the canonical HTTPS URL. I'd be a bit nervous about relying upon the RequireHttpsAttribute unless it will be a small app handling its own requests. That still leaves open holes in terms of artifacts and other things that might not be coming off of a controller.
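The edge-termination problem mentioned above comes down to the back-end seeing plain http even though the client used https. A minimal Python sketch of the distinction, assuming the edge device forwards the original scheme in an `X-Forwarded-Proto` header (a common convention, but whether a particular device sets it is an assumption, and the function name is hypothetical):

```python
def is_effectively_https(scheme, headers, trust_proxy=True):
    """Decide whether the client's original request used https.

    `scheme` is what the back-end application sees ("http" behind a
    TLS-terminating edge device). `headers` is a plain dict with
    exactly-cased header names, which is a simplification.
    """
    forwarded = headers.get("X-Forwarded-Proto", "")
    if trust_proxy and forwarded:
        # The header may hold a comma-separated chain; the first entry is the client's scheme.
        return forwarded.split(",")[0].strip().lower() == "https"
    return scheme.lower() == "https"
```

The header should only be trusted when the application is reachable solely through the edge device; otherwise a client could send it directly.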
I have an account with Gearhost.com and when it comes to setting up sub-domains you are currently required to go in and configure a URL Rewrite entry using IIS Remote Admin.
The directory folder structure follows the pattern:
\mastersite
\mastersite\subdomain1
The Gearhost KB Article on how to do it can be found here:
https://support.gearhost.com/KB/a851/setting-a-subdomains-content-location-using-url-rewrite.aspx?KBSearchID=0
This works just fine, but I ran into a scenario that revealed the ability to access the sub-domain by using the master.com/sub-domain path.
subdomain1.site.com (works)
www.site.com/subdomain1 (displays site also --which I don't want)
I don't know if the KB article is the correct way to configure sub-domains in IIS or if I need to manage the routing in my Microsoft MVC 3 Application.
Let's say it is the correct way to set up/configure a sub-domain. Is there a way to restrict the path for the 2nd option, so it returns page not found, access forbidden, or something to that effect?
I'm developing a Microsoft MVC Application and if I use a "Request.Url" call, it actually returns the full path of the 2nd option even when I'm sitting on what looks like a perfect path to the sub-domain home page.
So I don't know if this needs to be handled a different way, if the URL Rewrite entry needs to be changed, or what the solution may be.
Looking for feedback from any engineers who may have more knowledge on the topics.
Thanks.
I ran across an article which solved my original request for help.
It involved creating outbound ("Outgoing") rules in IIS to rename the path. The rule looks for the path in question and then rewrites it.
Per the article I used Outgoing rule # 2.
Pre-Condition: None
Matching Scope: Server Variable
Variable Name: RESPONSE_LOCATION
Variable Value: Matches the Pattern
Using: Regular Expressions
Pattern: ^(?:MyMasterSiteSubFolder/MySubDomain|(.*//[_a-zA-Z0-9-\.]*)?/MyMasterSiteSubFolder/MySubDomain)(.*)
[x] Ignore case
Action: Rewrite
Action Properties Value: {R:1}{R:2}
[x] Replace existing server variable value
[ ] Stop processing of subsequent rules
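To see what the pattern and the {R:1}{R:2} action do to a Location value, here is a small Python sketch that applies the same regular expression. It only imitates the rule for illustration; IIS evaluates it with its own regex engine, and the sample URLs and the function name `rewrite_location` are hypothetical.

```python
import re

# Same pattern as in the rule above, with "Ignore case" switched on.
PATTERN = re.compile(
    r"^(?:MyMasterSiteSubFolder/MySubDomain|(.*//[_a-zA-Z0-9-\.]*)?/MyMasterSiteSubFolder/MySubDomain)(.*)",
    re.IGNORECASE,
)


def rewrite_location(value):
    """Drop the sub-folder prefix, mirroring the action value {R:1}{R:2}."""
    match = PATTERN.match(value)
    if match is None:
        return value  # rule does not apply, value stays unchanged
    # Group 1 (scheme and host) is optional; treat a missing group as empty.
    return (match.group(1) or "") + match.group(2)
```

For example, `rewrite_location("http://www.site.com/MyMasterSiteSubFolder/MySubDomain/Home/Index")` gives `http://www.site.com/Home/Index`, while a value that does not contain the sub-folder path is returned untouched.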
I have over 90 urls set out in the following format:
http://www.mysite.com/folder1/folder2/page.html
Each of these URLs will be printed on paper for a user to input into their address bar. The problem at the moment is that they are too long, and therefore I need to make these URLs as short as possible.
However, what would be the best method for doing so?
Would subdomains be the best option here, such as "keyword.mysite.com"?
I don't want to use a url shortening service as they still need to be related to my domain name. Additional domain names forwarding on to the pages are also out of the question due to the quantity of urls.
Richard
Without knowing what technology you are working with (apache/php, asp.net, JSP, etc) all I can suggest is investigating Url Rewriting. Here is a codeproject example of a rewriter for ASP.Net.
There's a handful of mechanisms that come to mind quickly. One is to host your own url-shortening service for your own domain: http://docs.example.com/xsdf and so forth. Writing one for your own users shouldn't be too much work, especially since you could even write a quick script to submit all the URLs for shortening and replace them all without ever making a pretty interface for a human.
If you want something even cheaper, but more work on the part of your server admins, you could use the standard 'rewriting' services in web servers:
Apache mod_rewrite guide
# [R] issues an external redirect (302 by default); targets start with "/" so they resolve from the site root
RewriteRule ^/xsdf$ /folder1/folder2/page.html [R]
RewriteRule ^/qwer$ /folder2/folder3/page.html [R]
RewriteRule ^/polz$ /folder7/folder6/page.html [R]
nginx HttpRewriteModule.
# "redirect" returns a temporary (302) redirect
rewrite ^/xsdf$ /folder1/folder2/page.html redirect;
rewrite ^/qwer$ /folder2/folder3/page.html redirect;
rewrite ^/polz$ /folder7/folder6/page.html redirect;
Updating these rewrite rules involves editing the server config files, or dropping new ones in place. The other mechanism would be outside the range of the web server itself, so it might be easier or harder for long-term maintenance depending upon which your team would rather work with in the future.
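The "quick script" idea from this answer could look roughly like the following Python sketch, which assigns a short code to each long path and prints matching Apache rules. The base-36 numbering scheme, the code width and the function names are assumptions made for the illustration; any scheme that yields short, unique codes would do.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # 36 symbols: 0-9, a-z


def short_code(n, width=4):
    """Encode a non-negative integer in base 36, left-padded with zeros."""
    digits = []
    while n:
        n, rem = divmod(n, 36)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits)).rjust(width, "0")


def apache_rules(paths, start=1):
    """Return one RewriteRule line per path, numbering the codes from `start`."""
    rules = []
    for offset, path in enumerate(paths):
        code = short_code(start + offset)
        rules.append(f"RewriteRule ^/{code}$ {path} [R]")
    return rules


if __name__ == "__main__":
    for rule in apache_rules(["/folder1/folder2/page.html", "/folder2/folder3/page.html"]):
        print(rule)
```

Because the URLs will be typed from paper, it may be worth dropping look-alike characters such as 0/o and 1/l from the alphabet.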
I noticed that https://stackoverflow.com//////////questions/4659504/ is a valid URL. However https://www.google.com//////////analytics/settings is not. Are there differences inherent in web server technologies that explain this? Should a url with unnecessary slashes be interpreted correctly or should it return an error?
First of all, adding a slash changes the semantics of a URL path like any other character does. So by definition /foo/bar and /foo//bar are not equivalent just as /foo/bar and /foo/bar/ are not equivalent.
But since the URL path is mostly used to be directly mapped onto the file system, web servers often remove empty path segments (Apache does that) so that /foo//bar and /foo/bar are handled equivalently. But this is not the expected behavior; it’s rather done for error correction.
They are both valid URLs.
However, Google's server can't handle the second one.
There is no specific reason to either handle or reject URLs with duplicate slashes; you should spend more time on more important things.
What do you consider "interpreted correctly"? HTTP only really specifies how the stuff in front of the slash after the server name gets interpreted. The rest is entirely up to the web server. It parses what you give it after that point (in whatever manner it likes) and presents you with whatever HTML it feels like providing for that text.
There is a difference in how every application processes requests. If you set up your app to collapse consecutive slashes before routing the request, you shouldn't have any problems.
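A minimal sketch of that normalisation step in Python, assuming it runs on the full URL before routing (the function name `collapse_slashes` is hypothetical). Only the path is touched, so the `//` after the scheme and any slashes in the query string stay as they are.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def collapse_slashes(url):
    """Replace runs of two or more slashes in the path with a single slash."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))
```

Applied to the URL from the question, `collapse_slashes("https://stackoverflow.com//////////questions/4659504/")` returns `https://stackoverflow.com/questions/4659504/`.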
Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
However, "cgi.script_name" automatically includes the default cfm file for that folder; e.g., that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm".
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO purposes: it's often preferable to 301 redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content. Since this is mostly for the benefit of spiders, JavaScript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite. The question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-first}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
I'm not sure that it is possible using CF only, but you can do the trick using the web server's URL rewriting, if you're using it, of course.
For Apache it can look this way. Say we're using the following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$ index.cfm?page=$1&noindex=yes [L]
Now when we're trying to access URL http://website.com/page/10/ CGI shows:
QUERY_STRING page=10&noindex=yes
See the idea? I think the same thing is possible when using IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewriting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have ColdFusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (Assuming you're on IIS) - http://www.helicontech.com/isapi_rewrite/
It will insert a variable x-rewrite-url into the GetHttpRequestData() result structure which will either have / or /index.cfm depending on which URL was visited.
Martin
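Once the originally requested URL is available to the application (for instance through the x-rewrite-url value mentioned in this answer), the redirect decision itself is small. The following is a Python sketch of that logic rather than CFML, and the function name `canonical_redirect` is hypothetical; the same check could be written in ColdFusion before issuing the 301.

```python
def canonical_redirect(requested_path, default_doc="index.cfm"):
    """Return the 301 target if the client explicitly asked for the default
    document, or None if the URL is already in its canonical form."""
    path, sep, query = requested_path.partition("?")
    if path.lower().endswith("/" + default_doc):
        # Strip the file name but keep the trailing slash and any query string.
        return path[: -len(default_doc)] + sep + query
    return None
```

A request for `/sub1/index.cfm` would be redirected to `/sub1/`, while a request for `/sub1/` yields None and is served normally, which avoids a redirect loop.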