Can a browser correct a "mangled" URL automatically?

Some time back I faced a problem on a particular website. It contains many hyperlinks to other sites. One such URL is:
http://http//example.com/a9noaa.asp
It is clearly an incorrect URL (http appears twice), so when one clicks on it there is a page error like "Address not found".
But when one copies the link location and pastes it into the browser’s location bar, the new page loads correctly. So the problem is the incorrect URL in the hyperlink.
Would it be possible to make the browser check the basic sanity of the URL being accessed, for example checking that:
the word http is present only once,
the colon is typed correctly,
there is no unusual character at the beginning of the URL,
the double slashes are correctly present, etc.
Or to check the URL being typed in the address bar and automatically correct the errors in it?
Can any client-side code make an internet browser achieve this functionality? Is it possible?
Or are there any plugins for popular browsers (Firefox, IE) already available to achieve this?
Thank you.
-AD.
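The checks listed in the question can be expressed as a small warning function. This is a minimal Python sketch, not an existing browser feature or plugin; the function name mangling_warnings and the set of scheme names are assumptions. Note that such a check can only warn: a URL like http://http//example.com/a9noaa.asp is still syntactically valid, so silently rewriting it would be guesswork.

```python
from urllib.parse import urlsplit

# Assumed list of scheme names that look suspicious when they appear as a host name.
SCHEME_NAMES = {"http", "https", "ftp"}


def mangling_warnings(url: str) -> list[str]:
    """Return human-readable warnings for common URL typos (never rewrites the URL)."""
    warnings = []
    if not url[:1].isalpha():
        warnings.append("URL does not start with a letter (stray leading character?)")
    parts = urlsplit(url.strip())
    if parts.scheme in ("http", "https") and not parts.netloc:
        warnings.append("scheme is not followed by '//' (e.g. 'http:example.com')")
    if parts.hostname in SCHEME_NAMES:
        warnings.append(f"host name is {parts.hostname!r}, which looks like a repeated scheme")
    if parts.path.startswith("//"):
        warnings.append("path starts with '//', which often means a mangled 'http://'")
    return warnings


print(mangling_warnings("http://http//example.com/a9noaa.asp"))
```

For the URL from the question this reports two warnings (host name "http" and a path starting with "//"), while http://example.com/a9noaa.asp produces none.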

First of all, http://http//example.com/a9noaa.asp is a valid URI, with http as the scheme, the second http as the host name and //example.com/a9noaa.asp as the path. Since it’s not invalid, the browser has no reason to correct it.
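Python's standard urllib.parse module splits the URL exactly this way, which is a quick way to confirm that the second http ends up as the host name:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://http//example.com/a9noaa.asp")
print(parts.scheme)  # http
print(parts.netloc)  # http  (everything between "//" and the next "/" is the host)
print(parts.path)    # //example.com/a9noaa.asp
```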
Now let’s look at the location bar. Most user-friendly browsers do some error correction if the location that has been entered is invalid. One of those correction measures is to prepend http:// to the string if that’s not present. So you just have to type example.com to request http://example.com.
Another correction measure is to complete unknown host names by adding http://www. before and .com after the entered string. So you just have to type example, hit Enter, and you request http://www.example.com.
But any error correction outside the location bar, especially in hyperlinks, can be problematic. Take this example: a guest enters his/her website URI in a guestbook entry but omits the http://. That value is then used in a hyperlink, but the missing http:// is not prefixed. So the link might look like this:
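The two location-bar heuristics can be sketched as follows. This is an illustration under assumptions, not how any particular browser implements it: real browsers typically try the www./.com completion only after the plain host name fails to resolve, and fix_location_bar_input is a hypothetical name.

```python
def fix_location_bar_input(text: str) -> str:
    """Apply two common location-bar heuristics to what the user typed."""
    text = text.strip()
    # Heuristic 2: a bare word such as "example" (no dot, slash or colon) is
    # completed to "www.example.com". "localhost" is left alone.
    if text != "localhost" and not any(c in text for c in "./:"):
        text = f"www.{text}.com"
    # Heuristic 1: prepend "http://" when no scheme was typed.
    if "://" not in text:
        text = "http://" + text
    return text


for typed in ["example", "example.com", "localhost:8888/test"]:
    print(typed, "->", fix_location_bar_input(typed))
```

This prints http://www.example.com, http://example.com and http://localhost:8888/test respectively. Input that already has a scheme is passed through unchanged.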
<a href="example.com">Website</a>
If you click on such a link, the relative URI of that link would be resolved to an absolute URI using the current document’s URI as the base. So the link might be expanded to http://some.example/guestbook/example.com. Who hasn’t experienced that?
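The resolution step can be reproduced with urljoin from Python's standard library. The base URL http://some.example/guestbook/index.html is an assumed address for the guestbook page:

```python
from urllib.parse import urljoin

base = "http://some.example/guestbook/index.html"  # assumed URI of the guestbook page

# href="example.com" has no scheme, so it is treated as a relative path.
print(urljoin(base, "example.com"))         # http://some.example/guestbook/example.com
# Only a full absolute URI leaves the current site.
print(urljoin(base, "http://example.com"))  # http://example.com
```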
But having the browser add that missing http:// would be a serious mistake, because the author might really have intended to reference http://some.example/guestbook/example.com rather than the http://example.com the browser would guess.
So to round it up: correcting the user’s location bar input is suitable when something is missing (e.g. the http://), but doing that on every link is not.

The URL you posted is not "incorrect", it is valid. Hostnames can take many forms, such as http://localhost/ or http://http/ as well as the more common http://example.com
If you don't include http:// or another protocol in a web link, then the browser assumes you are using a relative link. For example...
<a href="www.example.com">link</a>
...will link to http://yoursite.com/www.example.com, because this is a perfectly valid URL - you can name a file www.example.com.
I would recommend contacting the website in question to fix their error. No browsers will correct this automatically.

It really shouldn't be up to the browser to correct malformed URLs. A URL is supposed to be a unique identifier of some page. The one doing the linking to the page should take care to link to the correct page. There must be no guesswork involved in opening a URL.
That said, some browsers are better than others. Off the top of my head, I think IE won't understand "localhost:8888/test" (no protocol given and a non-standard port instead of 80), but Firefox will at least try to access it via "http://localhost:8888/test". This kind of best-guess filling-in-the-blanks is fine, I think; any further auto-correction would be going too far.
Safari, for example, will try to auto-guess domain names for you. If "apple/safari" yields a DNS error, it'll automatically try to complete the address to "apple.com/safari". With your URL it might try to complete it to "http://http.com//example.com/a9noaa.asp", which might yield a page if http.com exists. There's just no single way of doing it, therefore it shouldn't be done at all.


How to add a subdomain name from the current URL using .htaccess rules

I have a URL like
http://domain.com/abs/def/city and
I want to display it as http://city.domain.com/ABC/def
using .htaccess.
Can anyone help me by providing .htaccess rules?
I want to write .htaccess rules so that each city name in the URL acts as a subdomain name.
I also want it to be dynamic, as there are different cities available on the site.
I am using the code below in my .htaccess file, but it is not working properly:
RewriteRule ^index.php/(.)/(.)/([^/]+)$ http://$3.domain/$1/$2/$3 [R=301,L]
Is there any way to achieve this by modifying my code above, or with some other .htaccess code?
Sorry, but what you ask is not possible. This is a typical misunderstanding about URL rewriting:
URL rewriting rewrites (manipulates) incoming requests on the server side before they are processed. It cannot alter outgoing content so that the URLs it contains are changed.
There are solutions for that though:
Apache's proxy module can "map" one URL into the scope of another URL
there are also modules for automatic post processing of generated html markup
more exotic or creative solutions exist, it depends on your situation in the end...
But usually the easiest way is to change the application (typically just its central configuration) so that it contains the final URLs (pointing to the subdomain in your case). Then you can indeed use the rewriting module to "re-map" those to the previous scope when future incoming requests refer to them (when they get clicked).
OK, a second step, using the additional information from your comments:
Just to get this clear: you understand that it is not possible to change the links you send out by means of rewriting, but you want to change the URL shown in the browser after the user has clicked on some city link? That is something different from what you wrote before, and that actually is possible. Great.
If the rewriting works as you want it to (you see the desired URL in the browser's address bar), then we can go on. The error message indicates a name resolution problem, which has nothing to do with rewriting. Most likely the domain "cambridge.192.168.2.107" cannot be resolved, which is actually not surprising. You cannot mix IP addresses and names; it is one or the other.
Also, I see that you are using internal, non-routable addresses. So you are also responsible for the name resolution yourself, since no public DNS server can guess what you are setting up internally. Did you do that?
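The "re-map" step can be illustrated in Python (in Apache itself it would be a mod_rewrite rule that matches on %{HTTP_HOST}). This is a sketch under stated assumptions: domain.com is the question's placeholder, the city is taken to be the single label in front of it, and www is assumed not to be a city.

```python
BASE_DOMAIN = "domain.com"  # placeholder domain from the question
RESERVED = {"www"}          # assumed: labels that are not city names


def remap_request(host: str, path: str) -> str:
    """Map http://<city>.domain.com/<path> back to the old /<path>/<city> URL."""
    host = host.split(":")[0].lower()  # drop any port, compare case-insensitively
    suffix = "." + BASE_DOMAIN
    if host.endswith(suffix):
        city = host[: -len(suffix)]
        if city and "." not in city and city not in RESERVED:
            return path.rstrip("/") + "/" + city
    return path  # not a city subdomain: leave the request alone


print(remap_request("cambridge.domain.com", "/abc/def"))  # /abc/def/cambridge
```

Requests to the bare domain or to www pass through unchanged, so the old URLs keep working while the application starts emitting subdomain links.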
I suggest these steps:
Stop using an IP address for this; use a domain name.
Since you are working internally, make sure that the domain name actually resolves to your local system's IP address. How you do this obviously depends on your setup and system. Most likely you need an entry in the file /etc/hosts or similar.
You also need to make sure that those "subdomain names" resolve to the same address. This is not trivial; again, it depends on the setup and system you use locally.
If that name resolution works, you should see a request in your HTTP server's access log file. Then, and only then, does it make sense to go on...
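These steps can be checked with a few lines of Python. It is a sketch only: domain.test is an assumed local domain (the .test top-level domain is reserved for testing), and all helper names are hypothetical. Since /etc/hosts does not support wildcards, each city subdomain needs its own name on the hosts line.

```python
import ipaddress
import socket
from typing import Optional


def is_ip_literal(name: str) -> bool:
    """True for names like '192.168.2.107', which cannot carry subdomains."""
    try:
        ipaddress.ip_address(name)
        return True
    except ValueError:
        return False


def city_host(city: str, base: str) -> str:
    """Build '<city>.<base>', refusing names such as 'cambridge.192.168.2.107'."""
    if is_ip_literal(base):
        raise ValueError(f"{base!r} is an IP address; use a domain name instead")
    return f"{city}.{base}"


def hosts_line(ip: str, base: str, cities: list[str]) -> str:
    """One /etc/hosts line mapping the base domain and every city subdomain to ip."""
    return " ".join([ip, base] + [city_host(c, base) for c in cities])


def resolves_to(name: str) -> Optional[str]:
    """Return the IPv4 address the system resolver gives for name, or None."""
    try:
        return socket.gethostbyname(name)
    except OSError:  # socket.gaierror is a subclass of OSError
        return None


print(hosts_line("192.168.2.107", "domain.test", ["cambridge", "oxford"]))
# 192.168.2.107 domain.test cambridge.domain.test oxford.domain.test
```

After adding such a line to /etc/hosts, resolves_to("cambridge.domain.test") should return 192.168.2.107; only then is it worth looking at the web server again.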

CNAME URL on the site transferred to

Is it possible to have the site you transfer to from a CNAME and 'A' record keep the name from the CNAME in the URL? Example: Site 123.com has a CNAME called app pointing to abc.com. When I type in app.123.com, it transfers me to abc.com and keeps app.123.com in the URL on the first page that I transfer to. But once I click anything on the page to move around within the site, the URL reverts back to abc.com. Is it possible to have app.123.com stay in the URL while I move around? So instead of changing to abc.com/otherPage.php, it would stay app.123.com/otherPage.php.
Thanks for any help!
There are two different mechanisms at work here, DNS and HTTP.
When you use a URL containing app.123.com, that name is looked up using DNS, which gives an IP address that happens to be the same as the IP address configured for abc.com.
But once you're viewing that page, its HTML determines how the links work: they may be absolute URLs including abc.com, or they may be relative URLs, so behaviour could vary.
A site can be made to work under multiple aliases but it would need to use the request headers to detect the address used by the client and alter content accordingly.
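A minimal Python sketch of that idea as a WSGI application that builds absolute links from the Host request header. The allowed host list and the default host are assumptions for this example; checking against an allow list keeps the app from echoing an arbitrary Host header back into its pages.

```python
ALLOWED_HOSTS = {"abc.com", "app.123.com"}  # assumed aliases of the same site
DEFAULT_HOST = "abc.com"


def app(environ, start_response):
    """Minimal WSGI app that builds absolute links from the Host request header."""
    host = environ.get("HTTP_HOST", "").split(":")[0].lower()
    if host not in ALLOWED_HOSTS:  # never reflect an unknown Host header
        host = DEFAULT_HOST
    body = f'<a href="http://{host}/otherPage.php">Other page</a>'.encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

To try it locally, wsgiref.simple_server.make_server("", 8000, app).serve_forever() serves it. Emitting relative links such as /otherPage.php avoids the problem entirely, because the browser then keeps whatever host name it used for the current page.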

How does this website detect the user's current URL from a different domain?

This is a website which is relevant to the topic that I am researching - getting an IFrame's current URL address from another domain.
Here it is: http://hidemyipaddress.org/ (to use it simply go to the bottom, enter a website address and click "go").
You can surf any website through their website - and the amazing thing is that they can keep track of your current location, and even show it to you. (Here is a picture to illustrate: http://img199.imageshack.us/img199/6343/image2eb.jpg)
The reason I am asking is because I am trying to do the same thing.
How is this possible, isn't that XSS or something? Thanks for taking your time on this.
This is a web-based proxy. When you enter an address into the proxy address input and hit search, you are asking the proxy server to retrieve the website for you. The proxy server requests the page you have asked for, parses the HTML so that all URIs are "proxied URIs", adds any additional HTML such as banners, and then returns the page in the HTTP response.
If there were an iframe, its current URL would actually be on the same domain. It's a proxy, so the server at hidemyipaddress.org is what actually returns the HTML to your client. Furthermore, the address of an iframe would be irrelevant. The URI in that address box just displays the address that you requested; it does not reflect the src of an iframe or the current location of that frame.
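The URI rewriting step, and the reason the proxy always knows your current location, can be sketched in Python. The /browse?u= path is a hypothetical proxy endpoint, and the regular expression only handles double-quoted href and src attributes; a real proxy would use an HTML parser. Because every proxied link carries the target address in its own query string, the proxy can read the current location straight from the request, with no cross-domain access in the browser.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, quote, urljoin, urlsplit

PROXY_PREFIX = "/browse?u="  # hypothetical proxy endpoint
# Matches href="..." and src="..." (double-quoted values only).
ATTR_RE = re.compile(r'(\b(?:href|src)\s*=\s*")([^"]*)(")', re.IGNORECASE)


def proxify_html(html: str, page_url: str) -> str:
    """Rewrite every link so that following it goes back through the proxy."""
    def to_proxied(match):
        absolute = urljoin(page_url, match.group(2))  # resolve relative links first
        return match.group(1) + PROXY_PREFIX + quote(absolute, safe="") + match.group(3)
    return ATTR_RE.sub(to_proxied, html)


def current_location(proxied_request: str) -> Optional[str]:
    """Recover the real page address from a request such as /browse?u=..."""
    values = parse_qs(urlsplit(proxied_request).query).get("u")
    return values[0] if values else None


link = proxify_html('<a href="/about">About</a>', "http://example.com/index.html")
print(link)  # <a href="/browse?u=http%3A%2F%2Fexample.com%2Fabout">About</a>
print(current_location("/browse?u=http%3A%2F%2Fexample.com%2Fabout"))  # http://example.com/about
```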

Is there a script or other method for obtaining the correct variation of a URL for a web page?

I'm assuming there is a single correct variation of a URL for every page. Please correct me if I'm wrong.
Given an input of an equivalent URL, I need to get the corrected version of that URL. For example, don't most browsers accept slight variations from the exact URL and then correct them to take you to the right page? (Or perhaps this is done at the DNS level?)
The task I'm working on is getting the correct MD5 hash of a URL that will be recognized by an API service that returns information about a URL. For example, if I hash 'http://stackoverflow.com', I get an empty response. In order to get a valid response, I need to hash 'http://stackoverflow.com/' (with a trailing slash).
EDIT: The API service I'm using is the Delicious API. In case that resonates with anyone's experience.
I'm assuming there is a single correct variation of a URL for every page. Please correct me if I'm wrong.
There is only a single "correct" one if the author decides that there should be; in that case they will likely use a combination of canonical links and HTTP redirects to push people in that direction.
For example, most browsers accept slight variations from the exact URL but then correct it to take you to the right page?
Host names are case insensitive, and the root doesn't need a slash (so http://example.com and http://EXAMPLE.cOM/ are identical).
Beyond that, the rest of the URL (except for a fragment identifier, if there is one) is handled entirely by the HTTP server. It might treat it case-sensitively, or it might not. It might require things in a certain order, or it might not.
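The server-independent normalizations mentioned above (lower-casing the scheme and host, adding the root slash, dropping a default port and the fragment) can be applied before hashing. This is a Python sketch under assumptions: it leaves the path and query untouched because their meaning is up to the server, it drops any user:password part, and whether a given API expects exactly this form is something to verify against that API's documentation.

```python
import hashlib
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def normalize_url(url: str) -> str:
    """Apply only the normalizations that do not depend on the server."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lower-cased
    if ":" in host:              # IPv6 literal: restore the brackets
        host = f"[{host}]"
    port = parts.port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = parts.path or "/"     # an empty path means the root
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # fragment dropped


def url_md5(url: str) -> str:
    """MD5 hex digest of the normalized URL."""
    return hashlib.md5(normalize_url(url).encode("utf-8")).hexdigest()


print(normalize_url("http://EXAMPLE.cOM"))  # http://example.com/
```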

How do you see the client-side URL in ColdFusion?

Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
However, "cgi.script_name" automatically includes the default .cfm file for that folder; e.g. that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm".
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO purposes: it's often preferable to 301-redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content. Since this is mostly for the benefit of spiders, JavaScript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite; the question is how to achieve this within ColdFusion specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-first}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
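A toy model in Python shows why the application server cannot tell the two requests apart: the web server resolves the default document before handing the request over, so both paths arrive as the same script name. The default-document list and the set of existing files are assumptions for illustration.

```python
# Assumed default-document list, in the order the web server tries them.
DEFAULT_DOCUMENTS = ["index.cfm", "default.htm"]


def resolve_request_path(path: str, existing_files: set[str]) -> str:
    """Mimic a web server mapping a directory request to its default document."""
    if path.endswith("/"):
        for name in DEFAULT_DOCUMENTS:
            if path + name in existing_files:
                return path + name
    return path


files = {"/sub1/index.cfm"}
print(resolve_request_path("/sub1/", files))           # /sub1/index.cfm
print(resolve_request_path("/sub1/index.cfm", files))  # /sub1/index.cfm
```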
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it is possible using CF only, but you can do the trick using the web server's URL rewriting, if you're using it, of course.
For Apache it could look like this. Say we're using the following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$ index.cfm?page=$1&noindex=yes [L]
Now when we try to access the URL http://website.com/page/10/, CGI shows:
QUERY_STRING page=10&noindex=yes
See the idea? I think the same thing is possible with IIS.
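The rule and the check on the ColdFusion side can be emulated in Python to see the idea. This is a sketch: in a per-directory .htaccess context mod_rewrite matches the path without its leading slash, which is why the function takes "page/10/"; the noindex flag comes from the rule above, and was_rewritten is a hypothetical helper standing in for the corresponding check in CFML.

```python
import re
from urllib.parse import parse_qs

# Same pattern and substitution as the RewriteRule above.
RULE = re.compile(r"^page/([0-9]+)/?$")


def rewrite(path: str) -> tuple[str, str]:
    """Return the (script, query string) pair the rule would rewrite the request to."""
    match = RULE.match(path)
    if match:
        return "index.cfm", f"page={match.group(1)}&noindex=yes"
    return path, ""


def was_rewritten(query_string: str) -> bool:
    """True when the request came in through the pretty /page/N/ URL."""
    return parse_qs(query_string).get("noindex") == ["yes"]


script, query = rewrite("page/10/")
print(script, query)         # index.cfm page=10&noindex=yes
print(was_rewritten(query))  # True
```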
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewriting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have ColdFusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (assuming you're on IIS): http://www.helicontech.com/isapi_rewrite/
It will insert a variable x-rewrite-url into the GetHttpRequestData() result structure, which will contain either / or /index.cfm depending on which URL was visited.
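The redirect decision that this header enables can be sketched in Python. Assumptions: the header name comes from the answer above, the function name is hypothetical, and header names are compared case-insensitively as HTTP requires; in ColdFusion the equivalent logic would read the value from GetHttpRequestData() and issue a 301 redirect.

```python
from typing import Optional


def canonical_redirect_target(headers: dict[str, str]) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if the request is already canonical."""
    # HTTP header names are case-insensitive, so normalise the keys first.
    lowered = {name.lower(): value for name, value in headers.items()}
    original = lowered.get("x-rewrite-url")
    if original is None:
        return None  # header missing: the original URL cannot be known
    path, sep, query = original.partition("?")
    if path.endswith("/index.cfm"):
        return path[: -len("index.cfm")] + sep + query
    return None


print(canonical_redirect_target({"X-Rewrite-URL": "/sub1/index.cfm"}))  # /sub1/
print(canonical_redirect_target({"X-Rewrite-URL": "/sub1/"}))           # None
```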
Martin
