Should I put .htm at the end of my urls? - url

The tutorials I'm reading say to do that, but none of the websites I use do it. Why not?

none of the websites I use [put .htm into urls] Why not?
The simple answer would be:
Most sites offer dynamic content instead of static html pages.
Longer answer:
The file extension doesn't matter. It's all about the web server configuration.
Web server checks the extension of the file, then it knows how to handle it (send .html straight to client, run .php through mod_php and generate a html page etc.) This is configurable.
Then web server sends the content (static or generated) to the client, and the http protocol includes telling the client the type of the content in the headers before the web page is sent.
By the way, .htm is no longer needed. We don't use DOS with 8.3 filenames anymore.
To make it even more complicated: :-)
Web server can do url rewriting. For example it could redirect all urls of form : www.foo.com/photos/[imagename] to actual script located in www.foo.com/imgview.php?image=[imagename]

The .htm extension is an abomination left over from the days of 8.3 file name length limitations. If you're writing HTML, its more properly stored in a .html file. Bear in mind that a URL that you see in your browser doesn't necessarily correspond directly to some file on the server, which is why you rarely see .html or .htm in anything other than static sites.

I presume you're reading tutorials on creating static html web pages. Most sites are dynamically generated from programs that use the url to determine the content you see. The url is not tied to a file. If no such dynamic programs are present, then files are urls are synonomous.

If you can, leave off the .htm (or any file extension). It adds nothing to the use of the site, and exposes an irrelevant detail in the URL.

There's no need to put .htm in your URL's. Not only does it expose an unnecessary backend detail about your site, it also means that there is less room in your URLs for other characters.
It's true that URL's can be insanely long... but if you email a long link, it will often break. Not everyone uses TinyURL and the like, so it makes sense to keep your URL's short enough so that they don't get truncated in emails. Those four characters (.htm) might make the difference between your emailed url getting truncated or not!

Related

Are friendly URLs based on directories?

I've been reading many articles about SEO and investigating how to improve my site. I found an article that said that having friendly URLs help online indexers to find and positionate your site better than using URLs with lots of GET parameters so I decided to adapt my site to this kind of URL. I've also read that there's a way (editing .htaccess) but it's not the best way and it doesn't look really good.
For example, that's how Google's About URL looks like:
https://www.google.com/search/about/es/
When surfing into FTP do they see the directories search/about/es/index.html? If so, you must create many files and directories for each language instead of using &l=es, is it that worth?
You can never know (for sure) how resources are mapped to URLs.
For example, the URL https://www.google.com/search/about/es/ could
point to the HTML file /search/about/es/index.html
point to the HTML file /foo/bar/1.html
point to the PHP script /index.php
point to the PHP script /search.php?title=about&lang=es
point to the document available from the URL https://internal.google.com/1238
…
It’s always the server that, given the URL from the request, decides which resource to deliver. Unless you have access to the server, you can’t know how. (Even if a URL ends with .php, it’s not necessarily the case that PHP is involved at all.)
The server could look for a file that physically exists (if URL rewriting is involved: even in "other" places than what the URL path suggests), the server could run a script that generates a document on the fly (e.g., taking the content from your database), the server could output the file available from another URL, etc.
Related Wikipedia articles:
Rewrite engine
Web framework: URL mapping
Front controller

Cloudfront/S3: Server different file depending on Request Header

I am hosting a static website generated with Middleman on CloudFront and S3. I want to add multiple language support and middleman allows me to localize the content and have the english version at /index.html and the translated content at /sp/index.html for example.
I would like to be able to detect the "Accept-Language" header in the request and based on that server either /index.html or /sp/index.html .
Based on my research I cannot see a way of doing this with S3 and Cloudfront, but maybe you guys have an idea?
If there is no "proper and good way" of doing this with CloudFront and S3, what would be the next best alternative? Currently I am thinking of detecting the language in JavaScript and then redirecting the user if the language is not english.
Greetings, Kim
As mentioned in the comments you will need some kind of arbitrator that can read request headers and either redirect or serve dynamic content. S3 is the problem there.
CloudFront can forward the Accept-Language header to your origin server, and ensure that content is only cached per-language. So that part isn't a problem.
If S3 is your origin, then you have a problem because your files are static and unable to process the incoming request with the language information. I don't recommend trying to detect language with JavaScript. It's problematic.
Although CloudFront can be configured with multiple origins (one per language, in your case) it cannot forward to these based on request header. Currently "behaviours" can only match the URL path. I suspect they could introduce header rules at some point, but until they do (or unless you can find another CDN that does) I'm afraid my answer is going to be a "you can't" answer.
As your site is all flat HTML, I suspect you're not interested in a convoluted solution that comprises various CloudFront behaviours and dynamic server scripts, etc..
I think your best option by far is a simple, low-tech one --
Offer the visitor a choice of language and allow them to switch language from any page. This also avoids surprises - If I google something in English, but I speak Spanish I should see the English page that I googled and then switch to Spanish if I feel like it.

Xhtml namespace on https site?

Im migrating a site from using http to redirect all requests to https and therefor im making sure external script, images etc are references with just // in the beginning of the url instead of http://
My question is this. Do i also need to change stuff like the xhtml namespaces for the html tag or the doctype declaration url? And if I do need to change this, will they resolve urls starting with //?
Namespaces are identifying strings that happen to use URL syntax. They should not be changed.
The DTD is a tricky one.
In theory, if it was altered with a man-in-the-middle attack, then it could be used to change named entities and insert new content into the document.
In practise however, browsers don't generally parse the DTD so this isn't really a worry. Additionally, W3C DTDs are not served over HTTPS so you can't reference them without copying the files to your own server (and possibly updating internal references). If you want to be really safe, you should do this.
Personally, I'd scrap DTDs and just use (X)HTML 5.

Resource for Recognizing Framework/CMS From URL or Other Clues?

I'm curious to know which web framework or content management system a website is using based upon clues from the URL, headers, content. Does anyone know of a resource on the web that would provide this? For example:
.html -> maybe a flat-file
.php -> something built using PHP, perhaps.
.jsp -> something using Java Server Pages
.asp -> Active Server Pages
0,2097,1-1-1928,00 -> Vignette CMS
.do -> ??
Thanks.
Finding CMS by url
http://2ip.ru/cms/
enter URL in center input field and click big blue button below
Results in black - not found,
in red - found
NOTE:
May play around with url path: with http://, with or without www. part -- results may differ.
If you're not restricted to just the query string then there are a few other options. For example to identify a rails app:
Script, stylesheet and image tags tend to have a 10 digits number appended (this allows you to cache, and still change the file):
<script src="/javascripts/all.js?1236037318" type="text/javascript"></script>
You can also sometimes tell from the cookies what the framework is. For example rails apps tend to have a session cookie called _appName_session, and often you can find a flash contained.
You're on the right track with your list there. If all you want to know is the stack (LAMP, IIS, Java) then that's all you really need.
If querying the URL in question is an option, then you can usually pull the webserver make/version out of the HTTP response header as well.
There is a nifty Chrome extension called Wappalyzer:
Wappalyzer is a browser extension that uncovers the technologies used
on websites. It detects content management systems, eCommerce
platforms, web servers, JavaScript frameworks, analytics tools and
many more.

How do you see the client-side URL in ColdFusion?

Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
however, "cgi.script_name" automatically includes the default cfm file for that folder- eg, that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm"
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO-purposes- It's often preferable to 301 redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content- Since this is mostly for the benefit of spiders, javascript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite- The question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-fist}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it is possible using CF only, but you can make the trick using webserver's URL rewriting -- if you're using them, of course.
For Apache it can look this way. Say, we're using following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$
index.cfm?page=$1&noindex=yes [L]
Now when we're trying to access URL http://website.com/page/10/ CGI shows:
QUERY_STRING page=10&noindex=yes
See the idea? Think same thing is possible when using IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewrting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have Coldfusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (Assuming you're on IIS) - http://www.helicontech.com/isapi_rewrite/
It will insert a variable x-rewrite-url into the GetHttpRequestData() result structure which will either have / or /index.cfm depending on which URL was visited.
Martin

Resources