I've been reading many articles about SEO and investigating how to improve my site. I found an article that said that having friendly URLs help online indexers to find and positionate your site better than using URLs with lots of GET parameters so I decided to adapt my site to this kind of URL. I've also read that there's a way (editing .htaccess) but it's not the best way and it doesn't look really good.
For example, that's how Google's About URL looks like:
https://www.google.com/search/about/es/
When surfing into FTP do they see the directories search/about/es/index.html? If so, you must create many files and directories for each language instead of using &l=es, is it that worth?
You can never know (for sure) how resources are mapped to URLs.
For example, the URL https://www.google.com/search/about/es/ could
point to the HTML file /search/about/es/index.html
point to the HTML file /foo/bar/1.html
point to the PHP script /index.php
point to the PHP script /search.php?title=about&lang=es
point to the document available from the URL https://internal.google.com/1238
…
It’s always the server that, given the URL from the request, decides which resource to deliver. Unless you have access to the server, you can’t know how. (Even if a URL ends with .php, it’s not necessarily the case that PHP is involved at all.)
The server could look for a file that physically exists (if URL rewriting is involved: even in "other" places than what the URL path suggests), the server could run a script that generates a document on the fly (e.g., taking the content from your database), the server could output the file available from another URL, etc.
Related Wikipedia articles:
Rewrite engine
Web framework: URL mapping
Front controller
Related
I am hosting a static website generated with Middleman on CloudFront and S3. I want to add multiple language support and middleman allows me to localize the content and have the english version at /index.html and the translated content at /sp/index.html for example.
I would like to be able to detect the "Accept-Language" header in the request and based on that server either /index.html or /sp/index.html .
Based on my research I cannot see a way of doing this with S3 and Cloudfront, but maybe you guys have an idea?
If there is no "proper and good way" of doing this with CloudFront and S3, what would be the next best alternative? Currently I am thinking of detecting the language in JavaScript and then redirecting the user if the language is not english.
Greetings, Kim
As mentioned in the comments you will need some kind of arbitrator that can read request headers and either redirect or serve dynamic content. S3 is the problem there.
CloudFront can forward the Accept-Language header to your origin server, and ensure that content is only cached per-language. So that part isn't a problem.
If S3 is your origin, then you have a problem because your files are static and unable to process the incoming request with the language information. I don't recommend trying to detect language with JavaScript. It's problematic.
Although CloudFront can be configured with multiple origins (one per language, in your case) it cannot forward to these based on request header. Currently "behaviours" can only match the URL path. I suspect they could introduce header rules at some point, but until they do (or unless you can find another CDN that does) I'm afraid my answer is going to be a "you can't" answer.
As your site is all flat HTML, I suspect you're not interested in a convoluted solution that comprises various CloudFront behaviours and dynamic server scripts, etc..
I think your best option by far is a simple, low-tech one --
Offer the visitor a choice of language and allow them to switch language from any page. This also avoids surprises - If I google something in English, but I speak Spanish I should see the English page that I googled and then switch to Spanish if I feel like it.
Its just a question out of curiosity.
I have seen a lot of websites that doesn't show the page types/extensions in the address bar.For example, the stackoverflow's Ask Question page has the address stackoverflow.com/questions/ask instead of something like stackoverflow.com/questions/ask.php.
Do they use something to hide that page extension?Or why I do not see the page extension?
I think its a nice think for page security.
using .htaccess file, you can do that
something similar here Remove .php extension with .htaccess
All the .htaccess answers that you have seen apply to traditional PHP applications because they are all uploaded as normal files to the document root of a webserver. This means that each PHP file is "browsable" directly, assuming you haven't prevented this at your webserver configuration.
StackOverflow (which is a .NET application) and other modern applications use a URL mapping paradigm - not only does this help with "clean" URLs, but also because cool URIs don't change. It really doesn't have anything to do with security.
So it is most likely that each URL is mapped to a function, this function returns a response that is sent to the browser.
PHP frameworks offer the same - Laravel routing, symfony routing and zend framework routing are all examples of this mapping paradigm.
A .htaccess (hypertext access) file is a directory-level configuration file supported by several web servers, that allows for decentralized management of web server configuration. They are placed inside the web tree, and are able to override a subset of the server's global configuration for the directory that they are in, and all sub-directories.
htaccess file
Rewrite Guides
More htaccess tips and tricks
Rewrite Url
Servers often use .htaccess to rewrite long, overly comprehensive URLs to shorter and more memorable ones.
Authorization, authentication
A .htaccess file is often used to specify security restrictions for a directory, hence the filename "access". The .htaccess file is often accompanied by a .htpasswd file which stores valid usernames and their passwords
Given three links above these will explain you in better way.
this is done by using the .htaccess file to configure the details of a website
example:
RewriteEngine on
Rewrite Base /
RewriteRule ([a-z]+)/?$ index.php?menu=$1 [NC,L]
this example rewrites a URL which looks like this www.mydomain.com/home into www.mydomain.com?index.php&menu=home
for more details please search stackoverflow / google
I'm a novice web developer with some background in programming (mostly Python).
I'm looking for some basic advice on choosing the right technology.
I need to serve files over the internet (mp3's), but I need to implement some
control on the access:
1. Files will be accessible only for authorized users.
2. I need to keep track on how many times a file was loaded, by whom, etc.
What might be the best technology to implement this? That is, should I
learn Apache, or maybe Django? or maybe something else?
I'm looking for a 'pointer' in the right direction.
Thank!
R
If you need to track/control the downloads that suggests that the MP3 urls need to be routed through a Rails controller. Very doable. At that point you can run your checks, track your stats, and send the file back.
If it's a lot of MP3's, you would like to not have Rails do the actual sending of the MP3 data as it's a waste of it's time and ties up an instance. Look into xsendfile where Rails can send a response header indicating the file path to send and apache will intercept it and do the actual sending.
https://tn123.org/mod_xsendfile/
http://rack.rubyforge.org/doc/classes/Rack/Sendfile.html
You could use Django and Lighttpd as a web server. With Lighttpd you can use mod_secdownload, wich enables you to generate one time only urls.
More info can be found here: http://redmine.lighttpd.net/projects/1/wiki/Docs_ModSecDownload
You can check for permissions in your Django (or any other) app and then redirect the user to this disposable URL if he passed the permission check.
The tutorials I'm reading say to do that, but none of the websites I use do it. Why not?
none of the websites I use [put .htm into urls] Why not?
The simple answer would be:
Most sites offer dynamic content instead of static html pages.
Longer answer:
The file extension doesn't matter. It's all about the web server configuration.
Web server checks the extension of the file, then it knows how to handle it (send .html straight to client, run .php through mod_php and generate a html page etc.) This is configurable.
Then web server sends the content (static or generated) to the client, and the http protocol includes telling the client the type of the content in the headers before the web page is sent.
By the way, .htm is no longer needed. We don't use DOS with 8.3 filenames anymore.
To make it even more complicated: :-)
Web server can do url rewriting. For example it could redirect all urls of form : www.foo.com/photos/[imagename] to actual script located in www.foo.com/imgview.php?image=[imagename]
The .htm extension is an abomination left over from the days of 8.3 file name length limitations. If you're writing HTML, its more properly stored in a .html file. Bear in mind that a URL that you see in your browser doesn't necessarily correspond directly to some file on the server, which is why you rarely see .html or .htm in anything other than static sites.
I presume you're reading tutorials on creating static html web pages. Most sites are dynamically generated from programs that use the url to determine the content you see. The url is not tied to a file. If no such dynamic programs are present, then files are urls are synonomous.
If you can, leave off the .htm (or any file extension). It adds nothing to the use of the site, and exposes an irrelevant detail in the URL.
There's no need to put .htm in your URL's. Not only does it expose an unnecessary backend detail about your site, it also means that there is less room in your URLs for other characters.
It's true that URL's can be insanely long... but if you email a long link, it will often break. Not everyone uses TinyURL and the like, so it makes sense to keep your URL's short enough so that they don't get truncated in emails. Those four characters (.htm) might make the difference between your emailed url getting truncated or not!
Let's say, on a ColdFusion site, that the user has navigated to
http://www.example.com/sub1/
The server-side code typically used to tell you what URL the user is at, looks like:
http://#cgi.server_name##cgi.script_name#?#cgi.query_string#
however, "cgi.script_name" automatically includes the default cfm file for that folder- eg, that code, when parsed and expanded, is going to show us "http://www.example.com/sub1/index.cfm"
So, whether the user is visiting sub1/index.cfm or sub1/, the "cgi.script_name" var is going to include that "index.cfm".
The question is, how does one figure out which URL the user actually visited? This question is mostly for SEO-purposes- It's often preferable to 301 redirect "/index.cfm" to "/" to make sure there's only one URL for any piece of content- Since this is mostly for the benefit of spiders, javascript isn't an appropriate solution in this case. Also, assume one does not have access to isapi_rewrite or mod_rewrite- The question is how to achieve this within ColdFusion, specifically.
I suppose this won't be possible.
If the client requests "GET /", it will be translated by the web server to "GET /{whatever-default-file-exists-fist}" before ColdFusion even gets invoked. (This is necessary for the web server to know that ColdFusion has to be invoked in the first place!)
From ColdFusion's (or any application server's) perspective, the client requested "GET /index.cfm", and that's what you see in #CGI#.
As you've pointed out yourself, it would be possible to make a distinction by using a URL-rewriting tool. Since you specifically excluded that path, I can only say that you're out of luck here.
Not sure that it is possible using CF only, but you can make the trick using webserver's URL rewriting -- if you're using them, of course.
For Apache it can look this way. Say, we're using following mod_rewrite rule:
RewriteRule ^page/([0-9]+)/?$
index.cfm?page=$1&noindex=yes [L]
Now when we're trying to access URL http://website.com/page/10/ CGI shows:
QUERY_STRING page=10&noindex=yes
See the idea? Think same thing is possible when using IIS.
Hope this helps.
I do not think this is possible in CF. From my understanding, the webserver (Apache, IIS, etc) determines what default page to show, and requests it from CF. Therefore, CF does not know what the actual called page is.
Sergii is right that you could use URL rewrting to do this. If that is not available to you, you could use the fact that a specific page is given precedence in the list of default pages.
Let's assume that default.htm is the first page in the list of default pages. Write a generic default.htm that automatically forwards to index.cfm (or whatever). If you can adjust the list of defaults, you can have CF do a 301 redirect. If not, you can do a meta-refresh, or JS redirect, or somesuch in an HTML file.
I think this is possible.
Using GetHttpRequestData you will have access to all the HTTP headers.
Then the GET header in that should tell you what file the browser is requesting.
Try
<cfdump var="#GetHttpRequestData()#">
to see exactly what you have available to use.
Note - I don't have Coldfusion to hand to verify this.
Edit: Having done some more research it appears that GetHttpRequestData doesn't include the GET header. So this method probably won't work.
I am sure there is a way however - try dumping the CGI scope and see what you have.
If you are able to install ISAPI_rewrite (Assuming you're on IIS) - http://www.helicontech.com/isapi_rewrite/
It will insert a variable x-rewrite-url into the GetHttpRequestData() result structure which will either have / or /index.cfm depending on which URL was visited.
Martin