Is a "file://" path a URL? - url

I sometimes see people refer to file system paths (POSIX/Windows) as both URIs and URLs. I'm no file system buff, but I have yet to find a file system path that conflicts with my understanding of the URL format. That is, of course, given that it includes the scheme name (e.g. file://localhost/path/to/file.txt).
File system paths are most definitely URIs - I mean, what's not - so everyone referring to file system paths as URIs is inside the safe zone. But is it safe to call them URLs?
If the URL was defined by a single (non-obsolete) RFC, rather than being comprised of half a dozen specialized ones, I wouldn't have to ask this question.

file is a registered URI scheme (for "Host-specific file names").
It links to RFC 1738, which is called "Uniform Resource Locators (URL)", in which file is specified:
A file URL takes the form:
file://<host>/<path>
So yes, file URIs are URLs.
However, the subdivision from URIs into URLs, URNs and "Other" (like data) is not that useful anyway. FWIW, the WHATWG URL spec tries to standardize on the term "URL" for all kinds of URIs (even those that aren't URLs today, following the RFC). The W3C Note "URIs, URLs, and URNs: Clarifications and Recommendations 1.0" tries to summarize the confusion about the terms:
The body of documents (RFCs, etc) covering URI architecture, syntax, registration, etc., spans both the classical and contemporary periods. People who are well-versed in URI matters tend to use "URL" and "URI" in ways that seem to be interchangable. Among these experts, this isn't a problem. But among the Internet community at large, it is. People are not convinced that URI and URL mean the same thing, in documents where they (apparently) do. […]

Related

Is a protocol (eg. http or https) required for a URL to be valid?

Recently I came across a lot of code from analytics plugins where they specify the URL as //fonts.googleapis.com or //www.google.com.
Basically it starts with two forward slashes and then the domain or subdomain. These links work fine in browsers. I have read the following documents, but I am still not sure whether the above can be called valid URLs (basically, should these be reported as broken URLs or not).
https://developer.mozilla.org/en-US/docs/Web/API/URL and
https://url.spec.whatwg.org/
Is there a standard specification that I can refer to?
They're both valid scheme-relative-URL strings, although they need to be in the context of a Base URL to be meaningful. When used within a web page, the web page will provide the Base URL context.
Although there are other, earlier standards for URLs, the whatwg document represents the most up-to-date, web compatible definition.
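To illustrate the point about the Base URL, here is a minimal sketch using Python's standard urllib.parse. Note that this module follows the RFC 3986 resolution rules rather than the WHATWG algorithm, and the example.com base URLs are made up for the example; for this simple case the two should agree.

```python
from urllib.parse import urljoin, urlsplit

# The scheme-relative reference from the question.
reference = "//fonts.googleapis.com"

# On its own, the reference has a host but no scheme.
parts = urlsplit(reference)
has_scheme = bool(parts.scheme)   # False: nothing before the "//"
host = parts.netloc               # "fonts.googleapis.com"

# Hypothetical base URLs standing in for the page that contains the link.
resolved_https = urljoin("https://example.com/index.html", reference)
resolved_http = urljoin("http://example.com/index.html", reference)

print(resolved_https)
print(resolved_http)
```

The reference inherits the scheme of whatever page it appears on, which is why a link checker needs the page's URL before it can decide whether such a link is broken.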

Are friendly URLs based on directories?

I've been reading many articles about SEO and investigating how to improve my site. I found an article that said that having friendly URLs helps search engine indexers find and rank your site better than URLs with lots of GET parameters, so I decided to adapt my site to this kind of URL. I've also read that there's a way to do this by editing .htaccess, but that it's not the best way and doesn't look very good.
For example, this is what Google's About URL looks like:
https://www.google.com/search/about/es/
If you browsed the server over FTP, would you see the directories search/about/es/index.html? If so, you would have to create many files and directories for each language instead of using &l=es. Is it worth it?
You can never know (for sure) how resources are mapped to URLs.
For example, the URL https://www.google.com/search/about/es/ could
point to the HTML file /search/about/es/index.html
point to the HTML file /foo/bar/1.html
point to the PHP script /index.php
point to the PHP script /search.php?title=about&lang=es
point to the document available from the URL https://internal.google.com/1238
…
It’s always the server that, given the URL from the request, decides which resource to deliver. Unless you have access to the server, you can’t know how. (Even if a URL ends with .php, it’s not necessarily the case that PHP is involved at all.)
The server could look for a file that physically exists (if URL rewriting is involved: even in "other" places than what the URL path suggests), the server could run a script that generates a document on the fly (e.g., taking the content from your database), the server could output the file available from another URL, etc.
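As a rough sketch of the "script that generates a document on the fly" case, here is a tiny front-controller-style dispatcher in Python. Everything in it (the route pattern, the /about path, the render_about helper) is hypothetical and only illustrates that a friendly URL and a query-string URL can end up at the same code, with no matching directory on disk.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Hypothetical route: /search/about/<two-letter language>/
ABOUT_ROUTE = re.compile(r"^/search/about/(?P<lang>[a-z]{2})/$")


def render_about(lang):
    """Stand-in for a template or database lookup."""
    return f"About page, language={lang}"


def handle(url):
    """Decide which resource to deliver for a requested URL."""
    parts = urlsplit(url)

    # Friendly form: the language is taken from the path.
    match = ABOUT_ROUTE.match(parts.path)
    if match:
        return render_about(match.group("lang"))

    # Query-string form: the language is taken from ?l=..., default "en".
    if parts.path == "/about":
        lang = parse_qs(parts.query).get("l", ["en"])[0]
        return render_about(lang)

    return None  # a real server would answer 404 here


print(handle("https://www.google.com/search/about/es/"))
print(handle("https://example.com/about?l=es"))
```

Both calls go through the same render_about function, so supporting another language means adding content, not another directory tree.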
Related Wikipedia articles:
Rewrite engine
Web framework: URL mapping
Front controller

CloudFront/S3: Serve different file depending on Request Header

I am hosting a static website generated with Middleman on CloudFront and S3. I want to add multiple language support, and Middleman allows me to localize the content and have the English version at /index.html and the translated content at /sp/index.html, for example.
I would like to be able to detect the "Accept-Language" header in the request and, based on that, serve either /index.html or /sp/index.html.
Based on my research I cannot see a way of doing this with S3 and Cloudfront, but maybe you guys have an idea?
If there is no "proper and good way" of doing this with CloudFront and S3, what would be the next best alternative? Currently I am thinking of detecting the language in JavaScript and then redirecting the user if the language is not English.
Greetings, Kim
As mentioned in the comments you will need some kind of arbitrator that can read request headers and either redirect or serve dynamic content. S3 is the problem there.
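For reference, the decision such an arbitrator would make can be sketched in a few lines of Python. This is only an illustration of the header handling, not something S3 can run; the function names are made up, and mapping the language tag "es" to /sp/index.html is an assumption about what the /sp/ directory contains.

```python
# Assumed mapping from primary language tag to the static file to serve.
PATHS = {"en": "/index.html", "es": "/sp/index.html"}


def parse_accept_language(header):
    """Return language tags from an Accept-Language value, best first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        quality = 1.0  # default weight when no q= parameter is given
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat malformed weights as "not acceptable"
        prefs.append((tag.strip().lower(), quality))
    # Stable sort: equal weights keep the order the client sent them in.
    prefs.sort(key=lambda pref: pref[1], reverse=True)
    return [tag for tag, quality in prefs if quality > 0]


def choose_path(header, paths=PATHS, default="en"):
    """Pick the file for the first acceptable language we have content for."""
    for tag in parse_accept_language(header or ""):
        primary = tag.split("-")[0]  # "es-ES" -> "es"
        if primary in paths:
            return paths[primary]
    return paths[default]


print(choose_path("es-ES,es;q=0.9,en;q=0.8"))
print(choose_path("en-US,en;q=0.5"))
```

A dynamic origin (or anything else that can run code per request) would then redirect to, or serve, the chosen path.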
CloudFront can forward the Accept-Language header to your origin server, and ensure that content is only cached per-language. So that part isn't a problem.
If S3 is your origin, then you have a problem because your files are static and unable to process the incoming request with the language information. I don't recommend trying to detect language with JavaScript. It's problematic.
Although CloudFront can be configured with multiple origins (one per language, in your case) it cannot forward to these based on request header. Currently "behaviours" can only match the URL path. I suspect they could introduce header rules at some point, but until they do (or unless you can find another CDN that does) I'm afraid my answer is going to be a "you can't" answer.
As your site is all flat HTML, I suspect you're not interested in a convoluted solution that comprises various CloudFront behaviours and dynamic server scripts, etc.
I think your best option by far is a simple, low-tech one:
Offer the visitor a choice of language and allow them to switch language from any page. This also avoids surprises: if I google something in English but I speak Spanish, I should see the English page that I googled and then switch to Spanish if I feel like it.

Validating URL domain in Rails

I want to validate a URL, so I searched and found this
Brian Ray said in his post that
"#Tate's answer is good for a full URL, but if you want to validate a domain column, you don't want to allow the extra URL bits his regex allows (e.g. you definitely don't want to allow a URL with a path to a file).
So I removed the protocol, port, file path, and query string parts of the regex, resulting in this:"
I don't understand what he said at all. How can a URL be a path to a file? What is a "domain column"?
A URL consists of several parts. If you have a very elaborate URL, like:
http://www.example.com:1234/path/to/file.html?key1=value1&key2=value2
The parts are:
protocol: http://
host name: www
domain name: example.com
port: 1234
file path: path/to/file.html
query string: key1=value1&key2=value2
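If you want to take such a URL apart in code, a minimal sketch with Python's standard urllib.parse looks like this. Note that the library's names differ slightly from the list above: it calls the protocol the "scheme", reports host name and domain name together as one hostname, and keeps the leading slash on the path.

```python
from urllib.parse import urlsplit, parse_qs

url = "http://www.example.com:1234/path/to/file.html?key1=value1&key2=value2"
parts = urlsplit(url)

components = {
    "scheme": parts.scheme,      # "protocol" in the list above, without "://"
    "hostname": parts.hostname,  # host name and domain name together
    "port": parts.port,          # parsed as an integer
    "path": parts.path,          # includes the leading slash
    "query": parts.query,        # raw query string
}

# The query string can be split further into key/value pairs.
params = parse_qs(parts.query)

for name, value in components.items():
    print(f"{name}: {value}")
print(params)
```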
The only parts that may not be omitted are the protocol (but many programs allow defaulting to http://) and host name. Each part has its own requirements for what are legal characters in it. And what's worse, not all web servers agree on what those requirements are. So the only thing you can check without making an actual connection and seeing if it fails, is the part which is needed to contact the web server. This is only the protocol, host and domain name, and port. These are all case insensitive (the rest may not be). I'm not sure what are valid characters in a host or domain name, but this is also something where name servers may not agree with the specification.
In short, the only way to check if a URL is valid is to try to make a connection to it. If your program uses some magic to reject URLs (or email addresses), some people are going to hate you and/or their internet provider for it (because even if your check follows the specification, some host or domain names don't).
As to your question of how a URL can refer to a local file: there is a special scheme for that, file://. With the host left empty and the path itself starting with a /, this results in URLs like file:///home/user/file.html, with three slashes at the start.
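The three slashes are easy to see when a file URL is split into its parts. A small sketch with Python's urllib.parse follows; the second URL, with an explicit localhost host, is added only for comparison.

```python
from urllib.parse import urlsplit

# Three slashes: "file:" + "//" + (empty host) + "/home/user/file.html"
local = urlsplit("file:///home/user/file.html")

# Same path with an explicit host, for comparison (illustrative only).
with_host = urlsplit("file://localhost/home/user/file.html")

print(repr(local.netloc), local.path)
print(repr(with_host.netloc), with_host.path)
```

In both cases the path is /home/user/file.html; only the host part between the second and third slash differs.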

Where is the difference between locating and identifying a resource? [duplicate]

Possible Duplicate:
What's the difference between a URI and a URL?
Is there any difference? I'm talking about URI for identifying, but URL for locating. Aren't both the same thing?
They can look the same, but they're not the same thing. A URL identifies something that can be transferred over some protocol (often http). A URI can be used to identify a namespace (for example), but there might not be any content at the address.
Where is the difference between locating a resource and identifying a resource?
Knowing who I am doesn't tell you anything about where I am.
A URI identifies a resource either by location, or a name, or both. More often than not, most of us use URIs that define a location to a resource.
A URL is a specialization of URI that defines the network location of a specific resource.
Generally, if the URL describes both the location and name of a resource, the term to use is URI.
This article might help:
URI vs. URL
Excerpt:
"...a URL is a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network "location"), rather than by some other attributes it may have. Thus as we noted, "http:" is a URI scheme. An http URI is a URL. The phrase "URL scheme" is now used infrequently, usually to refer to some subclass of URI schemes..."
An identifier is a unique name for something, so we can be sure that we talk about the same thing. For example, the Atom namespace is 'http://www.w3.org/2005/Atom'. This is a URI. This doesn't mean that you can put this URI in a browser and have a document there (well, in the case of Atom, yes, you have a document, but it's a simple presentation of Atom for convenience, it's not the Atom namespace itself).
A URL is the location of a document. This is what you can put in your browser. It is confusing that both use the same format (http://...) but that is mostly anecdotal ...
A URL is a URI which is not a URN. (see)

Resources