Why can protocol be omitted from absolute paths on a webpage? - url

I recently ran across a website that had some interesting styling on a select element. I went to investigate and found this (names changed to protect the innocent):
<script type="text/javascript" src="//www.domain.tld/file.js"></script>
It works despite HTTP: being omitted. What is the purpose of leaving off the protocol?

It will use the protocol you're already using. Useful for sites with both https and http versions.
So if you're on https://www.domain.tld/file.js the script will be https://www.domain.tld/file.js.
If you're on http://www.domain.tld/ the script will be http://www.domain.tld/file.js.

i believe this is short hand for a relative path to the protocol. So it should use the same protocol as is being used for that session. e.g if you grabbed that page with http, then this url is relative to http protocol

The purpose is that the scheme (ie. http or https) can be determined relative to the containing page. This is useful if you have a common piece of code included in multiple pages that can be served via http or https.

The purpose is to "use the same protocol as in the current URL" -- presumably (?) useful if the page can be reached both as http: and https: (I have a hard time thinking of other protocols yet that it might be useful for, and even this one is not a clear-cut use case).

Related

Is a protocol (eg. http or https) required for a URL to be valid?

Recently I came across a lot of code from analytics plugins where they specify the URL as //fonts.googleapis.com or //www.google.com.
Basically it starts with two forward slashes and then the domain or subdomain. These links work fine in browsers. I have read the following documents, but I am still not sure if above can be called valid URLs (basically should these be reported as broken URLs or not).
https://developer.mozilla.org/en-US/docs/Web/API/URL and
https://url.spec.whatwg.org/
Is there a standard specification that I can refer to?
They're both valid scheme-relative-URL strings, although they need to be in the context of a Base URL to be meaningful. When used within a web page, the web page will provide the Base URL context.
Although there are other, earlier standards for URLs, the whatwg document represents the most up-to-date, web compatible definition.

How can I allow Mixed contents (http with https) using content-security-policy meta tag?

I'm forcing https to access my website, but some of the contents must be loaded over http (for example video contents can not be over https), but the browsers block the request because of mixed-contents policy.
After hours of searching I found that I can use Content-Security-Policy but I have no idea how to allow mixed contents with it.
<meta http-equiv="Content-Security-Policy" content="????">
You can't.
CSP is there to restrict content on your website, not to loosen browser restrictions.
Secure https sites given users certain guarantees and it's not really fair to then allow http content to be loaded over it (hence the mixed content warnings) and really not fair if you could hide these warnings without your users consent.
You can use CSP for a couple of things to aid a migration to https, for example:
You can use it to automatically upgrade http request to https (though browser support isn't universal). This helps in case you missed changing a http link to https equivalent. However this assumes the resource can be loaded over https and sounds like you cannot load them over https so that's not an option.
You can also use CSP to help you identify any http resources on you site you missed by reporting back a message to a service you can monitor to say a http resource was attempted to be loaded. This allows you identify and fix the http links to https so you don't have to depend on above automatic upgrade.
But neither is what you are really looking for.
You shouldn't... but you CAN, the feature is demonstrated here an HTTP PNG image converted on-the-fly to HTTPS.
<meta http-equiv="Content-Security-Policy" content="upgrade-insecure-requests">
There's also a new permissions API, described here, that allows a Web server to check the user's permissions for features like geolocation, push, notification and Web MIDI.

How to implement Schema.org on HTTPS pages?

Is it correct to statically set up Microdata’s itemtype attribute with HTTP value (http://schema.org/WebPage) on HTTPS pages or do I need to use HTTPS value (https://schema.org/WebPage) on all pages?
Since both HTTP and HTTPS versions of the site are available, can I set it up to //schema.org/WebPage or not?
tl;dr: Use http URIs.
In this answer on Webmasters SE I explained why you should favor http over https Schema.org URIs: The http URIs seem to be canonical, as the actual definition of the Schema.org vocabulary only defines http, not https. In addition: all examples (even on HTTPS) use the HTTP variant, the authors mentioned that they prefer to see the use of the HTTP variant, and RDFa’s Initial Context defines the HTTP variant only (so most of the RDF world will use HTTP).
In this answer on Webmasters SE I explained why you should not use protocol-relative URIs for vocabularies: Vocabulary URIs typically don’t get dereferenced, and there will never get something embedded from a vocabulary, so there is absolutely no need to use HTTPS for these just because you use HTTPS (it’s similar to simply linking to an external page, which might not even be accessible via HTTPS). On top of that, your Schema.org markup would no longer work if the document is accessed via a different protocol than HTTP/HTTPS, and it’s likely that some parsers won’t be able to recognize that you are using the Schema.org vocabulary because they might look for full URIs without applying URI resolution for the itemtype attribute.
There's been an update to that answer on Webmasters SE (dated November 2015), with a link to the schema.org FAQ about https:
Q: Should we write https://schema.org or http://schema.org in our markup?
The short of it is that schema.org will be moving to https, and you can use https URLs now, but there's no rush to switch.
Regarding protocol-relative URLs… please don't use them as they're a hack. Favor use of absolute or root-relative URLs whenever hyperlinking documents on the Web.
Is it correct to statically set up Microdata’s itemtype attribute with HTTP value [...]?
Either HTTP or HTTPS is fine in your itemtype according to the Schema.org FAQ. Your examples containing HTTP and HTTPS schemes are both correct for pages served with and without TLS.
If you've got a mix of absolute URLs pointing to different schemes it's more likely a person will notice it and wonder why things aren't consistent. So when you update refactor your existing itemtypes.

Xhtml namespace on https site?

Im migrating a site from using http to redirect all requests to https and therefor im making sure external script, images etc are references with just // in the beginning of the url instead of http://
My question is this. Do i also need to change stuff like the xhtml namespaces for the html tag or the doctype declaration url? And if I do need to change this, will they resolve urls starting with //?
Namespaces are identifying strings that happen to use URL syntax. They should not be changed.
The DTD is a tricky one.
In theory, if it was altered with a man-in-the-middle attack, then it could be used to change named entities and insert new content into the document.
In practise however, browsers don't generally parse the DTD so this isn't really a worry. Additionally, W3C DTDs are not served over HTTPS so you can't reference them without copying the files to your own server (and possibly updating internal references). If you want to be really safe, you should do this.
Personally, I'd scrap DTDs and just use (X)HTML 5.

what is the name or term for // in front of resources?

So http://cdnjs.com/ and some of my peers are recommending that we use // in front of the resource. What is this term or technology called? As I understand the purpose is so that http or https is preserved. However when I want to google say yepnope's compatibility with it, what do I call it?
It's a "relative URL" — in this case, a URL with no protocol part (and so it uses the protocol from its parent document), just like /foo.html is a relative URL with no protocol or servername parts (and so uses the protocol and server of its parent document).
The purpose of protocol-relative URLs is that they are portable between http and https documents (and a teensy bit shorter). So if you have:
<link rel="stylesheet" href="//cdnjs.cloudflare.com/etc.css">
...on the page http://example.com, the URL expands to
http://cdnjs.cloudflare.com/etc.css
...but if it's on the page https://example.com, it expands to
https://cdnjs.cloudflare.com/etc.css
...and you don't get the "mixed secure and insecure content" warning from the browser.
One slight downside is if you're doing some quick-and-dirty local testing using files you've opened directly from the file system, their protocol is file:, and so the URL ends up being
file://cdnjs.cloudflare.com/etc.css
...which probably doesn't refer to a valid resource on your computer (and leads to questions on SO).
More on my blog: Skipping the protocol

Resources