How to implement Schema.org on HTTPS pages?

Is it correct to statically set Microdata’s itemtype attribute to the HTTP value (http://schema.org/WebPage) on HTTPS pages, or do I need to use the HTTPS value (https://schema.org/WebPage) on all pages?
Since both HTTP and HTTPS versions of the site are available, can I set it to //schema.org/WebPage or not?

tl;dr: Use http URIs.
In this answer on Webmasters SE I explained why you should favor http over https Schema.org URIs: The http URIs seem to be canonical, as the actual definition of the Schema.org vocabulary only defines http, not https. In addition: all examples (even on HTTPS) use the HTTP variant, the authors mentioned that they prefer to see the use of the HTTP variant, and RDFa’s Initial Context defines the HTTP variant only (so most of the RDF world will use HTTP).
In this answer on Webmasters SE I explained why you should not use protocol-relative URIs for vocabularies: Vocabulary URIs typically don’t get dereferenced, and nothing is ever embedded from a vocabulary, so there is absolutely no need to use HTTPS for them just because your page uses HTTPS (it’s similar to simply linking to an external page, which might not even be accessible via HTTPS). On top of that, your Schema.org markup would no longer work if the document is accessed via a protocol other than HTTP/HTTPS, and some parsers will likely fail to recognize that you are using the Schema.org vocabulary, because they might look for full URIs without applying URI resolution to the itemtype attribute.
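To make the resolution point concrete, here is a minimal Python sketch that resolves a protocol-relative itemtype against a few document URLs with the standard library's urllib.parse.urljoin. The page URLs and the helper names (resolve_itemtype, naive_is_schema_org) are hypothetical; the naive check is only an assumed stand-in for a parser that compares itemtype strings against full Schema.org URIs without resolving them first.

```python
from urllib.parse import urljoin

# Full vocabulary prefixes a hypothetical parser might look for.
SCHEMA_ORG_PREFIXES = ("http://schema.org/", "https://schema.org/")


def resolve_itemtype(document_url: str, itemtype: str) -> str:
    """Resolve an itemtype value against the URL of the page it appears on."""
    return urljoin(document_url, itemtype)


def naive_is_schema_org(itemtype: str) -> bool:
    """Hypothetical parser check: plain string comparison, no URL resolution."""
    return itemtype.startswith(SCHEMA_ORG_PREFIXES)


relative = "//schema.org/WebPage"
for page in ("http://example.com/", "https://example.com/", "file:///home/me/page.html"):
    # The file: page yields file://schema.org/WebPage, which is not a Schema.org URI.
    print(page, "->", resolve_itemtype(page, relative))

print(naive_is_schema_org(relative))                     # False: never resolved
print(naive_is_schema_org("http://schema.org/WebPage"))  # True
```

A full http URI passes both kinds of check unchanged, regardless of how the page itself was loaded.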

There's been an update to that answer on Webmasters SE (dated November 2015), with a link to the schema.org FAQ about https:
Q: Should we write https://schema.org or http://schema.org in our markup?
The short of it is that schema.org will be moving to https, and you can use https URLs now, but there's no rush to switch.

Regarding protocol-relative URLs… please don't use them as they're a hack. Favor use of absolute or root-relative URLs whenever hyperlinking documents on the Web.
Is it correct to statically set up Microdata’s itemtype attribute with HTTP value [...]?
Either HTTP or HTTPS is fine in your itemtype according to the Schema.org FAQ. Your examples containing HTTP and HTTPS schemes are both correct for pages served with and without TLS.
If you've got a mix of absolute URLs pointing to different schemes, a person is more likely to notice it and wonder why things aren't consistent. So when you update, refactor all of your existing itemtypes to use the same scheme.
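Before refactoring, it can help to find out which schemes a page's itemtypes actually use. The following is a minimal Python sketch built on the standard html.parser module; the class and function names (ItemtypeCollector, itemtype_schemes) and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ItemtypeCollector(HTMLParser):
    """Collects the value of every itemtype attribute in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.itemtypes: list[str] = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # Boolean attributes such as itemscope arrive with value None.
            if name == "itemtype" and value:
                # itemtype may hold several space-separated type URLs.
                self.itemtypes.extend(value.split())


def itemtype_schemes(html: str) -> set[str]:
    """Return the set of URL schemes used by itemtype values ('' means none)."""
    collector = ItemtypeCollector()
    collector.feed(html)
    collector.close()
    return {urlsplit(t).scheme for t in collector.itemtypes}


page = """
<body itemscope itemtype="https://schema.org/WebPage">
  <div itemscope itemtype="http://schema.org/Person">...</div>
</body>
"""
schemes = itemtype_schemes(page)
if len(schemes) > 1:
    print("Inconsistent itemtype schemes:", sorted(schemes))
```

An empty string in the result would flag a protocol-relative itemtype such as //schema.org/WebPage.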

Related

Is a protocol (eg. http or https) required for a URL to be valid?

Recently I came across a lot of code from analytics plugins where they specify the URL as //fonts.googleapis.com or //www.google.com.
Basically it starts with two forward slashes and then the domain or subdomain. These links work fine in browsers. I have read the following documents, but I am still not sure whether the above can be called valid URLs (that is, whether they should be reported as broken URLs or not).
https://developer.mozilla.org/en-US/docs/Web/API/URL and
https://url.spec.whatwg.org/
Is there a standard specification that I can refer to?
They're both valid scheme-relative-URL strings, although they need to be in the context of a Base URL to be meaningful. When used within a web page, the web page will provide the Base URL context.
Although there are other, earlier standards for URLs, the whatwg document represents the most up-to-date, web compatible definition.
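For a link checker, the practical consequence is to recognise scheme-relative strings and resolve them against the page's URL before deciding whether they are broken. Here is a minimal Python sketch; the helper names are hypothetical and the category labels borrow the WHATWG terms. Python's urllib.parse is not an implementation of the WHATWG URL Standard, although it handles simple strings like these as expected.

```python
from urllib.parse import urljoin, urlsplit


def classify_reference(ref: str) -> str:
    """Classify a link string the way a link checker might (hypothetical helper)."""
    if urlsplit(ref).scheme:
        return "absolute"
    if ref.startswith("//"):
        return "scheme-relative"
    if ref.startswith("/"):
        return "path-absolute"
    return "path-relative"


def resolve_for_checking(page_url: str, ref: str) -> str:
    """Resolve ref against the page's base URL before deciding if it is broken."""
    return urljoin(page_url, ref)


for ref in ("//fonts.googleapis.com", "https://www.google.com", "/foo.html"):
    print(ref, classify_reference(ref), resolve_for_checking("https://example.com/blog/post", ref))
```

Only the resolved URL, not the raw attribute value, should be fetched and reported as working or broken.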

How can I use the at symbol (@) in a URL?

For example, I came across this kind of URL: http://username:token@example.com/protected/files .
I searched the web for this, but I didn't find what I expected.
Wikipedia explains the syntax of a URI quite well:
scheme:[//[user[:password]@]host[:port]][/path][?query][#fragment]
Your username:token@example.com/protected/files may look like a URL, but in fact it is not, because it does not include the protocol to access the data. It is a URI.
Browsers (I suppose you mean web browsers) work with URLs, which are a subtype of URI that includes the protocol. Note that web browsers don't work with every existing protocol, only with some of them (http, https, ftp, file...), and username is not a protocol.
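The difference is easy to see with Python's urllib.parse.urlsplit, used here only as an illustration of how a generic parser splits the two strings from the question. With a scheme, the part before "@" is parsed as userinfo; without one, the text before the first ":" is taken as the scheme.

```python
from urllib.parse import urlsplit

# With "http://", the part before "@" is parsed as userinfo.
with_scheme = urlsplit("http://username:token@example.com/protected/files")
print(with_scheme.scheme, with_scheme.username, with_scheme.password)
# http username token
print(with_scheme.hostname, with_scheme.path)
# example.com /protected/files

# Without "http://", "username" is read as the scheme and there is no host at all.
without_scheme = urlsplit("username:token@example.com/protected/files")
print(repr(without_scheme.scheme), repr(without_scheme.netloc))
# 'username' ''
```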

What are the differences between implementing HTTPS everywhere via IIS or MVC?

I'm working on a project to require HTTPS everywhere among a suite of MVC and WebAPI applications. I'm trying to understand the trade-offs between checking the "Require SSL" checkbox in IIS and using a URL Rewrite module vs. using a RequireHttpsAttribute in my global filters and modifying my web.config.
I've found the following guides detailing each approach:
https://webmasters.stackexchange.com/questions/28057/iis-7-require-ssl-automatically-redirect-to-https
http://tech.trailmax.info/2014/02/implemnting-https-everywhere-in-asp-net-mvc-application/
Explaining the mechanisms would be lengthy, so I will just list the most significant differences in behaviour:
"Require SSL" in IIS:
The name basically explains what it does: it's "Require", not "Enforce". If people try to access your website content through http, the server will just respond with a 403 error. That is usually not the desired behaviour, but it may help in certain situations.
Using the URL Rewrite module:
The module itself can do quite a few different things, but I assume you are just going to do the regular https redirect. That means if a user tries to hit ANY content of the site through http, the server will do a 301 or 302 redirect to the https version of the same URL. This is usually a good option since it doesn't affect the usability of the website.
Global RequireHttpsAttribute action filter:
This does a similar thing to option 2: it will do a 302 redirect for any http request that hits an ACTION. The main difference is that it only applies to the actions in your controllers, which means that if someone just tries to get an image or CSS file through http on your website, this option will let it through without any enforcement. This leaves you able to serve static content through http, which can be useful in some specific circumstances.
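The redirect behaviour of option 2 can be sketched outside IIS. The following uses Python's WSGI interface purely as a stand-in; it is not IIS URL Rewrite or ASP.NET code, and the names https_redirect, hello_app and demo are hypothetical. It assumes the HTTPS site uses the same Host value (a Host header with an explicit port is copied as-is), and it redirects every method, including POST, whose body would not survive a 301/302. Option 3 would differ mainly in applying this check only to controller actions rather than to every request.

```python
from urllib.parse import quote


def https_redirect(app, permanent=True):
    """Hypothetical WSGI middleware mimicking a 'redirect everything to HTTPS' rule."""

    def middleware(environ, start_response):
        if environ.get("wsgi.url_scheme") == "https":
            return app(environ, start_response)  # already secure: pass through
        host = environ.get("HTTP_HOST") or environ["SERVER_NAME"]
        # Rebuild the path the same way PEP 3333 reconstructs request URLs.
        path = quote(environ.get("SCRIPT_NAME", "")) + quote(environ.get("PATH_INFO", ""))
        query = environ.get("QUERY_STRING", "")
        location = "https://" + host + path + ("?" + query if query else "")
        status = "301 Moved Permanently" if permanent else "302 Found"
        start_response(status, [("Location", location), ("Content-Length", "0")])
        return [b""]

    return middleware


def hello_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def demo(environ):
    """Run the wrapped app once and return (status, headers, body)."""
    seen = {}

    def start_response(status, headers):
        seen["status"], seen["headers"] = status, dict(headers)

    body = b"".join(https_redirect(hello_app)(environ, start_response))
    return seen["status"], seen["headers"], body
```

For example, demo({"wsgi.url_scheme": "http", "HTTP_HOST": "example.com", "PATH_INFO": "/a b", "QUERY_STRING": "x=1"}) returns a 301 whose Location is https://example.com/a%20b?x=1.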
Just one extra thing worth mentioning: 301 and 302 redirects don't work well with http POST, so if your user tries to do a POST through http, the request body will get lost (thanks to the info from @ChrisPratt).
Typically the folks managing the infrastructure are responsible for making sure things are on https. Often they aren't very good at this, so that is where the RequireHttpsAttribute kicks in: it can enforce HTTPS-only requests at the code level.
In practice it isn't so great, as many production setups (including stackoverflow.com's) see https terminated in an edge device before being unwrapped and handed to the back-end apps as http, and the RequireHttpsAttribute isn't quite nuanced enough to understand this distinction.
The best bet in general is to configure the edge device providing the public http interface to accept HTTPS and only HTTPS. Then set up secondary virtual sites [or whatever is vendor-appropriate] to redirect all traffic to the canonical HTTPS URL. I'd be a bit nervous about relying upon the RequireHttpsAttribute unless it will be a small app handling its own requests. That still leaves holes open in terms of artifacts and other things that might not be coming off of a controller.
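The edge-termination problem comes down to which scheme the application can see. Behind a TLS-terminating device, the back end only sees the internal http hop. A common workaround, assumed here rather than taken from the answer, is for the edge device to set an X-Forwarded-Proto header. The Python sketch below (hypothetical helper original_scheme, WSGI environ as input) shows the idea; it is only safe if every request is guaranteed to pass through a proxy that sets or overwrites that header.

```python
def original_scheme(environ, trust_forwarded_proto=False):
    """Best guess at the scheme the client actually used (hypothetical helper).

    Behind an edge device that terminates TLS, wsgi.url_scheme reports the
    internal hop ('http') even for HTTPS visitors. X-Forwarded-Proto is only
    consulted when the caller explicitly trusts the proxy in front of the app.
    """
    if trust_forwarded_proto:
        forwarded = environ.get("HTTP_X_FORWARDED_PROTO", "")
        if forwarded:
            # Assumption: if several proxies appended values, the leftmost one
            # is the scheme the client used.
            return forwarded.split(",")[0].strip().lower()
    return environ.get("wsgi.url_scheme", "http")
```

A naive HTTPS check corresponds to calling this with trust_forwarded_proto=False, which reports http for every request arriving via the edge device.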

what is the name or term for // in front of resources?

So http://cdnjs.com/ and some of my peers are recommending that we use // in front of the resource. What is this term or technology called? As I understand the purpose is so that http or https is preserved. However when I want to google say yepnope's compatibility with it, what do I call it?
It's a "relative URL" — in this case, a URL with no protocol part (and so it uses the protocol from its parent document), just like /foo.html is a relative URL with no protocol or servername parts (and so uses the protocol and server of its parent document).
The purpose of protocol-relative URLs is that they are portable between http and https documents (and a teensy bit shorter). So if you have:
<link rel="stylesheet" href="//cdnjs.cloudflare.com/etc.css">
...on the page http://example.com, the URL expands to
http://cdnjs.cloudflare.com/etc.css
...but if it's on the page https://example.com, it expands to
https://cdnjs.cloudflare.com/etc.css
...and you don't get the "mixed secure and insecure content" warning from the browser.
One slight downside is if you're doing some quick-and-dirty local testing using files you've opened directly from the file system, their protocol is file:, and so the URL ends up being
file://cdnjs.cloudflare.com/etc.css
...which probably doesn't refer to a valid resource on your computer (and leads to questions on SO).
More on my blog: Skipping the protocol
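The expansions above are ordinary relative-reference resolution, so they can be reproduced with Python's urllib.parse.urljoin, including the file: case and the /foo.html comparison. This is a minimal sketch; the page URLs and the helper name expand are hypothetical.

```python
from urllib.parse import urljoin


def expand(page_url: str, href: str) -> str:
    """Resolve an href the way a browser would, relative to the page URL."""
    return urljoin(page_url, href)


pages = ["http://example.com/", "https://example.com/", "file:///home/me/test.html"]
for page in pages:
    print(page)
    # Protocol-relative: only the scheme is taken from the page.
    print("  //cdnjs.cloudflare.com/etc.css ->", expand(page, "//cdnjs.cloudflare.com/etc.css"))
    # Root-relative: scheme and host are both taken from the page.
    print("  /foo.html ->", expand(page, "/foo.html"))
```

The file: page produces file://cdnjs.cloudflare.com/etc.css, the "not a valid resource" case described above.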

Why can protocol be omitted from absolute paths on a webpage?

I recently ran across a website that had some interesting styling on a select element. I went to investigate and found this (names changed to protect the innocent):
<script type="text/javascript" src="//www.domain.tld/file.js"></script>
It works despite HTTP: being omitted. What is the purpose of leaving off the protocol?
It will use the protocol you're already using. Useful for sites with both https and http versions.
So if you're on https://www.domain.tld/, the script will be https://www.domain.tld/file.js.
If you're on http://www.domain.tld/ the script will be http://www.domain.tld/file.js.
I believe this is shorthand for a path relative to the protocol, so it should use the same protocol as is being used for that session. E.g. if you grabbed that page with http, then this URL is relative to the http protocol.
The purpose is that the scheme (ie. http or https) can be determined relative to the containing page. This is useful if you have a common piece of code included in multiple pages that can be served via http or https.
The purpose is to "use the same protocol as in the current URL" -- presumably (?) useful if the page can be reached both as http: and https: (I have a hard time thinking of other protocols yet that it might be useful for, and even this one is not a clear-cut use case).
