Is '|' a recommended separator for semantic URLs? - url

After researching Google and SO, there seem to be conflicting opinions on this.
We have run into a problem with Google Chrome substituting the | separator with %7C, whereas Firefox and Safari do not.
Here's an example:
http://www.example.com/page1|sub-page2|sub-page-3
Are there any strict rules to follow when choosing a separator character for semantic URLs and are there any strong arguments against (or workarounds when) using |?

| is not a valid character in a URL. Modern browsers will silently encode it to %7C when sending, and may or may not display this change in the address bar. Similarly, servers will silently decode the character for you.
This would have been a problem in the last millennium, when browsers would crash just because you didn't specify http://, but today you can use whatever you want and the browser will take care of it. However, automatic parsers such as Markdown may not agree that something like http://example.com/test|fish is a valid URL. In this case, it looks like it does, but try that on my forums and it will complain at you.
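
The encode/decode round trip described above can be reproduced outside a browser. This is a minimal sketch using Python's standard urllib.parse with the path from the question; it illustrates percent-encoding in general, and individual browsers and servers may differ in detail.

```python
from urllib.parse import quote, unquote

path = "/page1|sub-page2|sub-page-3"

# quote() percent-encodes characters outside the unreserved set;
# "/" is left alone by default, "|" is not.
encoded = quote(path)
print(encoded)           # /page1%7Csub-page2%7Csub-page-3

# unquote() reverses the encoding, roughly what a server does on receipt.
print(unquote(encoded))  # /page1|sub-page2|sub-page-3
```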

Internet Explorer and Chrome URL-encode the URL shown in the address bar after a page request has been made. %7C is the safe way of displaying a pipe ('|'), so it's not a problem that Chrome is doing this.
As a cheeky fix to make all browsers behave the same way, why not use %7C as your separator from the get-go instead of a pipe? All browsers should then interpret it as a pipe behind the scenes but display it as %7C in the address bar.

Related

Emojis in domain names - strange behaviour in iOS chrome

I was fooling around on my phone and decided to try putting an emoji in the URL bar of Google Chrome. I entered 😀.com, the emoji that corresponds to Unicode U+1F600. Chrome ended up evaluating that as http://xn--e28h.com/, which took me to a "webpage unavailable" screen (ERR_NAME_NOT_RESOLVED). I looked up xn--e28h on GoDaddy and it was unavailable.
Here are my questions:
1. Why did 😀 turn into xn--e28h? I don't see any relation to the Unicode code point.
2. Why are domains of this format unavailable on GoDaddy?
3. Bonus question: why can't we put emojis in domain names?
1. DNS uses a special way to encode Unicode into ASCII. The xn-- prefix says that it's an encoded name, and since the whole name in this case is one Unicode codepoint the rest just looks incomprehensible. You can start reading more about this here.
2. Most (if not all) top-level domains have rules on which Unicode characters they allow for names in that TLD. For example, .SE only allows those characters that are used in one of the official languages of Sweden. This is entirely a policy thing, so the "why" gets fuzzy.
3. See 2.
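
The encoding behind the xn-- prefix is Punycode, and the label from the question can be reproduced with Python's built-in punycode codec. This is a small sketch of the label encoding only; it skips the normalization and policy checks that real IDNA processing applies, and the variable names are illustrative.

```python
label = "\U0001F600"  # the emoji from the question, U+1F600

# Punycode-encode the label, then add the "xn--" prefix that marks an encoded name.
ace = "xn--" + label.encode("punycode").decode("ascii")
print(ace)  # xn--e28h

# Decoding the part after the prefix gives back the original code point.
decoded = ace[4:].encode("ascii").decode("punycode")
print(decoded == label)  # True
```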

Browser Support for UTF8 Encoded Characters in URL's

If I navigate to the following URL with a special UTF8 encoded character I get different results in web browsers:
http://example.com/lörickè
Firefox 37 - Shows the correct URL as above.
Chrome 42 - Shows the correct URL as above.
Edge - Shows the correct URL as above.
IE 11 - Shows percent encoded URL http://example.com/l%c3%b6rick%c3%a8/
Where can I find a list of browsers and versions that support this feature, and are there any announcements on whether the new Microsoft Edge browser supports it?
This StackOverflow post highlights the above issue for those interested.
What is shown in browser address bars is not necessarily what is used internally.
If you enter http://example.com/lörickè in Firefox, it gets shown like that, but it actually gets percent-encoded and becomes http://example.com/l%C3%B6rick%C3%A8. This is for usability reasons (or, if IRIs are not supported, like in HTTP/1.1, for transforming an IRI into a URI), so users don’t necessarily have to enter the correct URL (with percent-encoding), and don’t get confused by seeing these cryptic parts.
You can easily check what really gets used by copy-pasting the URL from the address bar into a text document.
So the browsers from your example probably all use the same URI (i.e., percent-encoded), but some of them decided to display the un-encoded variant instead.
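
The IRI-to-URI step can be sketched with Python's urllib.parse, which encodes non-ASCII characters as UTF-8 and then percent-encodes each byte. The path is the one from the question, written with escapes to avoid ambiguity; this approximates what a browser does internally and is not taken from any browser's source.

```python
from urllib.parse import quote, unquote

iri_path = "/l\u00f6rick\u00e8"  # "/lörickè"

# Each non-ASCII character becomes its UTF-8 bytes, one %XX per byte.
uri_path = quote(iri_path)
print(uri_path)  # /l%C3%B6rick%C3%A8

# Hex digits in escapes are case-insensitive, so the lowercase form
# shown by IE 11 decodes to the same path.
same = unquote("/l%c3%b6rick%c3%a8") == iri_path
print(same)  # True
```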

Keep spaces in URLs without encoding them

As Stack Overflow seems to be unable to create links from URLs that have spaces in them, copy and paste this URL into your browser.
http://grooveshark.com/#!/search/song?q=we will rock you
It does not redirect you to ...song?q=we%20will%20rock%20you or anything like that.
The spaces just simply stay there. When I first saw this, it looked so foreign to me. How is this achieved?
I believe they use javascript to set the contents of the url bar. You can use something like Live HTTP Headers to confirm that the browser definitely sends a request with %20 encoded spaces.
It’s a browser setting. The browser decodes the URL, to make it more readable for humans.
If you copy the URL from the browser’s address bar and paste it into a text document, you’ll see that the space characters are percent-encoded.
See How can I see how the browser percent-encoded my URL? (which is not visible on address bar)

Why is IIS 7.5 / Coldfusion 9 adding a weird character to URL string?

We have built a "redirect" engine into our product so our customers can add/edit/delete custom redirects without us having to maintain a bunch of rewrite rules on the server.
Some issues are arising in the URLs that get passed into our code. We are pulling these from the CGI.QUERY_STRING property populated by Coldfusion (it picks up on 404's thrown by IIS/Coldfusion, which appends the bad URL as a query string like ?404;http://www.mysite.com:80/nonexistent-file.cfm).
What we see is that some special characters are getting an additional character thrown in there (an Â character). Take this URL (%A9 is the copyright symbol):
http://www.mysite.com/%A9/
The CGI.QUERY_STRING is reporting this as:
http://www.mysite.com:80/Â©/
I have no idea where this extra "Â" is coming from. I have a hunch that this is being brought in by IIS, but it could also be with Coldfusion as it has to populate the CGI variable.
Any ideas as to why this is happening and how to fix it? It appears that not all percent-encoded/special characters do this...
EDIT:
I am probably giving up on my exact problem, however, it would be beneficial still to know why either IIS or Coldfusion is throwing in this extra character (especially for certain escape sequences over others).
Wow... that's a tough one. Usually folks design sites to use alphanumerics plus the tilde (~) and dash (-). I'm not even sure if the RFC allows for a copyright symbol as part of the host header. I'm not positive that it should be allowed in the scheme portion of the URL. This article might shed some light on it for you. As for IIS, you might be able to add a specific rewrite rule that takes care of the issue. Personally I would avoid these characters in the scheme part of the URL.
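
One plausible explanation for the extra Â, offered as an assumption and not confirmed anywhere in this thread: %A9 is the single byte 0xA9, which is © in Latin-1, but © encoded as UTF-8 is the two bytes 0xC2 0xA9. If one layer converts the character to UTF-8 and a later layer reads those bytes as Latin-1, the 0xC2 byte shows up as Â. The sketch below reproduces only that decoding mismatch in Python; it does not model IIS or ColdFusion.

```python
raw = bytes([0xA9])               # the byte behind %A9
as_text = raw.decode("latin-1")   # "©" when the byte is read as Latin-1
utf8 = as_text.encode("utf-8")    # b"\xc2\xa9": two bytes in UTF-8
misread = utf8.decode("latin-1")  # the UTF-8 bytes read back as Latin-1
print(misread)                    # Â©
```

ASCII escapes such as %7C are a single byte in both encodings, which would fit the observation that only some escape sequences are affected.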

Is there a way to disable email engines from automatically hyperlinking a URL?

One of my clients wants a URL to be displayed as plain text rather than as a hyperlink. This is what I have tried:
ur<!comments>l
I have also tried removing the <a></a> tag, as well as removing "http://" from the URL. Neither worked in Outlook, which still recognized it as a hyperlink.
Anybody have any workaround here?
There is a zero-width non-breaking space that I like to use: &#65279;
I place it in strategic places so that the URL does not get recognized as a URL, like so: http&#65279;://www&#65279;domain.&#65279;com.
This strategy has worked for me across platforms and rendering clients. Its advantages are twofold: 1) it prevents the client from auto-rendering text as a link, and 2) unlike other "non-breaking" zero-width space ascii codes (ie  ), it wraps the entire URL if your URL happens to need it (instead of just the parts after the zero-width space).
Try it out.
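
The insertion can be automated when emails are generated from a template. The helper below is hypothetical and not from this thread: it assumes the character meant above is U+FEFF written as the numeric HTML entity &#65279;, and the function name and break positions (before "://" and after each dot) are illustrative choices. Whether a given mail client still auto-links the result has to be tested per client.

```python
import html

# Assumption: the zero-width character is U+FEFF, as a numeric HTML entity.
BREAKER = "&#65279;"

def defuse_url(url: str, breaker: str = BREAKER) -> str:
    """Return HTML text that displays like `url` but is harder to auto-link."""
    safe = html.escape(url)  # escape &, <, > and quotes before adding the entity
    safe = safe.replace("://", breaker + "://")  # split the scheme from "://"
    return safe.replace(".", "." + breaker)      # split each dot from what follows

print(defuse_url("http://www.domain.com"))
# http&#65279;://www.&#65279;domain.&#65279;com
```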
Credit belongs to my coworker, actually. Seems to work in all clients that we tested.
www.websitename.<img src="" width="0" height="0">com
An empty image tag with 0 width and 0 height. Insert it between the dot and the following text (in this case "com").
After we tried several things, he somehow suffered from a moment of inspiration/brilliance.
No visible spacing between the characters. Not sure what will happen if you copy/paste the string into a browser directly, though. It served my purpose of not allowing email clients to automatically make it a hyperlink, though.
This one worked for me. It is a combination of Scott's answer and David K. Hess's comment.
Break your url using <span>. However, you need to break it in a way that they are not matched as url when the mail client scans it.
eg: http<span>://</span><span>google.</span>com
You can turn off auto-hyperlinking in general. Here is a tutorial for Outlook 2007:
Turn automatic hyperlinking on or off
I have a similar issue with words like "chequed.com" and "interviewing.com" that are creating a hyperlink in my messages when I do not want it to.
The first step I took was to edit the HTML link tags.. but there weren't any.
After that, I went to the text in the email and added a very small space by using a font size of 8pt (I'm using an ESP, otherwise I would have gone with 1px).
This may help if you're having the same issue.
My solution for this is
http://...
I contacted Gmail's support and spoke with a department manager for Apple Care. This is expected behavior and cannot be prevented. These hacks no longer work, and if implemented could result in your IP being listed as a phishing operation. You're dancing around security issues here. I would suggest revising your content strategy.
The only thing you can do currently is wrap all email addresses in mailto links and phone numbers in tel links. There are no other options available as of 2017.
I had success with janusoo's solution for years until for some reason it began to introduce line breaks on some clients. I found that I could proceed with a zero-width space, &#8203;:
www.websitename.&#8203;com
You might try using CSS to re-flow the text.
<p>www.example.<span style="float:left">http://</span>com/</p>
If the part with "http://" still gets marked as a URL, try breaking things up in different places.
One other trick would be to replace the periods with some other Unicode character that LOOKS like a period but actually isn't. For example, "⠄" (U+2804) is a Braille single-dot.
Alas (!) I don't have any Microsoft applications I can test this with, but good luck with it. :)
If you use the HTML entity for a period (for example &#46;) in place of each '.' in your hyperlinks, you'll stop Outlook 2007 from hyperlinking the URL.
