Rules for using # sign in a custom URL - url

In an HTTP URL, the hash sign (#) signifies an anchor within a page and may only appear once.
Is this a universal rule for all URLs? If I want to implement a custom URL protocol, can I use the following as a legal URL?
myprotocol://zoo#1/cage#30/lion#11

In your own protocol you may do whatever you please. However, if you want common parsers to be able to parse your URL, you'll have to follow RFC 3986. You may want to take a look at section 3, "Syntax Components", for the rules on using "#", "?", ":" and "/".

Nothing stops you from implementing your own protocol, but there's probably not much point in reinventing the wheel. Why not just use http://zoo/?x=1&y=2 (i.e. the query string)? That's what it's there for :)
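To see how a general-purpose parser treats the hash sign, here is a minimal Ruby sketch using the standard URI library. The helper name try_parse and the query parameter names (zoo, cage, lion) are made up for illustration. With a single "#", everything after it is reported as the fragment. With several, a strict RFC 3986 parser may reject the string outright or fold the rest into the fragment, so the code handles both outcomes instead of assuming one.

```ruby
require "uri"

# Parse with Ruby's standard URI library; return nil if the string is rejected.
def try_parse(text)
  URI.parse(text)
rescue URI::InvalidURIError
  nil
end

# One "#": the parser splits scheme, host, path and fragment as usual.
single = try_parse("myprotocol://zoo/cage/lion#11")
puts single.scheme
puts single.host
puts single.path
puts single.fragment

# Several "#": report whichever outcome this parser produces.
multi = try_parse("myprotocol://zoo#1/cage#30/lion#11")
if multi.nil?
  puts "rejected by the parser"
else
  puts "parsed, fragment = #{multi.fragment.inspect}"
end

# The query-string alternative keeps every value individually addressable.
# The parameter names here are hypothetical.
with_query = try_parse("myprotocol://zoo/?zoo=1&cage=30&lion=11")
params = URI.decode_www_form(with_query.query).to_h
p params
```

If the identifiers really must live in the path, percent-encoding each "#" as %23 is the usual way to keep the URL parseable.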

Related

How can I show the name of the link without http://, https://, and everything that goes after .com and other similar domains?

In my view I'm displaying the link like this:
<%= @casino.play_now_link %>
So @casino.play_now_link can look like this: https://www.spinstation.com/?page=blockedcountry&content=1. What I need is to display only this part: www.spinstation.com. I tried gsub('http://', '').gsub('https://', ''), and it works, but how can I remove the part of the URL after .com? Thanks in advance.
Don't use regexes at all for this sort of thing, use URI from the standard library:
URI.parse(@casino.play_now_link).hostname
or, for a more robust solution, use Addressable:
Addressable::URI.parse(@casino.play_now_link).hostname
Of course, this assumes that you've properly validated that your play_now_links are valid URIs. If you haven't then you can add validations that use URI or Addressable to do so and either clean up existing play_now_links that aren't valid URIs or wrap the parsing and hostname extraction in a method (which is a good idea anyway) with some error handling.
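The wrapping suggested above could look roughly like the following. This is only a sketch: the method name display_host is hypothetical, and returning nil for unparseable input is one possible policy, not the only one.

```ruby
require "uri"

# Return the host part of a link, or nil when the link cannot be parsed.
# display_host is an illustrative name, not part of any library.
def display_host(link)
  URI.parse(link.to_s).hostname
rescue URI::InvalidURIError
  nil
end

puts display_host("https://www.spinstation.com/?page=blockedcountry&content=1")
p display_host("not a url")
```

In a view this would be called as display_host(@casino.play_now_link), with a fallback (for example the raw link text) when it returns nil.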
As a simple alternative you can use
.split('/')[2]
which relies on plain string splitting (no regex) and depends on where the '/' characters fall in your URL.
But as @mu is too short mentioned, URI is better for this.

Can friendly-id gem work with capital letters in url e.g. /users/joe-blogs and /users/Joe-Blogs both work

I like the friendly_id gem, but one problem I'm seeing is that when I type in a URL with a capital letter in it, such as /users/Joe-Blogs, it can't find the page. It's a little trivial, but most sites can handle something like this and will generate the page whether or not the URL has a capital letter. Does anyone know a fix for this?
Edit: to clarify, this is for when users enter a URL manually and put capitals in it just because it's a name, like author/Joe-Blogs. I've seen other sites handle this, but Rails seems to just give a 404.
friendly_id uses parameterize to create the slugs.
I think the best way to solve your problem is to parameterize the param before using it in the lookup.
# controller
User.find(params[:id].parameterize)
Or parameterize the url where the link originated from.
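To make the effect concrete outside of Rails, here is a plain-Ruby approximation of what parameterizing the incoming id does. Note the assumption: normalize_slug is a hypothetical stand-in written for this example, not ActiveSupport's parameterize, and it only handles ASCII letters and digits.

```ruby
# Rough stand-in for parameterize: lowercase the input, collapse anything
# that is not a letter or digit into a single dash, and trim stray dashes.
# This is an illustrative helper, not the ActiveSupport implementation.
def normalize_slug(raw)
  raw.to_s.downcase.gsub(/[^a-z0-9]+/, "-").gsub(/\A-+|-+\z/, "")
end

puts normalize_slug("Joe-Blogs")
puts normalize_slug("joe-blogs")
puts normalize_slug("Joe Blogs")
```

Both spellings end up as the same slug, so the lookup succeeds regardless of how the user capitalized the URL.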
As an addition to Vic's answer, you'll want to look at url normalization:
The following normalizations are described in RFC 3986 to result in equivalent URLs:
Converting the scheme and host to lower case.
The scheme and host components of the URL are case-insensitive. Most normalizers will convert them to lowercase.
Example: HTTP://www.Example.com/ → http://www.example.com/
In short - it's against convention to use capitalization in your urls.
You may also wish to look at URI normalize; more importantly, you should work to remove the capitalization from your URLs:
URI.parse(params[:id]).normalize
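A quick check of what normalize does, and does not do, with Ruby's standard URI library. The second URL is made up for the example. Only the scheme and host are lowercased. The path keeps its capitalization, which is why the slug itself still needs separate handling.

```ruby
require "uri"

# Scheme and host are case-insensitive, so normalize lowercases them.
normalized = URI.parse("HTTP://www.Example.com/").normalize
puts normalized.to_s

# The path is case-sensitive and is left alone.
with_path = URI.parse("HTTP://www.Example.com/users/Joe-Blogs").normalize
puts with_path.host
puts with_path.path
```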

Allow plus sign in URL with MVC 3

I need to be able to allow the "+" sign for certain actions in a controller. I am building a tag filtering engine that allows something like this (as on Stack Overflow): /Stuff/Tagged/tag-name-1+tag-name-2+other-tag
I know I can set allowDoubleEscaping="true" in the web.config, but that is not best practice for security reasons.
I am guessing there is a way to do this, maybe with a custom filter or some other registration in the global.asax?
StackOverflow is probably treating the + as a whitespace. Most likely they map the route /Stuff/Tagged/{*tags} and call string.split() on the tags. This actually works out great if you don't allow whitespace in your tags.
A + means a space in a URL, so you should URL-encode the plus signs:
/Stuff/Tagged/tag-name-1%2Btag-name-2%2Bother-tag
You can use simple replace:
string url = Url.Action("Index", "YourController");
url = url.Replace("%2b", "+");
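The catch-all-and-split idea and the encoding behaviour discussed in these answers can be sketched in a few lines. The example is in Ruby only for consistency with the other snippets here. tags_from_segment is a hypothetical helper, and the sketch assumes tags never contain whitespace or a literal plus sign.

```ruby
require "uri"

# Split the raw path segment on the plus sign, dropping empty pieces.
# tags_from_segment is an illustrative helper name.
def tags_from_segment(segment)
  segment.split("+").reject(&:empty?)
end

raw_segment = "tag-name-1+tag-name-2+other-tag"
tags = tags_from_segment(raw_segment)
p tags

# Form-style decoding turns "+" into a space, so if the framework has
# already decoded the value, splitting on whitespace gives the same tags.
decoded = URI.decode_www_form_component(raw_segment)
p decoded.split(" ")

# A literal plus sign that must survive decoding is sent as %2B.
escaped_plus = URI.encode_www_form_component("c++")
puts escaped_plus
```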

How do SO URLs self correct themselves if they are mistyped?

If an extra character (like a period, a comma, a bracket, or even a letter) is accidentally added to a URL on the stackoverflow.com domain, no 404 error page is shown. Instead, the URL corrects itself and the user is led to the relevant page.
For instance, the extra 4 letters I added to the end of a valid SO URL to demonstrate this would be automatically removed when you access the below URL -
https://stackoverflow.com/questions/194812/list-of-freely-available-programming-booksasdf
I guess this has something to do with ASP.NET MVC Routing. How is this feature implemented?
Well, this is quite simple to explain I guess, even without knowing the code behind it:
The text is just candy for search engines and people reading the URL:
This URL will work as well, with the complete text removed!
The only part really important is the question ID that's also embedded in the "path".
This is because EVERYTHING after http://stackoverflow.com/questions/194812 is ignored. It is just there to make the link more descriptive if it is posted somewhere.
Internally the URL is mapped to a handler, e.g., by a rewrite, that transforms into something like: http://stackoverflow.com/questions.php?id=194812 (just an example, don't know the correct internal URL)
This also makes the URL search engine friendly, besides being more readable to humans.
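As an illustration of the idea (not Stack Overflow's actual code, which isn't known here), a route handler only needs the numeric id and can ignore or rewrite the trailing text. The SLUGS table, the constant and both method names below are hypothetical.

```ruby
# Hypothetical data store mapping question ids to their canonical slugs.
SLUGS = { 194812 => "list-of-freely-available-programming-books" }.freeze

# Matches /questions/<id> with an optional trailing slug of any content.
QUESTION_PATH = %r{\A/questions/(\d+)(?:/.*)?\z}

# Extract the numeric id, or nil when the path has a different shape.
def question_id(path)
  match = QUESTION_PATH.match(path)
  match && Integer(match[1], 10)
end

# Build the canonical path for a request; a real application would
# redirect to it whenever it differs from the requested path.
def canonical_path(path)
  id = question_id(path)
  return nil unless id && SLUGS.key?(id)
  "/questions/#{id}/#{SLUGS[id]}"
end

puts canonical_path("/questions/194812/list-of-freely-available-programming-booksasdf")
puts canonical_path("/questions/194812")
p canonical_path("/users/194812")
```

Because the lookup uses only the id, a mistyped or missing slug still resolves to the same question.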

ASP.NET MVC Colon in URL

I've seen that IIS has a problem with letting colons into URLs. I also saw the suggestions others offered here.
With the site I'm working on, I want to be able to pass titles of movies, books, etc., into my URL, colon included, like this:
mysite.com/Movie/Bob:The Return
This would be consumed by my MovieController, for example, as a string and used further down the line.
I realize that a colon is not ideal. Does anyone have any other suggestions? As poor as it currently is, I'm doing a find-and-replace from all colons (:) to another character, then a backwards replace when I want to consume it on the Controller end.
I resolved this issue by adding this to my web.config:
<httpRuntime requestPathInvalidCharacters=""/>
This must be within the system.web section.
The default is:
<httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,:,\,?"/>
So to only make an exception for the colon it would become
<httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,\,?"/>
Read more at: http://msdn.microsoft.com/en-us/library/system.web.configuration.httpruntimesection.requestpathinvalidcharacters.aspx
From what I understand, the colon is acceptable as an unencoded character in a URL. I don't know why they added it to the default value of requestPathInvalidCharacters.
Consider URL encoding and decoding your movie titles.
You'd end up with foo.com/bar/Bob%3AThe%20Return
As an alternative, consider leveraging an HTML helper to remove URL-unfriendly characters from URLs (the method is URLFriendly()). The SEO difference between a colon and a placeholder (e.g. a dash) would likely be negligible.
One of the biggest worries with your approach is that the movie name isn't always going to be unique (e.g. "The Italian Job"). Also, what about other illegal characters (e.g. brackets)?
It might be a good idea to use an id number in the url to locate the movie in your database. You could still include a url friendly copy of movie name in your url, but you wouldn't need to worry about getting back to the original title with all the illegal characters in it.
A good example is the url to this page. You can see that removing the title of the page still works:
ASP.NET MVC Colon in URL
The colon is a reserved character in a URI according to RFC 3986, and IIS rejects it in the request path by default. So rather than fighting that, either URL-encode it or use another character. And here's a nice blog post you might take a look at.
The simplest way is to use System.Web.HttpUtility.UrlEncode() when building the URL and System.Web.HttpUtility.UrlDecode() when interpreting the results coming back. You would also have problems with the space character if you don't encode the value first.
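The encode-then-decode round trip looks like this when sketched with Ruby's standard library (the answers above use the .NET equivalents). The title is the one from the question. ERB::Util.url_encode is used for the encoding step because it writes a space as %20 rather than "+".

```ruby
require "erb"
require "uri"

title = "Bob:The Return"

# Percent-encode everything outside the unreserved set; the colon becomes
# %3A and the space becomes %20.
encoded = ERB::Util.url_encode(title)
puts "mysite.com/Movie/#{encoded}"

# Decoding on the controller side restores the original title.
decoded = URI.decode_www_form_component(encoded)
puts decoded
```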

Resources