Safe character to separate multiple URLs

I am preparing a special string in which keys and values are concatenated like below:
username=foo&age=24&email=foo#bar.com&homepage=http://foo.com
& is the separator for two key=value pairs
value is url encoded
I have a scenario where there are multiple home pages for a user.
I want to specify multiple urls for the homepage key
name=foo&age=24&email=foo#bar.com&homepage=url1<some_safe_url_separator_char>url2<some_safe_url_separator_char>url3
We have no control over, or knowledge of, what url1, url2, ... may contain.
What is a good choice of some_safe_url_separator_char?
In other words I am not looking for a safe character to be used IN a url, but a safe character to be used to SEPARATE two urls in a string

You can use URL rewriting for this.
It produces a URL that hides the parameter names.
URL rewriting separates the values with '/', which makes the URL harder for an outsider to decode.
For reference, see the following link:
URL rewriting for beginners
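Returning to the separator itself: one possible approach, sketched here in Python, is to percent-encode every URL individually so that only unreserved characters and %XX escapes remain, and then join the results with any character outside that set. This assumes you control both the packing and the unpacking side. The names pack_urls and unpack_urls and the choice of "," are illustrative and do not come from the original post.

```python
from urllib.parse import quote, unquote

# Assumption: "," is the separator. Any character that quote(..., safe="")
# always escapes would work equally well.
SEPARATOR = ","


def pack_urls(urls):
    """Percent-encode each URL fully, then join with the separator."""
    # safe="" makes quote() escape everything except letters, digits and "_.-~",
    # so a "," (or "&", "=") inside a URL becomes %2C (or %26, %3D).
    return SEPARATOR.join(quote(url, safe="") for url in urls)


def unpack_urls(packed):
    """Split on the separator first, then decode each piece."""
    if not packed:
        return []
    return [unquote(piece) for piece in packed.split(SEPARATOR)]


urls = ["http://foo.com/a,b?x=1&y=2", "http://bar.com/home#top"]
packed = pack_urls(urls)
print(packed)
print(unpack_urls(packed) == urls)
```

Because each URL is encoded before joining, the separator can never appear inside an encoded URL, whatever url1, url2, ... contain. The reader must split on the separator before decoding, not after.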

Related

How to rewrite URLs split by hyphens?

I am getting confused while writing URLs with hyphens. They conflict with the GET parameters.
For instance, I have a long book name in the URL, with spaces replaced by hyphens, like the-famous-world-records-of-athletics. After it comes the pagination, which is also separated with hyphens, and that is where I get errors.
Please suggest how I can rewrite the following URLs:
example.com/vc.php?book=the-famous-world-records-of-athletics
example.com/vc.php?book=the-famous-world-records-of-athletics&page=1
example.com/vc.php?book=the-famous-world-records-of-athletics&topic=jumping-and-racing&page=2
I would like to write them as:
example.com/the-famous-world-records-of-athletics.html
example.com/the-famous-world-records-of-athletics-1.html
example.com/the-famous-world-records-of-athletics-jumping-and-racing-2.html
A minus is perfectly valid in a URL; it is a so-called 'unreserved' character.
https://en.wikipedia.org/wiki/Percent-encoding
If you really need to replace them, I'd replace them with %2D, just like you would replace a space with %20.
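As a rough illustration in Python: percent-encoding leaves "-" alone because it is unreserved, while a space is escaped. The second half is a hypothetical pattern, not something from the question, that treats a trailing "-<digits>" before ".html" as the page number. The names PAGE_URL and split_slug_and_page are made up for this sketch.

```python
import re
from urllib.parse import quote

# "-" is unreserved, so percent-encoding leaves it untouched.
print(quote("the-famous-world-records-of-athletics"))
# A space is not allowed in a URL, so it is escaped.
print(quote("jumping and racing"))

# Hypothetical rule: an optional trailing "-<digits>" before ".html" is the page.
PAGE_URL = re.compile(r"^(?P<slug>.+?)(?:-(?P<page>\d+))?\.html$")


def split_slug_and_page(filename):
    """Return (slug, page) for names like 'some-book-2.html', or None."""
    match = PAGE_URL.match(filename)
    if match is None:
        return None
    # No numeric suffix means the first page.
    page = int(match.group("page")) if match.group("page") else 1
    return match.group("slug"), page


print(split_slug_and_page("the-famous-world-records-of-athletics-1.html"))
```

This rule cannot tell where the book name ends and the topic begins, and a title that itself ends in a number would be misread as a page. A separator that never appears inside a slug (for example "/") avoids both problems.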

url encode & url escape & url rewrite, what's the differences?

It's kinda confusing to differentiate those three terms.
It'll be more understandable if you can explain with examples.
URL encoding and URL escaping are one and the same.
URL encoding is the process of transforming user input sent to a CGI form so that it is fit to travel across the network; basically, spaces and special characters in the URL are replaced with escape sequences.
URL rewriting changes the way you normally associate urls with resources. Normally, test.com/aboutus makes us think that it will take us to the about us page. But internally, Server may take user 1 to /aboutus/page1.html, user 2 to /aboutus/page2.html or any other resource. The Url exposed to the end user will be test.com/aboutus but the resource being rendered can be different. Note that Url Rewriting is performed by Server.
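Since the question asked for examples, here is a minimal Python sketch of the two ideas. The encoding half uses the standard library; the rewriting half is a toy stand-in for what a server does, and the rule table REWRITE_RULES and the function rewrite are assumptions made up for illustration.

```python
import re
from urllib.parse import quote, unquote

# URL encoding / escaping: the value changes form but can be decoded back.
raw = "tom & jerry?"
encoded = quote(raw, safe="")
print(encoded)
print(unquote(encoded) == raw)

# URL rewriting: the public path is mapped to a different internal resource.
# Hypothetical rule table; a real server has its own configuration format.
REWRITE_RULES = [
    (re.compile(r"^/aboutus$"), "/aboutus/page1.html"),
]


def rewrite(path):
    """Return the internal resource for a public path (unchanged if no rule)."""
    for pattern, target in REWRITE_RULES:
        if pattern.match(path):
            return pattern.sub(target, path)
    return path


print(rewrite("/aboutus"))
print(rewrite("/contact"))
```

The encoded string is still the same data in another spelling, whereas the rewritten path names a different resource that the end user never sees.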

Why we don't use such URL formats?

I am reworking the URL formats of my project. The basic format of our search URLs is this:
www.projectname/module/search/<search keyword>/<exam filter>/<subject filter>/... other params ...
On searching with no search keyword and exam filter, the URL will be :-
www.projectname/module/search///<subject filter>/... other params ...
My question is: why don't we see such URLs with back-to-back slashes (3 slashes after www.projectname/module/search)? Please note that I am not using .htaccess rewrite rules in my project anymore. This URL works perfectly. So, should I use this format?
For more details on why we chose this format, please check my other question:-
Suggest best URL style
Web servers will typically remove multiple slashes before the application gets to see the request, for a mix of compatibility and security reasons. When serving plain files, it is usual to allow any number of slashes between path segments to behave as one slash.
Blank URL path segments are not invalid in URLs but they are typically avoided because relative URLs with blank segments may parse unexpectedly. For example in /module/search, a link to //subject/param is not relative to the file, but a link to the server subject with path /param.
Whether you can see the multiple-slash sequences from the original URL depends on your server and application framework. In CGI, for example (and other gateway standards based on it), the PATH_INFO variable that is typically used to implement routing will usually omit multiple slashes. But on Apache there is a non-standard environment variable REQUEST_URI which gives the original form of the request without having elided slashes or done any %-unescaping like PATH_INFO does. So if you want to allow empty path segments, you can, but it'll cut down on your deployment options.
There are other strings than the empty string that don't make good path segments either. Using an encoded / (%2F), \ (%5C) or null byte (%00) is blocked by default by many servers. So you can't put any old string in a segment; it'll have to be processed to remove some characters (often ‘slug’-ified to remove all but letters and numbers). Whilst you are doing this you may as well replace the empty string with _.
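The two points above (relative links starting with // and replacing the empty string with _) can be sketched in Python as follows. The helper names and the "/module/search/" prefix handling are illustrative assumptions, and the sketch further assumes that no real filter value is exactly "_", since that would be indistinguishable from the placeholder.

```python
from urllib.parse import quote, unquote, urljoin

# A link starting with "//" is resolved as a host name, not as a path.
print(urljoin("http://example.com/module/search", "//subject/param"))

PREFIX = "/module/search/"
# Assumption: "_" stands in for an empty filter and is never a real value.
PLACEHOLDER = "_"


def build_search_path(keyword, exam, subject):
    """Join the filters into a path, never emitting an empty segment."""
    segments = [
        quote(value, safe="") if value else PLACEHOLDER
        for value in (keyword, exam, subject)
    ]
    return PREFIX + "/".join(segments)


def parse_search_path(path):
    """Inverse of build_search_path: placeholders become empty strings."""
    parts = path[len(PREFIX):].split("/")
    return ["" if part == PLACEHOLDER else unquote(part) for part in parts]


path = build_search_path("", "", "physics")
print(path)
print(parse_search_path(path))
```

With a placeholder, the path no longer depends on whether the server or framework collapses repeated slashes.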
Probably because it's not clearly defined whether the extra / should be ignored.
For instance: http://news.bbc.co.uk/sport and http://news.bbc.co.uk//////////sport both display the same page in Firefox and Chrome. The server is treating the two urls as the same thing, whereas your server obviously does not.
I'm not sure whether this behaviour is defined somewhere or not, but it does seem to make sense (at least for the BBC website - if I type an extra /, it does what I meant it to do.)

Convert a user title (text) to a URL: what to use instead of spaces, #, & and other characters?

I have a form on my website where users can add new pages. I must generate SEO-friendly URLs and make these URLs unique.
Which characters can I show in a URL? I know that I should convert spaces to underscores:
" "->"_" and, before that, convert underscores to something else, for example:
"_"->/underscore
That makes it easy to get the title back from the URL.
But a title can contain any character on the keyboard, even: ##%:"{/\';.>
Are there any reasons not to use these characters in a URL?
The important points are:
-easy generation of the URL, and of the title back from the URL (without database queries)
-each title is unique, so each URL must be too
-SEO-friendly URLs
Aren't you querying the database to get the content anyway? In which case just grab the title field in the same query.
The only way to reliably get the title back from the URL is to 'URL encode' it (in PHP you use the urlencode() function). However, you will end up with URLs like this:
My%20page%20title
You can't replace any characters because you will then not have unique URLs. If you are replacing spaces with underscores, for example, the following titles will all produce the same URL:
My page title
My_page title
My_page_title
In short: don't worry about one extra database hit and just use SEO-friendly URLs by limiting to lowercase a-z, 0-9 and dashes, like my-page-title. Like I said, you can just grab everything in one query anyway.
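A small Python sketch of the trade-off described in this answer: percent-encoding is reversible but ugly, while a slug limited to a-z, 0-9 and dashes is readable but maps several titles to the same URL. The slugify function here is a hypothetical helper written for this example, not a library call.

```python
import re
from urllib.parse import quote, unquote


def slugify(title):
    """Lowercase the title and replace every run of other characters with '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


titles = ["My page title", "My_page title", "My_page_title"]

# Reversible: every title gets a distinct encoded form and decodes back.
for title in titles:
    print(quote(title), unquote(quote(title)) == title)

# Readable but lossy: all three titles collapse to the same slug.
print({slugify(title) for title in titles})
```

Because the slug is lossy, the original title has to come from the database, which matches the advice to fetch it in the same query as the content.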

How in ASP.NET MVC to change Url.Encode character replacement strategy?

I'm using Url.Encode within a view and it's replacing spaces with + so instead of:
/production/cats-the-musical I'm getting .../cats+the+musical.
I'm sure this is an easy one, but where do you configure which characters are used for this?
I'll be doing this:
public static string EncodeForSEO(this UrlHelper helper, string unencodedUrl)
{
    return helper.Encode(unencodedUrl.Replace(' ', '-'));
}
Until I get a better answer from you guys.
Edit: Thanks Guffa for pointing out my hasty coding.
I want to draw attention to Path versus Query String encoding differences
MVC allows / encourages us to write paths (routes) that can be easier to remember than query strings. e.g. /Products.aspx?id=1 could, in MVC, be /Products/View/1
Building on that, it also encourages, for SEO friendliness, other data that may or may not be necessary like /Products/View/1/Coffee
If the name has space characters, or a necessary parameter is a string containing space characters, and you are including it in the URL path, one of two things must happen, because a ' ' cannot be left in a URL path or query string parameter without being encoded:
-you UrlPathEncode() the string as it is, or
-you first transform the spaces in the string, then call UrlPathEncode(), as other characters may still require encoding.
Note: there is a big difference between Url Encoding (meant for query strings) and Url Path Encoding (meant for path portions of Urls)
cats the musical -> UrlEncode -> cats+the+musical
-- this is not valid in a url path
cats the musical -> UrlPathEncode -> cats%20the%20musical
If you're following along; going back to Web Forms vs MVC - /Products.aspx?name=Coffee+Beans would be rewritten as /Products/View/Coffee%20Beans
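The same distinction exists outside ASP.NET. As a rough analogy only, Python's standard library has quote_plus (query-string style, space becomes "+") and quote (path style, space becomes "%20"). These are not the .NET methods, just a way to see the difference run.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus

title = "cats the musical"

# Query-string style: a space becomes "+".
query_style = quote_plus(title)
print(query_style)

# Path style: a space becomes "%20".
path_style = quote(title)
print(path_style)

# Decoded as a path, "+" stays a literal plus sign, so the value is wrong.
print(unquote(query_style))
# Decoded as a query value, "+" turns back into a space.
print(unquote_plus(query_style))
```

This is why a "+" in the path portion does not mean what it means in a query string: path decoding does not turn it back into a space.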
So that leaves us where OP's question starts. Q: How do you get SEO and human-friendly URLs? A: Use Guffa's code to replace the " " with "-" in your own code before UrlPathEncoding the rest.
In sites I've worked on, when we have a user-entered value used only for SEO (like a blog title or similar), we go a step further and normalize the output by collapsing successive spaces into a single "-", e.g.
cats     the     musical (five spaces between the words), which would otherwise be cats-----the-----musical, becomes cats-the-musical
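A minimal sketch of that normalization, written in Python as an analogy and not as the ASP.NET code: collapse each run of whitespace into one "-", then path-encode whatever else remains. The function name seo_path_segment is hypothetical.

```python
import re
from urllib.parse import quote


def seo_path_segment(title):
    """Collapse whitespace runs into single hyphens, then path-encode the rest."""
    hyphenated = re.sub(r"\s+", "-", title.strip())
    # quote() escapes remaining characters such as "&" or "?".
    return quote(hyphenated)


print(seo_path_segment("cats     the     musical"))
print(seo_path_segment("Coffee & Beans"))
```

As the other answer points out, this is a one-way transformation: once spaces have become "-", the original title cannot be recovered from the URL alone.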
You can't change which characters the UrlEncode method uses. The use of "+" for spaces is defined in the standards for how a URL is encoded; using "-" instead would mean that the method changes the value rather than just encoding it. As the "-" character is itself not encoded, there would be no way to decode the string back to the original value.
In your method, there is no need to check for the character before doing the replacement. If the Replace method doesn't find anything to replace, it just returns the original string reference.
public static string EncodeForSEO(this UrlHelper helper, string unencodedUrl) {
    return helper.Encode(unencodedUrl.Replace(' ', '-'));
}

Resources