Convert a user-entered title to a URL: what to use instead of spaces, #, & and other characters? - url

I have a form on my website where users can add new pages. I need to generate SEO-friendly URLs and make these URLs unique.
Which characters can I use in a URL? I know I should convert spaces to underscores:
" "->"_" and, before that, convert existing underscores to something else, for example:
"_"->/underscore
That way it is easy to turn the URL back into the title.
But my titles can contain any character on the keyboard, even: ##%:"{/\';.>
Are there any reasons not to use these characters in a URL?
What matters:
- easy generation of the URL from the title, and of the title back from the URL (without database queries)
- each title is unique, so each URL must be too
- SEO-friendly URLs

Aren't you querying the database to get the content anyway? In that case, just grab the title field in the same query.
The only way to reliably get the title back from the URL is to 'URL encode' it (in PHP you use the rawurlencode() function; urlencode() does the same but turns spaces into + instead of %20). However, you will end up with URLs like this:
My%20page%20title
You can't simply replace characters, because then your URLs will no longer be unique. If you replace spaces with underscores, for example, the following titles will all produce the same URL:
My page title
My_page title
My_page_title
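A quick check of the two claims above, sketched in Python with urllib.parse standing in for PHP's encoding functions: replacing spaces with underscores collapses the three titles into one URL, while percent-encoding keeps them distinct and decodes each one back exactly.

```python
from urllib.parse import quote, unquote

titles = ["My page title", "My_page title", "My_page_title"]

# Naive scheme: spaces -> underscores. Lossy, so different titles collide.
naive = {t.replace(" ", "_") for t in titles}
print(naive)  # {'My_page_title'}

# Percent-encoding: every title gets its own URL and decodes back exactly.
encoded = [quote(t, safe="") for t in titles]
for title, path in zip(titles, encoded):
    print(path)
    assert unquote(path) == title
```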
In short: don't worry about one extra database hit, and just use SEO-friendly URLs by limiting them to lowercase a-z, 0-9 and dashes, like my-page-title. As I said, you can grab everything in one query anyway.
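A minimal Python sketch of the "lowercase a-z, 0-9 and dashes" rule; the function name slugify is hypothetical. Note that the first two titles produce the same slug: the mapping is lossy, which is exactly why the slug has to be stored with the page and looked up in the database rather than decoded.

```python
import re


def slugify(title: str) -> str:
    """Hypothetical helper: lowercase the title, then collapse every run of
    characters outside a-z and 0-9 into one dash and trim the ends."""
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    return slug.strip("-")


print(slugify("My page title"))       # my-page-title
print(slugify("My_page_title"))       # my-page-title
print(slugify("Books & Records #1"))  # books-records-1
```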

Related

How to rewrite URLs split by hyphens?

I am getting confused while rewriting URLs that contain hyphens; they conflict with the GET parameters.
For instance, I have a long book name in the URL, with spaces replaced by hyphens, like the-famous-world-records-of-athletics. After that I also get an error with pagination, which is also separated by hyphens.
Please suggest how I can rewrite the following URLs:
example.com/vc.php?book=the-famous-world-records-of-athletics
example.com/vc.php?book=the-famous-world-records-of-athletics&page=1
example.com/vc.php?book=the-famous-world-records-of-athletics&topic=jumping-and-racing&page=2
I would like them to look like:
example.com/the-famous-world-records-of-athletics.html
example.com/the-famous-world-records-of-athletics-1.html
example.com/the-famous-world-records-of-athletics-jumping-and-racing-2.html
A hyphen (minus) is perfectly valid in a URL; it is a so-called 'unreserved' character.
https://en.wikipedia.org/wiki/Percent-encoding
If you really need to replace them, I'd replace them with %2D, just like you would replace a space with %20.
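The same point can be checked with Python's urllib.parse (used only for illustration): quote() never escapes unreserved characters such as -, and unquote() turns %2D back into a hyphen.

```python
from urllib.parse import quote, unquote

book = "the-famous-world-records-of-athletics"

# '-' is an unreserved character, so percent-encoding leaves it unchanged.
encoded = quote(book)
print(encoded)  # the-famous-world-records-of-athletics

# %2D is simply the percent-encoded form of '-' and decodes back to it.
print(unquote("the%2Dfamous"))  # the-famous
```

RFC 3986 treats a percent-encoded unreserved character as equivalent to the character itself, so servers or proxies may decode %2D back to - before any rewrite rules run.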

Safe character to separate multiple URLs

I am preparing a special string in which keys and values are concatenated as below:
username=foo&age=24&email=foo#bar.com&homepage=http://foo.com
& is the separator between two key=value pairs
each value is URL-encoded
I have a scenario where there are multiple home pages for a user.
I want to specify multiple urls for the homepage key
username=foo&age=24&email=foo#bar.com&homepage=url1<some_safe_url_separator_char>url2<some_safe_url_separator_char>url3
We have no control over, or knowledge of, what url1, url2, ... may contain.
What is a good choice of some_safe_url_separator_char?
In other words, I am not looking for a safe character to use IN a URL, but a safe character to use to SEPARATE two URLs in a string.
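A sketch that relies only on the question's own premise that values are URL-encoded: if each URL is percent-encoded with no "safe" characters before joining, every reserved character inside it (including the separator) is escaped, so a reserved character such as "," can only ever appear as the separator. The helper names and the choice of "," are hypothetical.

```python
from urllib.parse import quote, unquote

SEPARATOR = ","  # hypothetical choice; any character quote() escapes works


def join_urls(urls):
    # safe="" makes quote() escape every reserved character, including
    # ',' itself, so the separator cannot occur inside an encoded URL.
    return SEPARATOR.join(quote(u, safe="") for u in urls)


def split_urls(value):
    # Split on the separator first, then decode each part.
    return [unquote(part) for part in value.split(SEPARATOR)]


homepages = ["http://foo.com", "http://bar.com/a,b?x=1&y=2"]
packed = join_urls(homepages)
print(packed)
assert split_urls(packed) == homepages
```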
Well, you can use URL rewriting for this.
It produces a URL that hides the names of the parameters.
URL rewriting gives you a URL separated by '/', which is hard for an outside person to decode.
You can follow the link I'm posting:
URL rewriting for beginners

Prevent spaces in URL showed like %20

I would like to display a nice URL in the address bar and avoid spaces being displayed as %20.
Here is an example:
Is it possible to replace these spaces with a dash ?
Something like: /BANQUE/International-Ledger
Maybe something to do in the routing ?
Thanks.
You don't want to replace spaces in the generated route; you want to avoid generating them in the first place.
What is your "International Ledger"? If it is an action, use [ActionName("International-Ledger")].
If it is some kind of product or product category, it is good practice not to use the product name itself in the URL, but a "token" generated from the name: for example, use a regex to replace spaces with dashes, replace special letters with their basic-alphabet equivalents, and perhaps add a unique identifier to prevent conflicts between products with the same name.
See: How can I create a SEO friendly dash-delimited url from a string?
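The token approach described above can be sketched in Python as follows; make_token and its unique_id parameter are hypothetical. Accented letters are reduced to their base letters via Unicode NFKD decomposition, runs of other characters become single dashes, and the id keeps two products with the same name apart.

```python
import re
import unicodedata


def make_token(name: str, unique_id: int) -> str:
    # Decompose accented letters ("é" -> "e" + combining accent), then drop
    # every non-ASCII code point, which removes the combining marks.
    ascii_name = (
        unicodedata.normalize("NFKD", name)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    # Replace each run of non-alphanumeric characters with a single dash.
    slug = re.sub(r"[^A-Za-z0-9]+", "-", ascii_name).strip("-")
    # Append the id so products with identical names get distinct tokens.
    return f"{slug}-{unique_id}"


print(make_token("International Ledger", 17))  # International-Ledger-17
print(make_token("Crème brûlée", 42))          # Creme-brulee-42
```

Letters that have no decomposition (for example "ß" or "ø") are dropped rather than transliterated by this sketch; a lookup table would be needed for those.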

Use non-Latin characters for product and category URL keys in Magento

Magento converts non-Latin characters in the URL key of products and categories to Latin characters. How can I use non-Latin characters?
formatUrlKey in Mage/Catalog/Model/Product/Url.php uses $_convertTable in Mage/Catalog/Helper/Product/Url.php. I've tried to change the code but I can't make Magento save non-Latin URLs and show them correctly in the admin.
I've removed the Hebrew letters from $_convertTable as you suggested.
The problem is that the formatUrlKey replaces characters which are not 0-9 or a-z with '-':
public function formatUrlKey($str)
{
    $urlKey = preg_replace('#[^0-9a-z]+#i', '-', Mage::helper('catalog/product_url')->format($str));
    $urlKey = strtolower($urlKey);
    $urlKey = trim($urlKey, '-');
    return $urlKey;
}
So I'm overriding this method and changing the first line to:
$urlKey = preg_replace('#[^0-9a-zא-ת]+#i', '-', Mage::helper('url')->format($str));
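To see why Hebrew titles lose their letters, here is a rough Python port of the formatUrlKey() body quoted above (the Mage helper's format() step is left out, and format_url_key is a hypothetical name). With Magento's pattern every Hebrew letter counts as "not 0-9 or a-z" and is collapsed into a dash that trim() then removes; the override's pattern keeps them. Python's str regexes are Unicode-aware, whereas PHP's preg_* functions only treat pattern and subject as UTF-8 characters when the u modifier is given, so PHP's byte-level behaviour may differ from this sketch.

```python
import re


def format_url_key(s: str, pattern: str) -> str:
    # Same three steps as the PHP method: collapse disallowed runs to '-',
    # lowercase, then trim dashes from both ends.
    key = re.sub(pattern, "-", s, flags=re.IGNORECASE)
    return key.lower().strip("-")


ORIGINAL = r"[^0-9a-z]+"        # Magento's pattern
WITH_HEBREW = r"[^0-9a-zא-ת]+"  # the override's pattern

print(format_url_key("Shalom שלום", ORIGINAL))     # shalom
print(format_url_key("Shalom שלום", WITH_HEBREW))  # shalom-שלום
```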
Now Magento correctly saves and displays the URL string, but it doesn't work in the browser.
When I try to access the product URL, I get a 404.
If, instead of preg_replace, strtolower and trim, I use only:
$urlKey = urlencode($str);
it also doesn't work, because Magento calls formatUrlKey several times.
I don't understand why.
Thanks
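The "called several times" detail matters because percent-encoding is not idempotent: a second pass encodes every % as %25, producing a different key. A small Python illustration, with urllib.parse.quote standing in for PHP's urlencode() (both encode these UTF-8 bytes the same way):

```python
from urllib.parse import quote, unquote

key = "שלום"
once = quote(key)
twice = quote(once)  # what happens if the key is encoded a second time

print(once)   # %D7%A9%D7%9C%D7%95%D7%9D
print(twice)  # %25D7%25A9%25D7%259C%25D7%2595%25D7%259D
```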
Hi,
This extension should help you:
http://www.magentocommerce.com/magento-connect/alexhost/extension/6587/magefast_seflinkmultilanguage
Since Magento is just blindly converting characters using the table, deleting entries from the table will prevent Magento from trying to convert them. Override the helper class and delete the entries you don't want applied, and you should be well on your way.
As for displaying them correctly in the admin panel: is that a separate problem you see when you save those non-Latin characters? More specific information would be helpful.

Why is this query string invalid?

In my ASP.NET MVC page I create a link that renders as follows:
http://localhost:3035/Formula/OverView?colorId=349405&paintCode=744&name=BRILLANT%20SILVER&formulaId=570230
According to the W3C validator, this is not correct, and it reports errors after the first ampersand. It complains that the & is not encoded, that the entity &p is not recognised, etc.
AFAIK the & shouldn't be encoded, because it is the separator between key/value pairs.
For those who care: I send these parameters as a query string rather than as "/"-separated values because I know of no decent way to pass optional parameters that way.
To put all the bits together:
- an anchor (<a>) tag's href attribute needs an HTML-encoded value
- & encodes to &amp;
- to encode an '&' that is part of your parameter's value, use %26
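The two layers in that list can be sketched in Python (urlencode and html.escape are used only for illustration; the host and parameters are the ones from the follow-up question below): first percent-encode each value, so an & inside a value becomes %26, then HTML-escape the whole URL for the attribute, so the separator & becomes &amp;.

```python
import html
from urllib.parse import urlencode

# Layer 1: percent-encode the values; the '&' inside "123&456" becomes %26.
query = urlencode({"query": "123&456", "page": 1})
print(query)  # query=123%26456&page=1

# Layer 2: HTML-escape the complete URL for the href attribute; only the
# separator '&' turns into &amp;, the %26 inside the value is untouched.
href = html.escape("http://www.mySite.com/search?" + query)
print(f'<a href="{href}">Search</a>')
# <a href="http://www.mySite.com/search?query=123%26456&amp;page=1">Search</a>
```

The value's ampersand is therefore not "encoded twice": it is percent-encoded once, and only the separator is HTML-escaped.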
Wouldn't encoding the ampersand as &amp; make it part of my parameter's value?
I need it to separate the second variable from the first.
Indeed, by encoding my href value I do get rid of the errors. What I'm wondering now, however, is what to do if, for example, my colorId were "123&456", where the ampersand is part of the value.
Since the separator has to be encoded, what should I do with ampersands that are part of a value? Do they need to be encoded twice, so to speak?
So, to get the URL:
www.mySite.com/search?query=123&456&page=1
What should my href value be?
Also, I think I'm about the first person in the world to care about this... Go check the web and count the pages whose query strings pass the W3C validator.
Special characters that are part of attributes should, generally, be encoded. Thus you need &amp; instead of just &.
It works even if it doesn't validate, because most browsers are very, very, very lenient in what they accept.
In addition, if you are outputting XHTML you have to encode every & everywhere, not just inside attributes.
All HTML attributes need to use character entities. The only place you don't need to change & into &amp; is within script blocks.
Anywhere in an HTML document where you want an & to appear directly next to something other than whitespace, you need to use the character entity &amp;. Inside an attribute, &amp; works as though it were a literal &. If the document is XHTML, you need to use character entities everywhere, even when the & is not immediately next to other text. You can also use other character entities inside attributes, and they are treated as the actual characters.
If you want to use an ampersand as part of a URL in a way other than as a separator for parameters, you should use %26.
As an example...
<a href="http://localhost/Hello?name=Bob&amp;text=you%20%26%20me%20&quot;forever&quot;">Hello</a>
would send the user to http://localhost/Hello, with name=Bob and text=you & me "forever".
This is a slightly confusing concept for some people, I've found. When you put &amp; in an HTML page, such as in <a href="abc?def=5&amp;ghi=10">, the URL is actually abc?def=5&ghi=10. The HTML parser converts the entity to an ampersand.
Think of it exactly the same way as escaping quotes in a string:
// you define your string like this:
myString = "this is \"something\" you know?"
// but the string is ACTUALLY: this is "something" you know?
// when you look at the HTML, you see:
<a href="foo?bar=1&amp;baz=2">
// but the url is ACTUALLY: foo?bar=1&baz=2
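The parser's side of the analogy can be reproduced with Python's html.unescape, used here as a stand-in for what a browser does with an attribute value: the &amp; written in the source becomes a plain & in the URL that is actually requested.

```python
import html

# What is written in the HTML source...
attribute_value = "foo?bar=1&amp;baz=2"
# ...and what the HTML parser hands over as the actual URL.
url = html.unescape(attribute_value)
print(url)  # foo?bar=1&baz=2

# The earlier href example behaves the same way.
print(html.unescape("abc?def=5&amp;ghi=10"))  # abc?def=5&ghi=10
```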
