Sitecore issue on replacing Danish characters in URL

I used this solution to optimize URLs. Everything works fine, but there is a problem with the Danish characters æ and ø, which should be replaced with "a" and "o". I used this in Web.config:
<replace mode="on" find="æ" replaceWith="a" />
<replace mode="on" find="ø" replaceWith="o" />
The URLs look good, but when I follow such a link I get a 404 error. If I manually change "a" back to "æ" in the URL, the page opens.
Help me please!:)

Remember that replacement is two-way. Generated URLs will substitute a for æ.
Incoming URLs will replace a with æ when looking up items.
As Danish uses both letters, simply replacing æ with a when you generate URLs will cause you all sorts of headaches. For example, the item at-spise-æbler ("to eat apples") will generate the URL at-spise-abler, which will be reverse-replaced during item lookup to try to find the item æt-spise-æbler.
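To see why the round trip fails, here is a minimal sketch of a two-way replacement. It is only a model of the behaviour described above, not Sitecore's actual implementation; the names `rules`, `toUrl` and `toItemName` are made up for illustration.

```javascript
// Hypothetical model of the two <replace> entries from the question.
const rules = [
  { find: "æ", replaceWith: "a" },
  { find: "ø", replaceWith: "o" },
];

// Outgoing direction: item name -> URL (find is swapped for replaceWith).
function toUrl(itemName) {
  return rules.reduce((s, r) => s.split(r.find).join(r.replaceWith), itemName);
}

// Incoming direction: URL -> item name (replaceWith is swapped back for find).
function toItemName(urlSegment) {
  return rules.reduce((s, r) => s.split(r.replaceWith).join(r.find), urlSegment);
}

const url = toUrl("at-spise-æbler");
const lookedUp = toItemName(url);
console.log(url);      // at-spise-abler
console.log(lookedUp); // æt-spise-æbler, which is not the original item name
```

Every plain "a" in the URL is turned back into "æ", so the lookup name no longer matches the item.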
To be more consistent you should replace æ with ae, å with aa and ø with oe if you wish to replace Danish characters.
If you are also using replace mode to ensure all URLs are lower-cased (e.g. <replace mode="on" find="A" replaceWith="a" /> ) then your incoming URL containing an "a" will be interpreted as containing an "A" (assuming replacement is in order of the entries in the web.config and your lowercasing matches are first - if it's the other way round then you still have other problems!). The item at-spise-æbler will still generate a URL at-spise-abler, but your item lookup may match a to A first, trying to find At-spise-Abler, which doesn't exist.
The double-letter substitution won't help you here either, as Sitecore will simply match each letter to its uppercase version.
A better solution for you would be to actually rename items (or their display names) when they are created or edited.
This link should point you in the right direction: http://briancaos.wordpress.com/2007/05/30/sc-53-ensure-item-names/
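As a rough illustration of the renaming idea, the sketch below normalizes a name once, at creation time, using the ae/oe/aa substitutions suggested above. `normalizeItemName` and `danishMap` are hypothetical names, and the logic is written in JavaScript only to show the transformation; in a Sitecore solution it would have to be hooked into item creation or saving, which is not shown here.

```javascript
// Hypothetical mapping, following the ae / oe / aa suggestion in the answer.
const danishMap = { "æ": "ae", "ø": "oe", "å": "aa" };

// Lowercase first so that Æ, Ø and Å are covered by the same three entries,
// then replace the Danish letters and turn whitespace runs into dashes.
function normalizeItemName(name) {
  return name
    .toLowerCase()
    .replace(/[æøå]/g, (ch) => danishMap[ch])
    .replace(/\s+/g, "-");
}

console.log(normalizeItemName("At spise Æbler")); // at-spise-aebler
```

Because the stored name already contains only plain characters, no two-way replacement is needed when generating or resolving URLs.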

Related

Does letter casing of directories and urls matter in .NET MVC?

Say I have a TitleCase directory name, but call an item within that directory using a lowercase url.
Does that have any effect or impact?
For example, does the server need to do a redirect from the incorrect lettercase to the correct lettercase?
Example
A file here: /PlugIns/CMSPages/Images/my-image.jpg
Called with: /plugins/cmspages/images/my-image.jpg
The routing engine isn't case sensitive.
One thing to be wary of, if you are referring to page urls - Google treats lowercase and uppercase urls as different pages, so you want to make use of rel="canonical" to ensure Google and other search engines know it is one page, no matter whether the url is upper or lowercase.
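A minimal sketch of deriving one canonical address for the example above, assuming you pick the all-lowercase path as the canonical form (that choice, and the name `canonicalUrl`, are assumptions for illustration):

```javascript
// Hypothetical helper: lowercase only the path, leaving the rest of the URL alone.
function canonicalUrl(requestUrl) {
  const u = new URL(requestUrl);
  u.pathname = u.pathname.toLowerCase();
  return u.href;
}

const canonical = canonicalUrl("http://www.example.com/PlugIns/CMSPages/Images/my-image.jpg");
console.log(canonical); // http://www.example.com/plugins/cmspages/images/my-image.jpg
```

The resulting value is what would go into the href of the rel="canonical" link on every casing variant of the page.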

How to transform encoded URL to readable texts?

This is about Bangla Unicode text, but it can be a problem for any language written in a non-Latin script.
I host a Bangla blog with all its text and categories in Bangla (I prefer not to say Bengali, because the name of the language is Bangla rather than Bengali).
So the Bangla category "বাংলা" has a URL like:
http://www.example.com/category/বাংলা
But whenever I copy the URL from the address bar and paste it into a chat panel or somewhere else, it changes into some strange characters, for example:
http://www.example.com/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0*
(* This is just an example, not the exact encoding of the word "বাংলা".)
So in many cases I get encoded URLs like the one above, from which I cannot tell what Unicode text they represent. Recently I have been getting some 404 errors logged by one of my plugins. Among them I found a URI like:
/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F%E0%A7%81%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0
I used Jetpack's Omnisearch to look for a match, but the result was empty. I can't even work out which category is causing this 404.
So here comes the question:
How can I transform the encoded URL to readable glyphs?
http://www.example.com/category/বাংলা
isn't a URL; URLs can only contain ASCII characters. This is an IRI.
http://www.example.com/category/%E0%A6%AC%E0%A6%BE%E0%A6%82%E0%A6%B2%E0%A6%BE
is the URI representation of that IRI. They are otherwise equivalent. A browser may display the ‘pretty’ IRI version in the user interface, but put the URI version on the clipboard so that you can paste it into other tools that don't support IRI.
The 404 address you pasted translates to:
/category/স্নায়ুবিদ্য�
where the last character is a � because it is an invalid, truncated UTF-8 sequence. (This is probably why the request failed.) Someone may have mis-pasted a partial URI here.
If you're using JavaScript you can do:
decodeURIComponent(url);
This decodes the percent-encoded UTF-8 bytes back into the original characters.
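One caveat: `decodeURIComponent` throws a `URIError` on a malformed sequence, which is exactly what the truncated 404 address from the question is. The sketch below is one possible way around that, assuming a runtime that provides `TextEncoder` and `TextDecoder` (modern browsers and Node.js). `decodeLenient` is a hypothetical helper name; it converts the escapes to bytes and lets `TextDecoder` substitute U+FFFD for anything invalid.

```javascript
// Hypothetical helper: decode percent-escapes without throwing on broken UTF-8.
function decodeLenient(encoded) {
  const encoder = new TextEncoder();
  const bytes = [];
  let i = 0;
  while (i < encoded.length) {
    const chunk = encoded.slice(i, i + 3);
    if (/^%[0-9A-Fa-f]{2}$/.test(chunk)) {
      // A %XX escape contributes one raw byte.
      bytes.push(parseInt(chunk.slice(1), 16));
      i += 3;
    } else {
      // Any other character is kept, re-encoded as its UTF-8 bytes.
      const ch = String.fromCodePoint(encoded.codePointAt(i));
      bytes.push(...encoder.encode(ch));
      i += ch.length;
    }
  }
  // Non-fatal decoding replaces invalid or truncated sequences with U+FFFD.
  return new TextDecoder("utf-8").decode(new Uint8Array(bytes));
}

const logged = "/category/%E0%A6%B8%E0%A7%8D%E0%A6%A8%E0%A6%BE%E0%A7%9F%E0%A7%81%E0%A6%AC%E0%A6%BF%E0%A6%A6%E0%A7%8D%E0%A6%AF%E0";
console.log(decodeLenient(logged)); // readable text, ending in U+FFFD for the cut-off byte
```

For well-formed input the result is the same as `decodeURIComponent`; only the broken tail differs.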

ASP.NET URL contains multi "dot" symbol

I registered the following route in global.asax:
oRoutes.MapPageRoute("test-route", "home/{cURL}", "~/test.aspx");
Everything works fine, but I get an error when the URL contains a "." character. Adding the setting below fixes only the case of a single dot in the URL.
<httpRuntime relaxedUrlToFileSystemMapping="true" />
For example, when I call http://foo.com/home/open.door.foo/, the routing fails.
Is there any simple way to fix this problem? Thanks.
P.S. 1: Please don't suggest removing the last part, like ".foo", because my URLs can look like http://foo.com/hey.john.open.the.book.volume.1-brabra :-)
P.S. 2: For some reason, I must use the "." symbol in the URL. :'(
Based on several posts here on SO, I guess you should encode your values:
ASP.NET MVC: How to Route Search Term with . (Period) at the end
Semantic urls with dots in .net

ASP.NET MVC Colon in URL

I've seen that IIS has a problem with letting colons into URLs. I also saw the suggestions others offered here.
With the site I'm working on, I want to be able to pass titles of movies, books, etc., into my URL, colon included, like this:
mysite.com/Movie/Bob:The Return
This would be consumed by my MovieController, for example, as a string and used further down the line.
I realize that a colon is not ideal. Does anyone have any other suggestions? As poor as it currently is, I'm doing a find-and-replace from all colons (:) to another character, then a backwards replace when I want to consume it on the Controller end.
I resolved this issue by adding this to my web.config:
<httpRuntime requestPathInvalidCharacters=""/>
This must be within the system.web section.
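The default is (with < , > and & written as XML entities, as required inside a web.config attribute):
<httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,:,\,?"/>
So to make an exception only for the colon, it would become
<httpRuntime requestPathInvalidCharacters="&lt;,&gt;,*,%,&amp;,\,?"/>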
Read more at: http://msdn.microsoft.com/en-us/library/system.web.configuration.httpruntimesection.requestpathinvalidcharacters.aspx
From what I understand, the colon is acceptable as an unencoded character in a URL. I don't know why it was added to the default value of requestPathInvalidCharacters.
Consider URL encoding and decoding your movie titles.
You'd end up with foo.com/bar/Bob%3AThe%20Return
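A quick way to check the encoded form is a round trip. The sketch below uses JavaScript's built-in `encodeURIComponent` and `decodeURIComponent` purely as an illustration of percent-encoding; the variable names are made up.

```javascript
const title = "Bob:The Return";

// The colon becomes %3A and the space becomes %20.
const segment = encodeURIComponent(title);
console.log(segment); // Bob%3AThe%20Return

// Decoding restores the original title exactly.
const restored = decodeURIComponent(segment);
console.log(restored === title); // true
```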
As an alternative, consider using an HTML helper to remove URL-unfriendly characters from URLs (the method is URLFriendly()). The SEO difference between a colon and a placeholder (e.g. a dash) would likely be negligible.
One of the biggest worries with your approach is that the movie name isn't always going to be unique (e.g. "The Italian Job"). Also, what about other illegal characters (e.g. brackets)?
It might be a good idea to use an id number in the url to locate the movie in your database. You could still include a url friendly copy of movie name in your url, but you wouldn't need to worry about getting back to the original title with all the illegal characters in it.
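A minimal sketch of the id-plus-title idea, assuming a path shape of /Movie/{id}/{slug}. That shape and the helper names `slugify`, `buildMoviePath` and `parseMovieId` are assumptions for illustration, not something from the question.

```javascript
// Reduce a title to lowercase letters, digits and dashes (lossy on purpose).
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// The id identifies the movie; the slug is only there for readability.
function buildMoviePath(id, title) {
  return `/Movie/${id}/${slugify(title)}`;
}

// Only the id is read back, so the slug may be missing or out of date.
function parseMovieId(path) {
  const match = /^\/Movie\/(\d+)(?:\/|$)/.exec(path);
  return match ? Number(match[1]) : null;
}

console.log(buildMoviePath(42, "Bob:The Return"));       // /Movie/42/bob-the-return
console.log(parseMovieId("/Movie/42/bob-the-return"));   // 42
console.log(parseMovieId("/Movie/42"));                  // 42
```

The original title, colon included, is then loaded from the database by id rather than reconstructed from the URL.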
A good example is the url to this page. You can see that removing the title of the page still works:
ASP.NET MVC Colon in URL
ASP.NET MVC Colon in URL
The colon is a reserved character in a URI according to RFC 3986, so it is safest not to rely on it appearing literally in a path. You need to either URL-encode it or use another character. And here's a nice blog post you might take a look at.
The simplest way is to use System.Web.HttpUtility.UrlEncode() when building the URL and System.Web.HttpUtility.UrlDecode() when interpreting the results coming back. You would also have problems with the space character if you don't encode the value first.

Convert user title (text) to URL, what instead spaces, #, & and other characters?

I have a form on the website where users can add new pages. I must generate SEO-friendly URLs and make these URLs unique.
Which characters can I display in a URL? I know that I should convert spaces to underscores:
" "->"_" and, before that, convert existing underscores to something else, for example:
"_"->/underscore
That way it is easy to get the title back from the URL.
But my titles can contain any character on the keyboard, even: ##%:"{/\';.>
Are there any reasons not to use these characters in a URL?
What matters is:
- easy generation of the URL, and of the title back from the URL (without database queries)
- each title is unique, so the URL must be too
- SEO-friendly URLs
Aren't you querying the database to get the content anyway? In which case just grab the title field in the same query.
The only way to reliably get the title back from the URL is to 'URL encode' it (in PHP you use the urlencode() function). However, you will end up with URLs like this:
My%20page%20title
You can't replace any characters because you will then not have unique URLs. If you are replacing spaces with underscores, for example, the following titles will all produce the same URL:
My page title
My_page title
My_page_title
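The collision is easy to reproduce. This small sketch applies the space-to-underscore replacement to the three titles above and, for comparison, percent-encodes them with JavaScript's `encodeURIComponent` (used here only as a stand-in for PHP's urlencode()).

```javascript
const titles = ["My page title", "My_page title", "My_page_title"];

// Replacing spaces with underscores maps all three titles to the same URL.
const replaced = titles.map((t) => t.replace(/ /g, "_"));
console.log(new Set(replaced).size); // 1

// Percent-encoding keeps them distinct, at the cost of less readable URLs.
const encoded = titles.map((t) => encodeURIComponent(t));
console.log(new Set(encoded).size); // 3
```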
In short: don't worry about one extra database hit and just use SEO-friendly URLs by limiting to lowercase a-z, 0-9 and dashes, like my-page-title. Like I said, you can just grab everything in one query anyway.
