Use non-Latin characters for product and category URL keys in Magento - url

Magento converts non-Latin characters in the URL key of products and categories to Latin characters. How can I use non-Latin characters?
formatUrlKey in Mage/Catalog/Model/Product/Url.php uses $_convertTable in Mage/Catalog/Helper/Product/Url.php. I've tried to change the code but I can't make Magento save non-Latin URLs and show them correctly in the admin.
I've removed the Hebrew letters from the $_convertTable as you suggested.
The problem is that the formatUrlKey replaces characters which are not 0-9 or a-z with '-':
public function formatUrlKey($str)
{
    $urlKey = preg_replace('#[^0-9a-z]+#i', '-', Mage::helper('catalog/product_url')->format($str));
    $urlKey = strtolower($urlKey);
    $urlKey = trim($urlKey, '-');
    return $urlKey;
}
So I'm overriding this method and changing it to:
$urlKey = preg_replace('#[^0-9a-zא-ת]+#i', '-', Mage::helper('url')->format($str));
Now Magento correctly saves and displays the URL string, but it doesn't work in the browser.
When I try to access the product URL, I get a 404.
If, instead of preg_replace, strtolower and trim, I use only:
$urlKey = urlencode($str);
it also doesn't work, because Magento calls formatUrlKey several times.
I don't understand why.
Thanks

This extension may help you:
http://www.magentocommerce.com/magento-connect/alexhost/extension/6587/magefast_seflinkmultilanguage

Since Magento is just blindly converting from the table, deleting entries from the table will prevent Magento from trying to convert them. Override the helper class and delete the entries you don't want converted, and you should be well on your way.
As far as displaying them correctly in the admin panel, is this a separate problem you have if you save those non-Latin characters? More specific information would be helpful.

Related

Prevent Ruby from changing & to &amp;?

I need to display some superscript and subscript characters in my webpage title. I have a helper method that recognizes the pattern for a subscript or superscript and converts it to &sub2; or &sup2;
However, when the page is rendered, it shows up in the source code as:
&amp;sub2;
Which is not right. I have it set up to be:
<% provide(:title, raw(format_title(@hash[:page_title]))) %>
But the raw is not working. Any help is appreciated.
Method:
def format_title(name)
  label = name
  if label.match /(_[\d]+_)+|(~[\d]+~)+/
    label = label.gsub(/(_([\d]+)_)+/, '&sub\2;')
    label = label.gsub(/(~([\d]+)~)+/, '&sup\2;')
    label.html_safe
  else
    name
  end
end
I have even tried:
str.gsub(/&amp;/, '&')
but it gives back:
&amp;sub2;
You can also achieve this with Rails I18n.
<%= t(:page_title_html, scope: [:title]) %>
And in your respective locale file (title.en.yml most probably), use a key ending in _html so that Rails treats the translation as HTML-safe:
en:
  title:
    page_title_html: "Title with &sup2;"
Here is a chart for HTML symbols regarding subscript and superscripts.
For more information check Preventing HTML character entities in locale files from getting munged by Rails3 xss protection
Update:
In case you need to load the page titles dynamically, first, you'll have to install a gem like Page Title Helper.
You can follow the guide in the gem documentation.
There are two issues with your example: one is substantive and the other is just a coincidence.
The first issue is that you are trying to use character entities that do not actually exist. Specifically, there are only &sup1;, &sup2; and &sup3;, which provide the 1, 2 and 3 superscript symbols respectively. There is no such character entity as &sup4;, nor one for any other superscript digit. There are bare codepoints for the other digits, which you can use, but this requires more involved code.
More importantly, there are no subscript character entities at all in the HTML5 character entities list. All subscript digits are bare codepoints. Therefore it makes no sense to replace anything with &sub2; or any other "subscript" entity.
The reason you didn't see your example working is the test string you chose. Anything with underscores, like _2_mystring, will be properly replaced with &sub2;. As the &sub2; character entity does not exist, the string appears as is, creating the impression that the raw method somehow doesn't work.
Try ~2~mystring instead: it will be replaced with the superscript character entity &sup2; and will be rendered correctly. This illustrates that your code is correct, but the basic assumption about character entities is not.
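The "more involved code" for bare codepoints mentioned above could look roughly like the following sketch. It is only an illustration: format_title_unicode is a hypothetical name, not part of the original helper, and it assumes the same _N_ and ~N~ markers as the question. It maps each digit to its Unicode subscript or superscript character instead of emitting a character entity.

```ruby
# Hypothetical variant of format_title (the name is an assumption):
# replaces _N_ with Unicode subscript digits and ~N~ with Unicode
# superscript digits, so no HTML character entities are involved.
def format_title_unicode(name)
  superscripts = "⁰¹²³⁴⁵⁶⁷⁸⁹" # U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079
  subscripts   = "₀₁₂₃₄₅₆₇₈₉" # U+2080..U+2089

  name
    .gsub(/_(\d+)_/) { Regexp.last_match(1).tr("0-9", subscripts) }
    .gsub(/~(\d+)~/) { Regexp.last_match(1).tr("0-9", superscripts) }
end

puts format_title_unicode("H_2_O and x~10~") # prints "H₂O and x¹⁰"
```

Because the result contains plain characters rather than entities, HTML escaping leaves it unchanged, so this variant should not need raw or html_safe.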

Auto detect language and display the correct one with javascript

I am making a website for my friend
https://photos4humanity.herokuapp.com/
I'd like to pull the posts from his Facebook page and display them on the website, so he doesn't have to duplicate content in both places.
Each Facebook post contains both English and Chinese text, like here:
https://www.facebook.com/photosforhumanity/
I would like to detect the languages in the JSON I get from Facebook, work out which part is English and which is Chinese, and then display only the one that matches the current Rails internationalization (I18n) locale.
Is there a smart way to do this?
You could use Regex to detect if the string has any English characters or not:
isEnglish = myString.match(/[a-zA-Z]/)
or
isEnglish = myString =~ /[a-zA-Z]/
I haven't tested either of these and I don't know how your json file is organized, but this should work for a singular string.
Edit:
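To pull the English characters out of the string, note that slice! with this pattern removes and returns only the first matching character. To collect every run of Latin letters, use scan instead:
englishString = myString.scan(/[a-zA-Z]+/).join(' ')
To strip those letters from the original string, use gsub:
nonEnglishString = myString.gsub(/[a-zA-Z]+/, '')
After that, englishString contains only the Latin-letter words, and nonEnglishString contains everything else, including punctuation and digits.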
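A slightly more robust heuristic is to classify each line of a post by script rather than to pick out individual letters. The sketch below is only an illustration under stated assumptions: the two languages sit on separate lines of the post, and the method names and the :en / :zh keys are hypothetical (chosen to resemble Rails locale names). It relies on Ruby's \p{Han} regex property to count Chinese characters.

```ruby
# Guess the dominant language of one line by counting Han characters
# against Latin letters. This is a heuristic, not real language detection.
def dominant_language(text)
  han   = text.scan(/\p{Han}/).length
  latin = text.scan(/[A-Za-z]/).length
  return :unknown if han.zero? && latin.zero?

  han >= latin ? :zh : :en
end

# Group the non-empty lines of a post by their guessed language.
# Assumes each line is written in a single language.
def split_by_language(post)
  post.each_line
      .map(&:strip)
      .reject(&:empty?)
      .group_by { |line| dominant_language(line) }
end

parts = split_by_language("Hello world\n你好世界\n")
puts parts[:en].inspect
puts parts[:zh].inspect
```

With the lines grouped this way, the view could pick the entry whose key matches I18n.locale and fall back to the other language when that entry is missing.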

How to avoid the firewall error caused by an apostrophe (’) in a generated URL

I'm working in QlikView, and there is a requirement to generate a URL from the filters I have selected in QV and then pass it to another application. The URL contains the values I have selected in QV. We are facing an issue caused by an apostrophe (') in the user name.
Below is the generated URL. As you can see, it is not treated as a URL in its entirety because of the apostrophe.
http://abcworld.com/berlin/cgi-bin/berlinisapi.dll?b_action=berlinViewer&ui.action=run.prompt=false&p_Type=E&p_Tra=Paul O'Donnell&p_AdjType=O&p_UD=2014-08-11
How to overcome this issue? Is there any special character that I could replace it with?
Thanks in advance.
You would want to encode the apostrophe as %27. This is called URL Encoding and is useful when you need to insert characters in a URI that can't normally be represented in a URI, or otherwise have special meaning, like a question mark. Spaces are often encoded as %20. So your final URL might be:
http://abcworld.com/berlin/cgi-bin/berlinisapi.dll?b_action=berlinViewer&ui.action=run.prompt=false&p_Type=E&p_Tra=Paul%20O%27Donnell&p_AdjType=O&p_UD=2014-08-11
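As an illustration of the encoding step only (this is Ruby, not QlikView expression syntax, and encode_query_value is a hypothetical helper name), percent-encoding a parameter value before it goes into the query string could be sketched like this, using ERB::Util.url_encode from the Ruby standard library:

```ruby
require "erb"

# Percent-encode a single query-string value. ERB::Util.url_encode
# encodes a space as %20 and an apostrophe as %27.
def encode_query_value(value)
  ERB::Util.url_encode(value)
end

puts encode_query_value("Paul O'Donnell") # prints "Paul%20O%27Donnell"
```

Only the individual values should be encoded this way, not the whole URL; otherwise the separators such as ?, & and = would be encoded as well.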

Best format for adding a version id into a URL path

I'm currently re-working an application and want to add in a version number to the application URL paths. For example:
http://mydomain/app/VERSION-ID/resource/...
My question is, what is the correct or standard format to add a version id to a URL string? Is there any disadvantage to just having it numeric (1.1 or 1-1):
Example: https://api.twitter.com/1.1/account/verify_credentials.json
Or is it better to have a non numeric identifier to be more intuitive as the url is public facing?
Thanks.
Do not use dots in a URL unless you're defining domain spaces. Use either dashes or other truncated versions (that don't use disallowed characters in the URL).
Example: https://api.twitter.com/v1-1/account/verify_credentials.json
UPDATE: Here is some more information in another thread. My preference is not to use dots if at all possible, but it is apparently OK to do.
Can urls contain dots in the path part?

Convert a user-supplied title (text) to a URL: what to use instead of spaces, #, & and other characters?

I have a form on the website where users can add new pages. I must generate SEO-friendly URLs and make these URLs unique.
Which characters can I use in a URL? I know that I should convert spaces to underscores:
" "->"_" and, before that, convert existing underscores to something else, for example:
"_"->/underscore
That makes it easy to get the title back from the URL.
But in my case a title can contain any character on the keyboard, even: ##%:"{/\';.>
Are there any reasons not to use these characters in a URL?
The important points are:
-generating the URL, and the title back from the URL, must be easy (without database queries)
-each title is unique, so each URL must be too
-the URLs must be SEO friendly
Aren't you querying the database to get the content anyway? In which case just grab the title field in the same query.
The only way to reliably get the title back from the URL is to 'URL encode' it (in PHP you use the urlencode() function). However, you will end up with URLs like this:
My%20page%20title
You can't replace any characters because you will then not have unique URLs. If you are replacing spaces with underscores, for example, the following titles will all produce the same URL:
My page title
My_page title
My_page_title
In short: don't worry about one extra database hit and just use SEO-friendly URLs by limiting to lowercase a-z, 0-9 and dashes, like my-page-title. Like I said, you can just grab everything in one query anyway.
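A minimal sketch of that slug approach follows, with the caveat that slugify and unique_slug are hypothetical names and the numeric-suffix scheme is an assumption rather than something the answer prescribes. It also shows why the mapping cannot be reversed: different titles collapse to the same slug, so the original title has to come from the database.

```ruby
# Build a slug limited to lowercase a-z, 0-9 and dashes.
def slugify(title)
  title.downcase
       .gsub(/[^a-z0-9]+/, "-") # collapse every other run of characters into one dash
       .gsub(/\A-+|-+\z/, "")   # trim leading and trailing dashes
end

# Hypothetical helper: append -2, -3, ... until the slug is not in `taken`,
# where `taken` is a collection of slugs that are already in use.
def unique_slug(title, taken)
  base = slugify(title)
  candidate = base
  suffix = 2
  while taken.include?(candidate)
    candidate = "#{base}-#{suffix}"
    suffix += 1
  end
  candidate
end

puts slugify("My page title")                        # prints "my-page-title"
puts slugify("My_page title")                        # prints "my-page-title"
puts unique_slug("My_page_title", ["my-page-title"]) # prints "my-page-title-2"
```

In a real application the slug would be stored alongside the title, so one query by slug returns both.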

Resources