How to check special characters in a string for different countries? - ruby-on-rails

I need to check whether the strings coming from my form contain special characters (##$% & *!/), including in input from countries like China, Bulgaria, Russia, and Greece whose alphabets are not Latin.
The goal is a solution that does not hurt the user experience and that keeps the system secure.
I found some solutions using regex, but they don't cover characters from these other alphabets.
Is there any library or solution for this?
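
One common approach, sketched below in Ruby (this question's tag), is to stop whitelisting Latin letters and instead match Unicode character properties, which Ruby's regex engine supports natively: \p{L} matches a letter in any script, so Chinese, Bulgarian, Russian, and Greek input passes while punctuation and symbols are caught. The exact allowed set here (letters, combining marks, digits, whitespace) is an assumption; widen or narrow it to match your form's rules.

    # Letters from any script (\p{L}), combining marks (\p{M}),
    # digits in any script (\p{N}), and whitespace are allowed;
    # anything else counts as a special character.
    ALLOWED = /\A[\p{L}\p{M}\p{N}\s]*\z/

    def contains_special_characters?(str)
      !str.match?(ALLOWED)
    end

    contains_special_characters?("Здравей свят")  # => false (Bulgarian is fine)
    contains_special_characters?("你好")           # => false
    contains_special_characters?("hello$%&!")     # => true

Because the check is expressed in terms of Unicode properties rather than explicit character ranges, it needs no per-country maintenance.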

Related

How to properly use Arabic (or, more generally, Unicode) characters in the slug part of the URL of a website?

Until now, I had always stuck to lowercase alphanumerics and hyphens for the slug part of any URL.
I'm currently working on a website that supports both English and Arabic, and I have used transliteration so far.
However, feedback from Arabic speakers is that the Latin transliteration is "horrible".
After searching the web, I found out that Arabic characters can be used nowadays (and are even recommended for SEO), but I don't know exactly what the rules and best practices are (and I don't speak or read Arabic).
More specifically, I would like to know:
What is the recommended character length?
What is the recommended number of words?
Should I turn spaces into hyphens, as is usually done for Latin-script languages? (See the sketch after this list.)
Anything notable that a non-Arabic speaker should be aware of?
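
As a concrete starting point, here is a minimal Ruby sketch of a slug builder that keeps non-Latin letters instead of transliterating them; the helper name and the exact rules are assumptions, not established best practice:

    # Keep letters and digits from any script; collapse everything else to hyphens.
    def unicode_slug(title)
      title.unicode_normalize(:nfc)        # unify composed/decomposed forms
           .downcase                       # a no-op for Arabic, which has no case
           .gsub(/[^\p{L}\p{N}]+/, "-")    # punctuation and spaces become hyphens
           .gsub(/\A-+|-+\z/, "")          # trim leading/trailing hyphens
    end

    unicode_slug("مرحبا بالعالم")  # => "مرحبا-بالعالم"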

Android localization/translation

I have a keyboard app designed for the Serbian language. My keys have labels based on the Serbian Cyrillic alphabet. The XML strings used for those labels are enclosed in <xliff:g></xliff:g> tags, but a certain provider on a certain type of phone still translates them into a different language. Just in case, I also keep my strings in language-specific folders, but it still happens. Does anyone know of any other way I could disable translation of all my strings?
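One documented Android mechanism that may help here (not mentioned in this thread, so treat it as a suggestion rather than a confirmed fix for this provider's behavior): marking a string resource with translatable="false" tells the toolchain and translation pipelines to leave that string alone. The resource name key_label_a below is illustrative.

    <!-- res/values/strings.xml -->
    <resources>
        <!-- translatable="false" declares the key label as fixed text
             that must never be translated -->
        <string name="key_label_a" translatable="false">А</string>
    </resources>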
There are providers who can handle translation of technical files, i.e. who know what to translate in them. Some also let you manage the translations yourself. OneSky is one of these platforms, and we also provide a translation service.
OneSky also validates placeholders during translation.
Disclaimer: I work at OneSky.

Generate a localized random NSString

Apologies if this has already been asked, but some Google searching could not find it.
Does anyone know of a method to generate a random string in iOS that respects the current language of the device?
The idea is that a quick 'unlock code' can be generated using the function; the trouble is that, for languages other than English, entering the code using the keypad will not be quick or intuitive, particularly if the user does not have the English keyboard enabled.
One easy option would be to generate your string using just the digits 0-9. Then present the standard number pad.
However, you should verify that the standard number pad actually shows the digits 0-9 for all locales. Good ones to check would be Arabic, Chinese, and Japanese; I don't recall for sure what those show.
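
The generation side is simple in any language; since this thread's main question is tagged ruby-on-rails, here is the digits-only idea as a Ruby sketch (the six-digit length is an assumption):

    require "securerandom"

    # Digits 0-9 can be entered from a standard number pad regardless of
    # the device language, which is the point of the suggestion above.
    def unlock_code(length = 6)
      Array.new(length) { SecureRandom.random_number(10) }.join
    end

    unlock_code  # => e.g. "402913"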

Non-Latin characters in URLs - is it better to encode them or replace with their Latin "counterparts"?

We're implementing a blog for a site that supports six different languages, five of which have non-Latin characters in their alphabets. We are not sure whether we should have them percent-encoded (which is what we're doing at the moment):
Létání s potravinami: Co je dovoleno? becomes l%c3%a9t%c3%a1n%c3%ad-s-potravinami-co-je-dovoleno and the browser displays it as létání-s-potravinami-co-je-dovoleno.
or whether we should replace them with their Latin "counterparts" (similar-looking letters):
Létání s potravinami: Co je dovoleno? becomes letani-s-potravinami-co-je-dovoleno.
I can't find a definitive answer as to what's better from an SEO perspective. Search engine optimization is very important to us. Which approach would you suggest?
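
For reference, both options are one-liners in Rails. A sketch, assuming ActiveSupport is loaded (its default transliteration rules handle Latin diacritics such as é and á):

    require "erb"
    require "active_support/all"

    title = "Létání s potravinami: Co je dovoleno?"

    # Option 1: keep the accents and percent-encode them.
    ERB::Util.url_encode("létání-s-potravinami-co-je-dovoleno")
    # => "l%C3%A9t%C3%A1n%C3%AD-s-potravinami-co-je-dovoleno"

    # Option 2: transliterate to ASCII with Rails' built-in slug helper.
    title.parameterize
    # => "letani-s-potravinami-co-je-dovoleno"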
Most of the time, search engines deal with Latin counterparts well, although results for e.g. "létání" and "letani" occasionally differ slightly.
So in terms of SEO almost no harm is done: once your site has good content, good markup, and all that other stuff, it won't suffer from having Latin URLs.
You don't always know what combination of system, browser, and plugins your users have, so make URLs as easy as possible. Most websites use standard Latin in URLs because non-Latin symbols can choke anything from the server through the browser to any plugin, breaking the user's experience.
And I can't stress this enough: users before SEO!
"what's better from SEO perspective"
Who's your audience? Americans who think all those extra letters are a mistake?
Or folks who read (and search) for "non-ASCII" letters because those non-ASCII letters are part of their language?
SEO is a bad thing to chase. Complete, correct, consistent, and usable is what you want to build first.
Well, I suggest you replace them with their Latin counterparts, because it's user friendly and your website will be accessible on every single computer (keyboards change from one computer to another, but all of them have Latin letters). From an SEO perspective, I don't think it's going to be a problem.
Pawel, first of all, you should decide whether you're going to optimize for global Google (google.com) or the Polish one.
In accordance with the URI specification, RFC 3986, only 7-bit ASCII characters are allowed, and characters the specification reserves must be properly escaped. If you want to represent other characters or URI control characters, you should use an IRI, RFC 3987. Keep in mind, however, that HTTP is not compatible with IRIs.
When in doubt, RTFM.
Another issue is that there are Unicode code points whose glyphs look very much alike in most fonts, which is absolutely ideal for phishers. Stick to ASCII and the glyphs are visibly different when the characters are.
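
To illustrate the URI/IRI distinction in Ruby: the third-party addressable gem (an assumption here; it is not part of the standard library) can map an IRI containing non-ASCII characters to its escaped URI form.

    require "addressable/uri"  # gem install addressable

    iri = Addressable::URI.parse("http://example.com/létání")
    iri.normalize.to_s
    # => "http://example.com/l%C3%A9t%C3%A1n%C3%AD"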

How SEO friendly is Unicode URL?

As the title says, how SEO-friendly is a URL containing Unicode characters?
Edit: To clarify, I meant a URL with non-ASCII characters that is still valid Unicode.
If I were an authority at Google or another search engine, I wouldn't consider Unicode URLs an advantage. I have been using Unicode URLs for more than two years on my Persian website, but believe me, I only did it because I felt forced to. We know Google handles Unicode URLs very well, but I can't see the Unicode words in URLs when I'm working with them in Google Webmaster Tools. Here is an example:
http://www.learnfast.ir/%D9%88%D8%A8%D9%84%D8%A7%DA%AF-%D8%A2%DA%AF%D9%87%DB%8C
There are only two Farsi words in that messy, lengthy URL.
I believe other Unicode URL users don't like this either, but do it only for SEO, not for categorizing their content or directing their users to the right address. Of course Unicode is excellent for crawling content, but there should be other ways to index URLs. Moreover, English is our international language, isn't it? It can be put to good use in URLs. (Sorry for so many words from an amateur webmaster.)
All URLs can be represented in Unicode. Unicode just defines a range of code points from U+0000 to U+10FFFF, which allows you to represent any character.
If what you mean is "how SEO-friendly are URLs containing characters above U+007F?", then they should rank as well as anything else, as long as the words are correct. However, they won't be very easy for most users to type, if that's a concern, and they may not be supported by every browser/library/proxy etc., so I'd tend to steer clear.
FWIW, Amazon (Japan) uses Unicode URL for their product pages.
http://www.amazon.co.jp/任天堂-193706011-Wiiスポーツ-リゾート-「Wiiモーションプラス」1個同梱/dp/B001DLXXCC/ref=pd_bxgy_vg_img_a
(As you can see, it causes trouble with systems like the Stack Overflow wiki formatter.)
If we consider that URLs containing the searched keywords place higher in search results, and you're targeting Unicode search terms, then it may actually help.
But of course this is hardly the most important thing when it comes to position in search results.
I would guess that from an SEO point of view it would be a really bad idea, unless you are specifically looking to target Unicode search terms.
