How do I implement hotkeys in ideographic languages? - localization

I have an application implemented in German / English. It uses property files for the translation strings in the various menus and dialogs. The problem I have is that these files have a separate mnemonic field like so:
Field1_Label=Open a file
Field1_Label_MNEMONIC=1O
So in this example, the MNEMONIC tells the dialog to underline the O, and if the the user types ALT+O, the dialog will set focus to the entry field / button associated with the label.
So far so good.
The problem I face is that the product is being translated into Chinese and Japanese. These ideographic languages use input method editors (IMEs) to compose their symbols. A symbol might be composed by phonetically typing the word into the IME which then produces the corresponding Chinese text. So I can't underline a symbol because there is no key equivalent to it.
So what do I do? What is best practice for dealing with this? I could potentially just remove all mnemonics altogether. I could potentially throw an ASCII char at the end of the string to acts as the mnemonic.
But what is the best industry practice for this?

The usual practice is what you hinted at in your question: the Latin character used for the original mnemonic is appended to the translated text in parentheses. Look at some screenshots of e.g. Japanese user interfaces and you will notice that UI elements tend looking like this:
File(F) | Edit(E) | View(V) | ...
Here are some examples:
http://www.komeiharada.com/Japanese/Tategaki.gif
http://i.stack.imgur.com/7N5XB.png

Related

Eggplant : How to read text with special characters like ' _ etc

I am trying to read a text in a given rectangle using readText() function.
The function works correctly except when it has to read some text which has special characters like ' _ & etc.
I tried using validCharacters with readText() function. But it didn't help.
Code -
put ReadText((287,125,810,164),validCharacters:"_-'.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890") into Login
I tried working with character collections. But that doesn't seem to be right because the text trying to pick is a dynamic text combination of numbers alphabets and a special character. So one cannot create a library of character collection of every alphabet (a-z, A-Z), numbers(0-9) and special characters.
Example of text trying to read:
Login_Userid1_1, Login'Userid1_1
So how do I read such text correctly
Debugging OCR is a bit of an imprecise science. EggPlant has a lot of OCR Parameters to tweak. When designing test cases it's best to try use other mechanisms to gather information whenever possible. ReadText() should be considered a last resort when more reliable methods are unavailable. When I've used it I've often needed to do a lot of trial and error to find the right set of settings, and SearchRectangle to get consistent results. Without seeing exactly what images you are trying to read text from it's difficult to impossible to troubleshoot where the issue might be.
One thing that does stand out to me is that you're trying to read strings that may contain underscores. ReadText() has an optional property IgnoreUnderscores which treats underscores as spaces. By default this property is set to ON. It defaults to ON because some OCR engines have problems identifying underscore characters consistently.
If you want to have ReadText() handle underscores you'll want to explicitly set this property to OFF.
ReadText(rect, validCharacters:chars, ignoreUnderscores:OFF)

Regular expression for hashtag text with multi language support

I have a texts like #sample_123 , #123_sample , #_sample123 so i have to use regular expression to check the text contains only alphanumeric and underscore and also i want to support multi languages.
Currently i am using regular expression like (#)([:alpha:]+) but it detects only #sample( eg: #sample_123). So, Can any one please suggest the correct regular expression to fix out this problem.
you can use:
^#(\d|\w|_)+$
Debuggex Demo
This would validate any words that start with an hash and contains only alpha numeric characters or underscore. Of course there are no restrictions on how many characters after the hashtag there should be, so for example, a hashtag like #_ is considered valid, if this is not the wanted behavior please be more detailed on the constraints you want.

#font-face: Icon fonts & Converting CSS character (Hex) Value

Background
I am working a lot at the moment with webfonts, and specifically icon fonts. I need to ascertain the which character a specific icon is for testing purposes, so I can simply type the character &/or copy-paste it.
Example
The CSS of most icon fonts is similar, using the :before pseudo approach e.g.
.icon-search:before{content:"\f002"}
Question
I believe this encoding to be called CSS character (Hex) is this the
correct?
Are there any tools that allow me to enter the escaped CSS character value and convert it to a value I can copy and paste
Is there a tool that can convert this to a HTML decimal value e.g. & = simple amperstand
Summary
I would love to be able to find out which character it is so I can simply type it on my keyboard. I have spent ages looking it up but am not quite sure what this type of encoding and conversion is called so can't find what i'm looking for. I'd appreciate some pointers.
SOLVED - the answer below for completeness
After some research myself I just want to confirm that the encoding used in CSS is indeed called HEX encoding.
I did find a converter that allows me to enter the HEX value and converts it to Decimal http://www.binaryhexconverter.com/hex-to-decimal-converter
If you want to use a HTML entity then all you need to do is wrap the converted decimal value in the obligatory &# ; entity start/finish characters and you are good to go.
Example
(HEXvalue = \f002) converts to (Decimal = 61442)
This HTML entity is therefore 

is it ever appropriate to localize a single ascii character

When would it be appropriate to localize a single ascii character?
for instance /, or | ?
is it ever necessary to add these "strings" to the localization effort?
just want to give some people the benefit of the doubt and make sure there's not something I didn't think of.
Generally it wouldn't be appropriate to use something like that except as a graphic element (which of course wouldn't be I18N'd in the first place, much less L10N'd). If you are trying to use it to e.g. indicate a ratio then you should have something like "%d / %d" instead, and localize the whole thing.
Yes, there are cases where these individual characters change in localization. This is not a comprehensive list, just examples I happen to know.
Not every locale uses , to separate thousands and . for the decimal. (However, these will usually be handled by your number formatter. If you do so yourself, you're probably doing it wrong. See this MSDN blog post by Michael Kaplan, Number format and currency format are not always the same.)
Not every language uses the same quotation marks (“, ”, ‘ and ’). See Wikipedia on Non-English Uses of Quotation Marks. (Many of these are only easy to replace if you use full quote marks. If you use the " and ' on your keyboard to mark both the start and end of sentences, you won't know which of two symbols to substitute.)
In Spanish, a question or exclamation is preceded by an inverted ? or !. ¿Question? ¡Exclamation! (Obviously, you can't fix this with a locale substitution for a single character. Any questions or exclamations in your application should be entire strings anyway, unless you're writing some stunningly intelligent natural language generator.)
If you do find a circumstance where you need to localize these symbols, be extra cautious not to accidentally localize a symbol like / used as a file separator, " to denote a string literal or ? for a search wildcard.
However, this has already happened with CSV files. These may be separated by ,, or may be separated by the local list separator. See What would happen if you defined your system's CSV delimiter as being a quotation mark?
In Greek, questions end with a semicolon rather than ?, so essentially the ? is replaced with ; ... however, you should aim to always translate the question as a complete string including question mark anyway.

What things should be localized in an application

When thinking about what areas should be taken into account for a localized version of an application a number of things pop up right away:
Text display
Date and time
Units
Numbers and decimals
User input formats
LeftToRight support
Dialog and control sizes
Are there other things/areas to remember or keep in mind when building a localizable application? Are there any resources out there which provide a listing of best practices not just for text localization but for all things around localization?
After Kudzu's talk about l10N I left the room with way more questions then I had before and none of my old questions answered. But it gave me something to think about and brought the message "depends on how far you can/want to go" accross.
Translate text bodies with aforementioned things
Test all your controls for length/alignment in LTR/RTL, TTB(TopToBottom) BTT and all it's combinations.
Look out for special characters and encodings
Look out for combinations of different alignments (LTR, RTL, TTB, BTT) and how they effect punctuation and quotation signs.
Align controls according to text alignment (Hebrew Win has its start menu at the right
Take string lengths into account. They can overflow in other languages.
Put labels at the correct side of icons (LTR, TTB etc)
Translate language selection controls
No texts in images (can't be translated)
Translate EVERYTHING (headers, logos, some languages use different brand names, product names etc)
Does the region have a 24:00 or a 00:00 (changes the AM/PM that goes with it too)
Does the region use AM/PM or the 24:00 system
What calendar system are they using
What digit is for what part of the date (day, month, year in all its combinations)
Try to avoid "copying [number] files" equivalents. Some regions have different rules about changing words according to quantities. (This is an extremely complicated topic that I will elaborate on if desired)
Translate sentences, not words. Syntax rules are too complicated to put in your business logic.
Don't use flags for regions. Languages != countries
Consider what languages / dialects you can support (e.g. India has a gazillion of languages)
Encoding
Cultural rules (some western images displaying business woman can be near offensive in some other cultures)
Look out for language generalizations (e.g. boot(UK) != boot(US))
Those are the ones from the top of my head. The list just went on and on...
Don't forget the overhead of converting all documentation and help files.
a couple hints from my J2ME apps days:
don't translate separate words, translate whole phrases, even if there are matching repetitions. You'll later have to translate to a language where words have to be modified differently in different contexts and you may end up with an analog of "color: greenish"
Right2Lelf includes numbering of lists, alignment, and alternative scroll bars
Arabic languages write the same letter differently based on surrounding letters. You can't just print a string from a character buffer, you'll need a special control to output those or support from you platform
alphabetical sorting is HARD. No native Chinese could ever explain me the rules, but they will always spot wrongly sorted words. There appear to be a number of options to sort Chinese. I guess other languages may have the same problem

Resources