I would like to know how I can convert/map a key typed on an English keyboard into a character of another language.
Description:
I live in the US and only have an English keyboard. Many characters of my language are missing from this keyboard. I would like to write a small program (C or Assembler) that runs in the background and converts key combinations, defined by me, into a character (Unicode) of my language.
Example:
If I type "u" (using the English keyboard) in Microsoft Word, the program should display "u" as normal.
But if I type "u1", it should display "ú" (link for this character: http://unicode-table.com/en/00DA/).
How can I do that using C or Assembler?
Thank you very much.
Related
I found this passage in Swift's String documentation
(https://developer.apple.com/documentation/swift/string)
Overview
A string is a series of characters, such as "Swift", that forms a collection. Strings in Swift are Unicode correct and locale insensitive, and are designed to be efficient. The String type bridges with the Objective-C class NSString and offers interoperability with C functions that work with strings.
But I can't understand this one hundred percent, and I don't know where to start.
To expand on @matt's answer a little:
The Unicode Consortium maintains certain standards for the interoperation of data, and one of the most well-known is the Unicode string standard. This standard defines a huge list of characters and their properties, along with rules for how those characters interact with one another. (As matt notes: letters, emoji, combining characters [letters with diacritics, like é], and so on.)
Swift strings being "Unicode-correct" means that Swift strings conform to this Unicode standard, offering the same characters, rules, and interactions as any other string implementation which conforms to the same standard. These days, being the main standard that many string implementations already conform to, this largely means that Swift strings will "just work" the way that you expect.
However, along with the character definitions, Unicode also defines many rules for how to perform certain common string actions, such as uppercasing and lowercasing strings, or sorting them. These rules can be very specific, and in many cases, depend entirely on context (e.g., the locale, or the language and region the text might belong to, or be displayed in). For instance:
Case conversion:
In English, the uppercase form of i ("LATIN SMALL LETTER I" in Unicode) is I ("LATIN CAPITAL LETTER I"), and vice versa
In Turkish, however, the uppercase form of i is actually İ ("LATIN CAPITAL LETTER I WITH DOT ABOVE"), and the lowercase form of I ("LATIN CAPITAL LETTER I") is ı ("LATIN SMALL LETTER DOTLESS I")
Collation (sorting):
In English, the letter Å ("LATIN CAPITAL LETTER A WITH RING ABOVE") is largely considered the same as the letter A ("LATIN CAPITAL LETTER A"), just with a modifier on it. Sorted in a list, words starting with Å would appear along with other A words, but before B words
In certain Scandinavian languages, however, Å is its own letter, distinct from A. In Danish and Norwegian, Å comes at the end of the alphabet: ... X, Y, Z, Æ, Ø, Å. In Swedish and Finnish, the alphabet ends with: ... X, Y, Z, Å, Ä, Ö. For these languages, words starting with Å would come after Z words in a list
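These locale differences are easy to see in a quick sketch. The example below is in Python, whose string methods use Unicode's default, locale-insensitive mappings, analogous to plain Swift strings:

```python
# Unicode *default* case mapping, with no locale applied
# (what Python's str methods, like plain Swift strings, use):
print('ı'.upper())       # dotless i (U+0131) uppercases to plain 'I'
print(len('İ'.lower()))  # 2: 'I WITH DOT ABOVE' lowercases to 'i' + combining dot

# Naive sorting compares raw code points, not locale collation rules:
# 'Å' is U+00C5, which is greater than 'B' (U+0042), so 'Åse' lands last,
# matching Danish/Norwegian expectations but not English ones.
print(sorted(['Åse', 'Banana', 'Apple']))  # ['Apple', 'Banana', 'Åse']
```

Note that getting the Turkish-specific mappings right requires passing a locale; the default mappings alone cannot produce them.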
In order to perform many string operations in a way that makes sense to users in various languages, those operations need to be performed within the context of their language and locale.
In the context of the documentation's description, "locale-insensitive" means that Swift strings do not offer locale-specific rules like these, and default to Unicode's default case conversion, case folding, and collation rules (effectively: English). So, in contexts where correct handling of these is needed (e.g. you are writing a localized app), you'll want to use the Foundation extensions to String methods which do take a Locale:
localizedUppercase/uppercased(with locale: Locale?) over just uppercased()
localizedLowercase/lowercased(with locale: Locale?) over just lowercased()
localizedStandardCompare(_:)/compare(_:options:range:locale:) over just <
among others.
It basically just means that Swift strings are Unicode strings. A Swift string "character" is a character in the Unicode sense: a letter, an emoji, a combined letter-and-diacritic, whatever. A string can also be viewed not merely as a character sequence but as a sequence of UTF-8, UTF-16, or UTF-32 code units. The "locale insensitive" part means strings don't have a locale-dependent encoding, as they did in the bad old days before Unicode.
This is delightful but it has some downsides, most notably that strings qua character-sequence are not directly indexable by integers.
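The character-vs-code-unit distinction can be sketched concretely (in Python here; Swift's string views expose the same underlying Unicode data):

```python
import unicodedata

# One user-perceived character, two possible code-point sequences:
decomposed = 'e\u0301'  # 'e' + COMBINING ACUTE ACCENT
composed = unicodedata.normalize('NFC', decomposed)  # single code point U+00E9

print(len(decomposed))           # 2 code points
print(len(composed))             # 1 code point
print(composed == 'é')           # True: same character after NFC normalization
print(composed.encode('utf-8'))  # b'\xc3\xa9': two UTF-8 code units
```

Swift's Character type treats both forms as the same single character, which is exactly why integer indexing into the underlying storage is not offered.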
I'm currently translating a website from English into other languages but have a problem when it comes to technical terms (non-words) like "crontab".
Should I keep the English term, or is there another way to find an equivalent?
These aren't actually English words, and when it comes to languages like Japanese, I'm at a loss as to what to do.
Here's an example sentence:
"Use crontab to schedule scripts."
which translated into Japanese via Google Translate becomes:
"スクリプトをスケジュールするcrontabを使用してください。"
You can see how bizarre this looks, and I'm wondering if the sentence could even be understood by a Japanese speaker.
What do I do in these situations?
Using English words in Japanese
Talking about the word crontab, I think it's not bizarre to write it in English in a Japanese sentence like this:
crontabを使用してください
(please use crontab)
On Japanese Wikipedia, you can see how crontab is used without being translated into Japanese.
http://ja.wikipedia.org/wiki/Crontab
In Japanese technical writing, especially when you mention the name of a tool, it is common to use the English as-is, without translating it into Japanese.
Using Katakana
You could also write the sentence like below using Katakana.
クーロンタブを使用してください
(please use crontab).
Japanese usually writes words borrowed from English in Katakana. Japanese Katakana is phonetic; in other words, each character represents a sound (not a meaning). But in this case, it doesn't look natural.
Mistranslation
There is a mistranslation in your Japanese sentence.
スクリプトをスケジュールするcrontabを使用してください。
(Please use crontab, which schedules a script.)
To correct this, you could go like this:
スクリプトをスケジュールするには、crontabを使用してください。
(In order to schedule a script, please use crontab.)
Hope this helps.
I have an application implemented in German / English. It uses property files for the translation strings in the various menus and dialogs. The problem I have is that these files have a separate mnemonic field like so:
Field1_Label=Open a file
Field1_Label_MNEMONIC=1O
So in this example, the MNEMONIC tells the dialog to underline the O, and if the user types ALT+O, the dialog will set focus to the entry field / button associated with the label.
So far so good.
The problem I face is that the product is being translated into Chinese and Japanese. These ideographic languages use input method editors (IMEs) to compose their symbols. A symbol might be composed by phonetically typing the word into the IME which then produces the corresponding Chinese text. So I can't underline a symbol because there is no key equivalent to it.
So what do I do? What is best practice for dealing with this? I could potentially just remove all mnemonics altogether. I could potentially append an ASCII character to the end of the string to act as the mnemonic.
But what is the best industry practice for this?
The usual practice is what you hinted at in your question: the Latin character used for the original mnemonic is appended to the translated text in parentheses. Look at some screenshots of e.g. Japanese user interfaces and you will notice that UI elements tend to look like this:
File(F) | Edit(E) | View(V) | ...
Here are some examples:
http://www.komeiharada.com/Japanese/Tategaki.gif
http://i.stack.imgur.com/7N5XB.png
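A sketch of that convention in Python (`labeled_with_mnemonic` is a hypothetical helper name for illustration, not a real toolkit API):

```python
def labeled_with_mnemonic(translated_label: str, mnemonic: str) -> str:
    """Hypothetical helper: if the translated label still contains the
    mnemonic letter, the toolkit can underline it in place; otherwise
    append the Latin mnemonic in parentheses, as Japanese/Chinese UIs do."""
    if mnemonic.lower() in translated_label.lower():
        return translated_label
    return f'{translated_label}({mnemonic.upper()})'

print(labeled_with_mnemonic('File', 'F'))      # File
print(labeled_with_mnemonic('ファイル', 'F'))  # ファイル(F)
```

The real logic lives in whatever resource-loading layer builds your menus; the point is only that the appended "(F)" form is generated mechanically from the same mnemonic field you already have.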
For example, this Delphi code:
var
  wchar_IsASCii: array[0..1] of WCHAR;
begin
  wchar_IsASCii[0] := 'A';
  wchar_IsASCii[1] := 'じ';
end;
How can I tell that wchar_IsASCii[0] belongs to ASCII and wchar_IsASCii[1] does not?
Actually, I only need to know whether a Unicode character belongs to ASCII; that's all. How do I distinguish whether a WCHAR is Chinese, Japanese, or ASCII?
I don't know Delphi, but what I can tell you is that you need to determine what range the character fits into in Unicode. Here is a link about finding CJK characters in Unicode: What's the complete range for Chinese characters in Unicode?
And unless Delphi has some nice library for distinguishing Chinese and Japanese characters, you're going to have to determine that yourself. There is a good answer on SO for how to do that:
Testing for Japanese/Chinese Characters in a string
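The range test itself is simple once you have the code point. A sketch in Python rather than Delphi (the block ranges below come from the Unicode charts and cover only the most common blocks):

```python
def classify(ch: str) -> str:
    """Rough classification of one character by Unicode block."""
    cp = ord(ch)
    if cp <= 0x7F:
        return 'ASCII'
    if 0x3040 <= cp <= 0x309F:
        return 'Hiragana'       # Japanese only
    if 0x30A0 <= cp <= 0x30FF:
        return 'Katakana'       # Japanese only
    if 0x4E00 <= cp <= 0x9FFF:
        return 'CJK ideograph'  # shared by Chinese and Japanese
    return 'other'

print(classify('A'))   # ASCII
print(classify('じ'))  # Hiragana
print(classify('漢'))  # CJK ideograph
```

Note that an ideograph alone cannot tell you Chinese vs. Japanese, since the CJK blocks are shared; the kana blocks are the per-character giveaway for Japanese, which is what the linked answer exploits.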
The problem is... what do you mean by ASCII? The original ASCII standard is a 7-bit code of 128 code points - not even a full byte.
Then if you come to so-called "extended ASCII" - 1-byte items - the upper half can be next to anything: Greek on one machine, Western European diacritics on another, Cyrillic on a third... etc.
So I think, if all you need is to test whether you have a 7-bit ASCII character - ruling out the extended characters from the French, German and Spanish alphabets and all the Scandinavian ones - then, since Unicode was designed as a superset of ASCII (and of Latin-1), what you need is to check that (0 <= Ord(char-var)) and ($7F >= Ord(char-var)).
However, if you really need to tell languages apart - if you consider Greek and Cyrillic somewhat ASCII-like and the Japanese alphabets (there are two, by the way: Hiragana and Katakana) not, or if you consider French and German more or less ASCII-like, but Russian not - you would have to look at the Unicode ranges.
http://www.unicode.org/charts/index.html
To come with 32-bit codepoint of UCS4 standard you can use http://docwiki.embarcadero.com/Libraries/XE3/en/System.Character.ConvertToUtf32
There are also the near-standard IBM classes for Unicode (ICU), but it looks like no good Delphi translation exists: Has anyone used ICU with Delphi?
You can use the Jedi CodeLib, but its tables (the comments are contradictory) are from either Unicode 4.1 or 5.0, not the current 6.2 - though for Japanese, version 5.0 should be enough.
http://wiki.delphi-jedi.org/wiki/JCL_Help:TUnicodeBlock
http://wiki.delphi-jedi.org/wiki/JCL_Help:CodeBlockFromChar
http://wiki.delphi-jedi.org/wiki/JCL_Help:CodeBlockName#TUnicodeBlock
You can also use the Microsoft MLang interface to query Internet-style language codes (RFC 1766):
http://msdn.microsoft.com/en-us/library/aa741220.aspx
http://msdn.microsoft.com/en-us/library/aa767880.aspx
http://msdn.microsoft.com/en-us/library/aa740986.aspx
http://www.transl-gunsmoker.ru/2011/05/converting-between-lcids-and-rfc-1766.html
http://www.ietf.org/rfc/rfc1766.txt
Generally, a character belongs to ASCII if its code is in the range 0x0000..0x007F; see http://www.unicode.org/charts/PDF/U0000.pdf. Newer Delphi versions have the class function TCharacter.IsAscii, but for some strange reason it is declared as private.
ASCII characters have a decimal value of 127 or less.
However, unless you are running a teletype machine from the 1960s, ASCII characters may not be sufficient. ASCII covers only English-language characters. If you actually need to support "Western European" characters such as umlauted vowels, graves, etc., found in German, French, Spanish, Swedish, and so on, then testing for a Unicode char value <= 127 won't suffice. You might get away with testing for char values <= 255, as long as you don't need to work with Eastern European scripts.
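Both tests can be sketched in a few lines (Python here; the same comparisons translate directly to Delphi's Ord()):

```python
def is_ascii(ch: str) -> bool:
    return ord(ch) < 0x80   # 7-bit ASCII: U+0000..U+007F

def is_latin1(ch: str) -> bool:
    return ord(ch) < 0x100  # ISO 8859-1 (Latin-1) range: U+0000..U+00FF

print(is_ascii('A'), is_latin1('A'))  # True True
print(is_ascii('ü'), is_latin1('ü'))  # False True  (Western European)
print(is_ascii('ж'), is_latin1('ж'))  # False False (Cyrillic)
```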
Given a character (one letter of a string), how could I identify to which language it belongs ? The options are: English, Russian, Hebrew.
Background: this character was entered by user in a form and then stored in a database.
It can be for example the first letter in one of these words:
Hello
Привет
שלום
The Unicode standard is divided into "blocks". Go here:
http://www.unicode.org/charts/
http://en.wikipedia.org/wiki/Unicode_block
http://www.unicode.org/versions/Unicode6.0.0/
and find the Unicode blocks (ranges) for each language.
My guess:
English: Basic Latin, U+0000..U+007F
Hebrew: U+0590..U+05FF
Russian: Cyrillic, U+0400..U+04FF
So for you it's a matter of a simple number comparison for each character (on its Unicode ordinal value). Very simple.
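A minimal sketch of that comparison (Python; the block ranges are from the Unicode charts linked above):

```python
def language_of(ch: str) -> str:
    """Guess the language of a single character by its Unicode block."""
    cp = ord(ch)
    if cp <= 0x7F:              # Basic Latin
        return 'English'
    if 0x0400 <= cp <= 0x04FF:  # Cyrillic
        return 'Russian'
    if 0x0590 <= cp <= 0x05FF:  # Hebrew
        return 'Hebrew'
    return 'unknown'

print(language_of('H'))  # English
print(language_of('П'))  # Russian
print(language_of('ש'))  # Hebrew
```

Of course, Basic Latin covers every Latin-script language, and Cyrillic covers more than Russian; but for the three options in the question, a per-character block test is enough.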