translations UX: include colon etc.? - translation

I'm solving a small UX problem. I mark the texts in our app, and the texts are then handed to translators.
I'd like your opinion on which approach is better.
The text in the app is, e.g., " Serial number: "
Should I send the whole text for translation, including the colon at the end,
or just the string Serial number without the colon?
Translators sometimes forget to put the colon into the translation.
But then there are texts like " Please, send the device to our address: ", which could look odd without the colon at the end. A translator might then finish the translation with a period instead.
What do you suggest?

Related

Eggplant : How to read text with special characters like ' _ etc

I am trying to read a text in a given rectangle using readText() function.
The function works correctly except when it has to read some text which has special characters like ' _ & etc.
I tried using validCharacters with readText() function. But it didn't help.
Code -
put ReadText((287,125,810,164),validCharacters:"_-'.ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz01234567890") into Login
I tried working with character collections, but that doesn't seem right, because the text I'm trying to pick up is dynamic: a combination of numbers, letters and a special character. One cannot create a character collection library covering every letter (a-z, A-Z), number (0-9) and special character.
Example of the text I'm trying to read:
Login_Userid1_1, Login'Userid1_1
So how do I read such text correctly?
Debugging OCR is a bit of an imprecise science. EggPlant has a lot of OCR parameters to tweak. When designing test cases it's best to try to use other mechanisms to gather information whenever possible. ReadText() should be considered a last resort when more reliable methods are unavailable. When I've used it I've often needed a lot of trial and error to find the right combination of settings and SearchRectangle to get consistent results. Without seeing exactly what images you are trying to read text from, it's difficult, if not impossible, to troubleshoot where the issue might be.
One thing that does stand out to me is that you're trying to read strings that may contain underscores. ReadText() has an optional property IgnoreUnderscores which treats underscores as spaces. By default this property is set to ON. It defaults to ON because some OCR engines have problems identifying underscore characters consistently.
If you want to have ReadText() handle underscores you'll want to explicitly set this property to OFF.
ReadText(rect, validCharacters:chars, ignoreUnderscores:OFF)

ASCII Representation of Hexadecimal

I have a string which, after running each character through string.format("%02X", char), gives the following:
74657874000000EDD37001000300
In the end, I'd like that string to look like the following:
t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL (spaces are there just for clarification of characters desired in example).
I've tried to use \x..(hex#), string.char(0x..(hex#)) (where (hex#) is alphanumeric representation of my desired character) and I am still having issues with getting the result I'm looking for. After reading another thread about this topic: what is the way to represent a unichar in lua and the links provided in the answers, I am not fully understanding what I need to do in my final code that is acceptable for this to work.
I'm looking for some help in better understanding an approach that would help me to achieve my desired result provided below.
ETA:
Well I thought that I had fixed it with the following code:
function hexToAscii(input)
local convString = ""
for char in input:gmatch("(..)") do
convString = convString..(string.char("0x"..char))
end
return convString
end
It appeared to work, but I didn't think about characters above 127. Rookie mistake. Now I'm unsure how to get the remaining characters, up to 255, to display correctly.
I did the following to check since I couldn't truly "see" them in the file.
function asciiSub(input)
input = input:gsub(string.char(0x00), "<NUL>") -- suggested by a coworker
print(input)
end
I added a few more gsub calls to substitute other characters, and my file comes back with the replacement strings. But when I ran into characters in the extended ASCII table, those characters did not come through.
Can anyone assist me in understanding a fix or new approach to this problem? As I've stated before, I read other topics on this and am still confused as to the best approach towards this issue.
The simple way to transform a base16-encoded string is just:
function unhex( input )
return (input:gsub( "..", function(c)
return string.char( tonumber( c, 16 ) )
end))
end
This is basically what you have, just a bit cleaner. (There's no need to say "(..)", ".." is enough – if you specify no captures, you'll automatically get the whole match. And while it might work if you write string.char( "0x"..c ), it's just evil – you concatenate lots of strings and then trigger the automatic conversion to numbers. Much better to just specify the base when explicitly converting.)
The resulting string should be exactly what went into the hex-dumper, no matter the encoding.
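The round-trip claim above can be checked directly. The following is a minimal sketch in Ruby rather than Lua; the helper names unhex and hexdump are made up for illustration. It decodes the hex string into raw bytes and re-encodes it, without ever interpreting the bytes as characters.

```ruby
# Decode a base16 string into a binary string (raw bytes, no character encoding applied).
def unhex(hex)
  [hex].pack("H*")
end

# Encode raw bytes back into upper-case hex, mirroring string.format("%02X", ...).
def hexdump(bytes)
  bytes.unpack1("H*").upcase
end

original = "74657874000000EDD37001000300"
decoded  = unhex(original)

puts decoded.bytesize            # number of bytes, half the hex length
puts hexdump(decoded) == original
```

Because no encoding is involved at this stage, bytes above 127 survive untouched; they only become a problem when something tries to display them.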
If you cannot correctly display the result, your viewer will also be unable to display the original input. If you used different viewers for the original input and the resulting output (e.g. a text editor and a terminal), try writing the output to a file instead and looking at it with the same viewer you used for the original input, then the two should be exactly the same.
Getting viewers that assume different encodings (e.g. one of the "old" 8-bit code pages or one of the many versions of Unicode) to display the same thing will require conversion between different formats, which tends to be quite complicated or even impossible. As you did not mention what encodings are involved (nor any other information like OS or programs used that might hint at the likely encodings), this could be just about anything, so it's impossible to say anything more specific on that.
You actually have a couple of problems:
First, make sure you know the meaning of the term character encoding, and that you know the difference between characters and bytes. A popular post on the topic is The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Then, what encoding was used for the bytes you just received? You need to know this, otherwise you don't know what byte 234 means. For example it could be ISO-8859-1, in which case it is U+00EA, the character ê.
The characters 0 to 31 are control characters (eg. 0 is NUL). Use a lookup table for these.
Then, displaying the characters on the terminal is the hard part. There is no platform-independent way to display ê on the terminal. It may well be impossible with the standard print function. If you can't figure this step out you can search for a question dealing specifically with how to print Unicode text from Lua.
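The steps above can be sketched as follows, in Ruby rather than Lua. This assumes the bytes are ISO-8859-1, which the question never confirms; CONTROL_NAMES and describe_hex are hypothetical names. Control bytes are looked up in a table, and every other byte is decoded from the assumed encoding and converted to UTF-8 for display.

```ruby
# Names for the control characters 0..31, indexed by byte value.
CONTROL_NAMES = %w[
  NUL SOH STX ETX EOT ENQ ACK BEL BS HT LF VT FF CR SO SI
  DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM SUB ESC FS GS RS US
].freeze

# Turn a hex dump into a list of printable tokens.
# ASSUMPTION: the source bytes are ISO-8859-1 unless another encoding is passed in.
def describe_hex(hex, encoding = Encoding::ISO_8859_1)
  [hex].pack("H*").bytes.map do |byte|
    if byte < 32
      CONTROL_NAMES[byte]                        # lookup table for control characters
    elsif byte == 127
      "DEL"
    else
      byte.chr(encoding).encode(Encoding::UTF_8) # decode, then convert for display
    end
  end
end

puts describe_hex("74657874000000EDD37001000300").join(" ")
```

Under the ISO-8859-1 assumption this prints t e x t NUL NUL NUL í Ó p SOH NUL ETX NUL, which matches the output the question asks for. A different source encoding would give different characters for the bytes above 127, and whether they render correctly still depends on the terminal.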

Name fixing / validation?

Frequently, I have found, users enter very poorly formatted names when they register. I get all kinds of crazy formatting from Paypal IPN and other payment gateways even from all lower case to all caps to just flat out messed up.
One thing I do with this information is to send out emails and offer greetings, however I dislike the poorly formatted names. Has someone thought about this before and figured out a happy middle road solution? For example, I realize it would be poor form to simply correct spellings that are seemingly errors, but it would be wise to at least fix "what is reasonable." At the minimum that would be capitalization. Perhaps simply upcasing the first letters of each distinct "word" in the first and last name strings would be sufficient?
Or is there a better method? Perhaps a database of common name capitalizations for things like "McBerry" and "van Buuren"? A gem or some such tool? Just kind of curious. Perhaps it is foolish to put this much thought into this topic, but I really like to be as courteous and professional as possible in my communications with users rather than just using a poorly formatted name as is, which is the usual practice.
The best you can hope to do is capitalize the first letter of their first/last/middle name:
"bob".capitalize => "Bob"
From Ruby:
capitalize → new_str
Returns a copy of str with the first character converted to uppercase and the remainder to lowercase. Note: case conversion is effective only in ASCII region.
"hello".capitalize #=> "Hello"
"HELLO".capitalize #=> "Hello"
"123ABC".capitalize #=> "123abc"
You can also use downcase to level everything out, then capitalize to make it "right".
For instance:
fName = "jIMMY"
lName = "sMITH"
# downcase and capitalize return new strings, so assign the results back
fName = fName.downcase.capitalize
lName = lName.downcase.capitalize
puts fName + " " + lName  # prints Jimmy Smith
However, with names like VanBuuren, it might be a little harder.
Here is a link to Ruby strings which has some methods that might help you on your quest.
http://www.ruby-doc.org/core-2.0/String.html
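One possible middle road, sketched below under a stated assumption: only names typed entirely in lower case or entirely in upper case are touched, on the theory that a user who bothered with mixed case ("McBerry", "van Buuren") probably meant it. The helper name tidy_name is made up, and the heuristic is an illustration rather than an established rule.

```ruby
# Capitalize each word (and each hyphenated part) of a name,
# but only when the input is all lower case or all upper case.
# ASSUMPTION: mixed-case input was typed deliberately and is left alone.
def tidy_name(raw)
  name = raw.strip.gsub(/\s+/, " ")   # trim and collapse runs of whitespace
  return name unless name == name.downcase || name == name.upcase

  name.split(" ").map { |word|
    word.split("-").map(&:capitalize).join("-")
  }.join(" ")
end

puts tidy_name("JANE DOE-SMITH")
puts tidy_name("van Buuren")
```

This still gets some names wrong: an all-caps "MCBERRY" comes out as "Mcberry", and "o'neil" becomes "O'neil". A lookup of known exceptions would be needed to go further.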

Is it ever appropriate to localize a single ASCII character?

When would it be appropriate to localize a single ASCII character, for instance / or |?
Is it ever necessary to add these "strings" to the localization effort?
I just want to give people the benefit of the doubt and make sure there isn't something I haven't thought of.
Generally it wouldn't be appropriate to use something like that except as a graphic element (which of course wouldn't be I18N'd in the first place, much less L10N'd). If you are trying to use it to e.g. indicate a ratio then you should have something like "%d / %d" instead, and localize the whole thing.
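A minimal sketch of that idea in Ruby: the whole pattern, separator included, is the localizable string. RATIO_FORMATS and ratio are hypothetical names, and the "de" entry is an unvetted placeholder meant only to show that a translator may replace the slash entirely.

```ruby
# The complete format, including the separator, is what gets translated.
RATIO_FORMATS = {
  "en" => "%d / %d",
  "de" => "%d von %d"   # illustrative entry only, not a reviewed translation
}.freeze

# Fall back to the English pattern when a locale has no entry.
def ratio(locale, done, total)
  format(RATIO_FORMATS.fetch(locale, RATIO_FORMATS["en"]), done, total)
end

puts ratio("en", 3, 10)
puts ratio("de", 3, 10)
```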
Yes, there are cases where these individual characters change in localization. This is not a comprehensive list, just examples I happen to know.
Not every locale uses , to separate thousands and . for the decimal. (However, these will usually be handled by your number formatter. If you do so yourself, you're probably doing it wrong. See this MSDN blog post by Michael Kaplan, Number format and currency format are not always the same.)
Not every language uses the same quotation marks (“, ”, ‘ and ’). See Wikipedia on Non-English Uses of Quotation Marks. (Many of these are only easy to replace if you use full quote marks. If you use the " and ' on your keyboard to mark both the start and end of quotations, you won't know which of two symbols to substitute.)
In Spanish, a question or exclamation is preceded by an inverted ? or !. ¿Question? ¡Exclamation! (Obviously, you can't fix this with a locale substitution for a single character. Any questions or exclamations in your application should be entire strings anyway, unless you're writing some stunningly intelligent natural language generator.)
If you do find a circumstance where you need to localize these symbols, be extra cautious not to accidentally localize a symbol like / used as a file separator, " to denote a string literal or ? for a search wildcard.
However, this has already happened with CSV files. These may be separated by ,, or may be separated by the local list separator. See What would happen if you defined your system's CSV delimiter as being a quotation mark?
In Greek, questions end with a semicolon rather than ?, so essentially the ? is replaced with ; ... however, you should aim to always translate the question as a complete string including question mark anyway.
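The quotation-mark point can be illustrated with a small Ruby sketch. It shows why the opening and closing marks need to be known separately: each locale supplies a pair, not a single character. QUOTES and quote are made-up names, and the three entries are common conventions for those languages, not a complete or authoritative table.

```ruby
# Opening and closing quotation marks per locale.
QUOTES = {
  "en" => ["\u201C", "\u201D"],              # curly double quotes
  "de" => ["\u201E", "\u201C"],              # low opening quote, high closing quote
  "fr" => ["\u00AB\u00A0", "\u00A0\u00BB"]   # guillemets with non-breaking spaces
}.freeze

# Wrap text in the locale's quotation marks, defaulting to English.
def quote(locale, text)
  opening, closing = QUOTES.fetch(locale, QUOTES["en"])
  "#{opening}#{text}#{closing}"
end

puts quote("de", "Hallo")
```

A source string that uses the straight " for both ends gives no such pairing, which is the difficulty described above.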

Localization ground rules

I submitted my first localized app to the iPhone App Store the other day. I decided to do it to learn about application localization, and because my app was simple enough to stumble through localizing with my mediocre French. I know I didn't do everything "right", but I learned a lot from doing it once. I'd like to keep doing this for all my future apps.
For one thing, I learned to code with localization in mind but not to start localizing until the app is ready to be released. I spent far too much time making small tweaks in two UI files.
What are your favourite localization basics, cardinal rules, and best practices?
I'm thinking mostly for small hobby developers like myself, although stuff from the big leagues would be interesting as well.
The biggest one for me is don't concatenate strings:
Bad:
"You have " + messageCount + " messages";
Good:
"You have {0} messages"
Word order varies from language to language, and so you can't assume where in a sentence your dynamic data might occur.
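A short Ruby sketch of the same idea with named placeholders, which let a translator move the dynamic values anywhere in the sentence. TEMPLATES and render are hypothetical names, and "xx-rev" is a made-up locale written in English words purely to show a different word order.

```ruby
# Each locale supplies a complete sentence; placeholders can appear in any order.
TEMPLATES = {
  "en"     => "%{user} sent you %{count} messages",
  "xx-rev" => "%{count} messages came from %{user}"   # invented locale, reversed order
}.freeze

# Fill the locale's template from a hash of named values.
def render(locale, values)
  format(TEMPLATES.fetch(locale), values)
end

puts render("en", user: "Ann", count: 3)
puts render("xx-rev", user: "Ann", count: 3)
```

The calling code is identical for both locales; only the template decides where the values land.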
In your UI, allow for about 30-50% expansion of translations from English. A method I learned early in my career was to produce a 'pig latin' localized version of the UI.
If your user interface is still legible in Pig Latin, it will probably be legible in real languages.
Ifway ouryay userway interfaceway isway illstay egiblelay inway Igpay
Atinlay, itway illway obablypray ebay egiblelay inway ealray
anguageslay.
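A pseudo-localizer of this kind takes only a few lines. The Ruby sketch below uses a simple rule set chosen to reproduce the sample above: words starting with a vowel get "way" appended, other words move their leading consonants to the end and add "ay", and an initial capital is kept. The function names are made up, and printf-style placeholders such as %d would need protecting before running real resource files through it.

```ruby
# Convert one alphabetic word to Pig Latin, keeping an initial capital.
def pig_latin_word(word)
  return word + "way" if word =~ /\A[aeiou]/i

  head, tail = word.match(/\A([^aeiou]+)(.*)\z/i).captures
  result = tail + head.downcase + "ay"
  word =~ /\A[A-Z]/ ? result.capitalize : result
end

# Rewrite every run of letters; digits, punctuation and {0}-style
# placeholders pass through untouched.
def pseudo_localize(text)
  text.gsub(/[A-Za-z]+/) { |word| pig_latin_word(word) }
end

puts pseudo_localize("You have {0} messages")
```

The output is noticeably longer than the input, which is the point: it exercises the layout the way a real translation would.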
Use Unicode for all strings - UTF-16 or UTF-8. If reading/writing to any program/format that doesn't assume that by default, make sure you specify UTF-16 or UTF-8 explicitly.
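As a small illustration of stating the encoding explicitly, here is a Ruby sketch that writes and reads a strings file as UTF-8 instead of relying on the platform default. The file name strings.txt and the helper roundtrip_utf8 are hypothetical.

```ruby
require "tmpdir"

# Write text to a file as UTF-8 and read it back, naming the encoding both times.
def roundtrip_utf8(text)
  Dir.mktmpdir do |dir|
    path = File.join(dir, "strings.txt")          # hypothetical file name
    File.write(path, text, encoding: "UTF-8")     # explicit on write
    File.read(path, encoding: "UTF-8")            # explicit on read
  end
end

puts roundtrip_utf8("Gr\u00F6\u00DFe: 10 \u20AC")
```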
As Mike Sickler said, don't concatenate strings. Better yet, don't have sentences with inserts, since you don't know how the insert affects the rest of the sentence - different languages have different rules regarding plural / etc.
Bad: "You have " + messageCount + " messages"
Better: "You have {0} messages" (but what if {0} == 1? Do you write message(s)? What about Hebrew, where "one" comes after the noun, but other numbers before?)
Best: "Messages: {0}"
As rhsatrhs said, allow 30-50% expansion. In my (big league) company, we usually assume that German is the longest, although I found that Russian sometimes came out more than 100% longer. I suspect this sometimes happens when translators don't know the exact term, so they write a longer description using a close term (example: Symbol ==> source code reference marker).

Resources