Localization and Lists of Decimal Numbers

I'm working on localizing some strings in our application and we have text that looks something like:
Factor f (1.0, 1.2, or 1.5)
In a locale that uses a comma for the decimal point, would this be written as:
Factor f (1,0, 1,2, or 1,5)
Maybe it's just not what I'm accustomed to, but that looks crazy hard to read quickly.
I'm also wondering about text like version numbers. Would Firefox 3.5.1 be Firefox 3,5,1?

If I understand what you are looking for, there are two internationalization concerns here:
Decimal separator
List separator
Obviously these separators are quite tightly coupled, so in a locale that uses a comma as the decimal separator, the list separator must be something else. Usually this is a semicolon, and only a few locales use something other than a comma or semicolon as the list separator.
To summarize:
In locales that use a dot as the decimal separator, a comma is usually used as the list separator, so in free-form text you might expect something like Factor f (1.0, 1.2, or 1.5).
In locales that use a comma as the decimal separator, a semicolon is typically used as the list separator, so you should expect something like Faktor f (1,0; 1,2; oder 1,5).
I am not sure what you are up to (the technology does matter in the advice), but you can leave the number format as well as the list separator to the translators to decide. In .NET, though, the list separator is provided for you: there is no need to ask translators for input, just use the appropriate property of the CultureInfo class.
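For illustration, here is a minimal Python sketch of the pairing described above. It assumes a hypothetical two-entry table that maps a decimal separator to a list separator; a real application would take both from the platform's locale data or from the translators. The function name and its parameters are made up for this example.

```python
# Hypothetical pairing: decimal separator -> list separator.
LIST_SEPARATOR_FOR_DECIMAL = {".": ",", ",": ";"}


def format_number_list(values, decimal_sep=".", conjunction="or", places=1):
    """Join numbers using a list separator that cannot be confused
    with the decimal separator."""
    list_sep = LIST_SEPARATOR_FOR_DECIMAL[decimal_sep]
    # Format with a dot first, then swap in the requested decimal separator.
    items = [f"{v:.{places}f}".replace(".", decimal_sep) for v in values]
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    if len(items) == 2:
        return f"{items[0]} {conjunction} {items[1]}"
    head = f"{list_sep} ".join(items[:-1])
    return f"{head}{list_sep} {conjunction} {items[-1]}"


print(format_number_list([1.0, 1.2, 1.5]))
# 1.0, 1.2, or 1.5
print(format_number_list([1.0, 1.2, 1.5], decimal_sep=",", conjunction="oder"))
# 1,0; 1,2; oder 1,5
```

In practice the conjunction and its placement belong in the translated string, so this only shows how the two separators depend on each other.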

Sorry to say, but I don't know about your first question. However, as far as version numbers go, they are generally left untranslated. End users typically attribute little meaning to the version's numeric value (version numbers are in fact NOT numeric in nature: 3.90 < 3.100). They are simply discrete numbers joined by a universally accepted separator, not natural numbers with natural "grouping/decimal" separators.
Beyond the end-user experience, developers often parse version numbers in the standard format {major}.{minor}.{revision}, using . as the well-known separator character.
I did find this link that talks about your first question (sort of). I don't know how authoritative or credible it is; but it doesn't look dubious.
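To make the "3.90 < 3.100" point concrete, here is a small Python sketch. It assumes purely numeric, dot-separated version strings; the helper name parse_version is made up for this example.

```python
def parse_version(text):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in text.split("."))


# Compared component by component, 3.90 comes before 3.100 ...
print(parse_version("3.90") < parse_version("3.100"))  # True
# ... whereas reading the same strings as decimals gives 3.9 < 3.1.
print(float("3.90") < float("3.100"))  # False
print(parse_version("3.5.1"))  # (3, 5, 1)
```

This is why the dot in a version number is a fixed delimiter rather than a decimal separator, and why it should not be swapped for a locale's decimal comma.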

Related

This regex matches in BBEdit and regex.com, but not on iOS - why?

I am trying to "highlight" references to law statutes in some text I'm displaying. These references are of the form <number>-<number>-<number>(char)(char), where:
"number" may be a whole number (e.g., 18) or a decimal number (e.g., 12.5);
the parenthetical terms are entirely optional: there may be zero, one, or more;
if a parenthetical term does exist, there may or may not be a space between the last number and the first parenthesis, as in 18-1.3-401(8)(g) or 18-3-402 (2).
I am using the regex
((\d+(\.\d+)*-){2}(\d+(\.\d+)*))( ?(\([0-9a-zA-Z]+\))*)
to find the ranges of these strings and then highlight them in my text. This expression works perfectly, 100% of the time, in all of the cases I've tried (dozens), in BBEdit, and on regex101.com and regexr.com.
However, when I use that exact same expression in my code, on iOS 12.2, it is extremely hit-or-miss as to whether a string matching the regex is actually found. So hit-or-miss, in fact, that a string of exactly the same form as two other matches in a specific bit of text is NOT found. E.g., in this one paragraph I have, there are five instances of xxx-x-xxx; the first and the last are matched, but the middle three are not matched. This makes no sense to me.
I'm using the String method func range(of:options:range:locale:) with options of .regularExpression (and nil locale) to do the matching. I see that iOS uses ICU-compatible regexes, whereas these other tools use PCRE (I think). But, from what I can tell, my expression should be compatible and valid for my case with the ICU parsing. But, something is definitely different, and I cannot figure out what it is.
Anyone? (I'm going to give NSRegularExpression a go and see if it behaves differently, but I'd still like to figure out what's going on here.)
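As a cross-check outside iOS, the same pattern can be run through one more engine. The sketch below uses Python's re module, which is neither ICU nor PCRE, so it says nothing about what range(of:options:range:locale:) does. It only shows the pattern collecting every occurrence when the whole text is scanned. The sample sentences and the helper name find_statutes are made up for this example.

```python
import re

# The expression from the question, unchanged.
STATUTE = re.compile(
    r"((\d+(\.\d+)*-){2}(\d+(\.\d+)*))( ?(\([0-9a-zA-Z]+\))*)"
)


def find_statutes(text):
    """Return every non-overlapping match of the pattern, in order."""
    return [m.group(0) for m in STATUTE.finditer(text)]


sample = "See 18-1.3-401(8)(g), then 18-3-402 (2), and 12.5-1-101."
print(find_statutes(sample))
# ['18-1.3-401(8)(g)', '18-3-402 (2)', '12.5-1-101']

# Because " ?" sits outside the repeated group, a reference with no
# parenthetical also swallows the space that follows it.
print(find_statutes("18-3-402 applies"))
# ['18-3-402 ']
```

The trailing space in the second result is a property of the pattern itself, so highlighted ranges can come out one character wider than expected in any engine.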

Java underscore equivalent in F#?

In Java it's possible to write 1_000_000 instead of 1000000 for better readability. Is there something equivalent in F#?
This question was already asked on the feature request page, and the current status of this change request is "planned" and "approved in principle".
So it may well be implemented in one of the next releases.
You can find more information about this request (such as a summary of the feature, the motivation, and suggested implementation details) on the GitHub page for F#:
Summary
Allow underscores between any digits in numeric literals. This feature enables you, for example, to separate groups of digits in numeric literals, which can improve the readability of your code.
For instance, if your code contains numbers with many digits, you can use an underscore character to separate digits in groups of three, similar to how you would use a punctuation mark like a comma, or a space, as a separator.
Motivation
This is a popular feature in other languages. Some other languages with a similar feature:
Perl
Ruby
Java 7
C++11 (use single quote)
just to name a few...
Detailed design
You can place underscores only between digits. You cannot place underscores in the following places:
At the beginning or end of a number
Adjacent to a decimal point in a floating point literal
Prior to an F or L or other suffix
In positions where a string of digits is expected

How do Europeans write a list of numbers with decimals?

As I understand it, Europeans(*) write numbers with a comma for a decimal separator, so one-and-a-quarter is written as 1,25
Europeans also use commas to separate lists, so how do you write a list of decimal numbers? I, as an Englishman, would write one-and-a-quarter, one-and-a-half, one-and-three-quarters like this:
1.25, 1.5, 1.75
How do you do that in Europe?
(Why is this a programming question? Because I'm writing a program that will ask European users for a list of numbers!)
* For the purposes of this question, there are no English-speaking countries in Europe. :-)
I'm European (French), and in almost all programs here we have to use the semicolon ';' as a separator, even if the numbers are only integers, because the comma doesn't look like a separator to us. In mathematics, semicolons are the only right way here to separate a list of numbers.
The most common example is when we have to enter the page numbers we want to print on a PDF, all programs ask for a semicolon-separated list and I clearly found it intuitive. I think they would have changed it if it was uncomfortable for some.
This varies by culture, and within a culture. The CLDR data contains the “list” element that specifies the list separator character, and it is the semicolon for most cultures, see the chart of number symbols (element “list”). The definition is very implicit though, and there is variation inside locales. Some people regard 1,25, 1,5, 1,75 as acceptable, while others prefer 1,25; 1,5; 1,75. There are also people who seriously think that in a strongly mathematical or numeric context, one should deviate from the locale practices and use the Anglo-Saxon notation with decimal point, hence with comma as separator.
On the practical side, I think it would not be very wrong to use ”;” as number list separator when decimal comma is used, or even when decimal point is used. So you might even consider using ”;” in all locales.
But when it comes to user input, it’s trickier. In principle, you should be liberal in what you accept, but since the comma can be meant to be a decimal comma, a thousands separator, or a list item separator, there is such a thing as being too liberal.
If possible, prompt for each number separately, avoiding the separator issue. If this is not possible, the crucial thing is to make it very, very clear to the user which separator is expected. I would go as far as saying that requiring the semicolon ”;” is the most reliable thing to do.
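A minimal Python sketch of the strict approach suggested above: require the semicolon between items and accept only the expected decimal separator inside each item. The function name and its defaults are assumptions made for this example, not part of any library.

```python
def parse_number_list(text, decimal_sep=",", list_sep=";"):
    """Parse e.g. '1,25; 1,5; 1,75' into a list of floats.

    Deliberately strict: only list_sep separates items and only
    decimal_sep may appear inside a number.
    """
    numbers = []
    for item in text.split(list_sep):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing separator or doubled separators
        if decimal_sep != "." and "." in item:
            raise ValueError(f"unexpected '.' in {item!r}")
        # float() raises ValueError for leftovers such as '1,2,3' -> '1.2.3'.
        numbers.append(float(item.replace(decimal_sep, ".")))
    return numbers


print(parse_number_list("1,25; 1,5; 1,75"))
# [1.25, 1.5, 1.75]
print(parse_number_list("1.25; 1.5; 1.75", decimal_sep="."))
# [1.25, 1.5, 1.75]
```

Rejecting ambiguous input with an error, rather than guessing, lets the program tell the user exactly which separators it expects.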
Why ask about Europeans in general? I don't think there is one European way of doing so, and if it happens to be the case then it would be sheer luck. Europe is comprised of different cultures and each has its own rules.
You don't mention what platform you are using, but you might be able to rely on your platform to get this information. In the case of .NET, you can get this information through TextInfo.ListSeparator. For example, this would give you the French one (result: a semicolon):
string listSeparator = new CultureInfo("fr-FR").TextInfo.ListSeparator;
I don't think there is one way to do it. White space separating the numbers would work just the same, or you could use a semicolon (';') to separate the numbers.

Is it ever appropriate to localize a single ASCII character?

When would it be appropriate to localize a single ascii character?
For instance, / or |?
Is it ever necessary to add these "strings" to the localization effort?
I just want to give some people the benefit of the doubt and make sure there's not something I didn't think of.
Generally it wouldn't be appropriate to use something like that except as a graphic element (which of course wouldn't be I18N'd in the first place, much less L10N'd). If you are trying to use it to e.g. indicate a ratio then you should have something like "%d / %d" instead, and localize the whole thing.
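A rough Python sketch of "localize the whole thing": the catalog holds the complete format string per locale, never the bare "/" on its own. The catalog contents, the French wording, and the helper name are illustrative assumptions rather than output from a real translation system.

```python
# Hypothetical message catalog: the entire ratio pattern is the
# translatable unit, so a translator may reword it or drop the slash.
RATIO_FORMATS = {
    "en": "%(done)d / %(total)d",
    "fr": "%(done)d sur %(total)d",
}


def format_ratio(done, total, locale="en"):
    """Fill the locale's ratio pattern, falling back to English."""
    pattern = RATIO_FORMATS.get(locale, RATIO_FORMATS["en"])
    return pattern % {"done": done, "total": total}


print(format_ratio(3, 10))        # 3 / 10
print(format_ratio(3, 10, "fr"))  # 3 sur 10
```

Named placeholders also let a translator reorder the two numbers if the target language calls for it.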
Yes, there are cases where these individual characters change in localization. This is not a comprehensive list, just examples I happen to know.
Not every locale uses , to separate thousands and . for the decimal. (However, these will usually be handled by your number formatter. If you do so yourself, you're probably doing it wrong. See this MSDN blog post by Michael Kaplan, Number format and currency format are not always the same.)
Not every language uses the same quotation marks (“, ”, ‘ and ’). See Wikipedia on Non-English Uses of Quotation Marks. (Many of these are only easy to replace if you use full quote marks. If you use the " and ' on your keyboard to mark both the start and end of quotations, you won't know which of two symbols to substitute.)
In Spanish, a question or exclamation is preceded by an inverted ? or !. ¿Question? ¡Exclamation! (Obviously, you can't fix this with a locale substitution for a single character. Any questions or exclamations in your application should be entire strings anyway, unless you're writing some stunningly intelligent natural language generator.)
If you do find a circumstance where you need to localize these symbols, be extra cautious not to accidentally localize a symbol like / used as a file separator, " to denote a string literal or ? for a search wildcard.
However, this has already happened with CSV files. These may be separated by ,, or may be separated by the local list separator. See What would happen if you defined your system's CSV delimiter as being a quotation mark?
In Greek, questions end with a semicolon rather than ?, so essentially the ? is replaced with ; ... however, you should aim to always translate the question as a complete string including question mark anyway.

Do numbers need to be localized?

This seems like a stupid question. Is the number "10" referred to as "10" in Hebrew, Arabic, and all languages? I'm not seeing anywhere where it says you need to do anything special with numbers when dealing with localization. Maybe number format, but what about the number itself? I would think that numbers would read differently in right-to-left languages, but translate.google.com is giving me the same number back. Can anyone confirm this?
Arabic and Japanese (?) do have different glyphs for numbers, but the standard system is so commonplace, that usually numbers are not converted.
If you're using the .NET formatting functions, then the numbers will be formatted according to the system preferences (I'm talking commas and decimal points here)
Different languages can use different digit glyphs;
Number representation is different, e.g. 1,234.56 in English is written as 1.234,56 in German.
So the answer is yes.
The digits 0-9 usually don't require any localization, except minor tweaks like AndreyT said, but those are more "fonts" related than anything.
The only important thing to take into account is large number representation.
For example, take one million dollars.
In Switzerland, it will be:
$1'000'000.-
In the US:
$1,000,000
In Japan it will be:
$100万
I don't know about other places, but you get the idea.
For Japan, it's very uncommon to see numbers greater than 10'000 without using a kanji.
But I think you should see with the person doing the localization.
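The three renderings above can be reproduced with a short Python sketch. Both helper names are made up for this example, the currency symbol is left out, and the 万 helper only handles exact multiples of 10,000 below 100,000,000. Anything else is returned unchanged, because the rules for larger or mixed values are beyond this illustration.

```python
def group_thousands(n, sep):
    """Group an integer in threes using the given separator."""
    return f"{n:,}".replace(",", sep)


def japanese_myriads(n):
    """Write exact multiples of 10,000 with the kanji 万 (10,000)."""
    man, rest = divmod(n, 10_000)
    if rest != 0 or not (0 < man < 10_000):
        return str(n)  # outside the scope of this sketch
    return f"{man}万"


print(group_thousands(1_000_000, "'"))  # 1'000'000
print(group_thousands(1_000_000, ","))  # 1,000,000
print(japanese_myriads(1_000_000))      # 100万
```

A real application would normally hand this to the platform's number formatter for the target locale instead of assembling strings by hand.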
For the actual numbers themselves (and not floating point, thousands separator, etc.) there are in fact differences between languages.
Hebrew numerals actually use the Hebrew letters as a number system, though it is used only for "traditional" numbers, such as the year in the Jewish calendar, the chapter, verse and page numbers in the Hebrew Bible, in lists (similar to using roman numerals instead of numbers), etc. But for all other cases, Hindu-Arabic numerals are used (e.g. 1, 2, 3, 4...) and are written left-to-right, even while the rest of the Hebrew text is written right-to-left (i.e. NML KJIHG 123 FEDC BA).
In Arabic, most countries use the Arabic-Indic numerals, but the Hindu-Arabic numerals are also understood.
In any case, .NET localization should take care of all conversions and display issues, and there's nothing special you need to do unless you render your own GUI.
There are quite a few things that can be localized in numbers. For example, in the USA the fractional part of a number (if it has one) is separated by a dot, while in Russia a comma is normally used. In the USA commas are used to separate three-digit groups in the number, while in Russia it is not customary to separate them at all, or a space is used for that purpose (or maybe some other character, but not a comma). And so on (although most of the formatting options apply to monetary quantities).
Even the preferred way to write the characters themselves can depend on locale. In the USA the character for '7' is usually written in two strokes, while in Europe it quite often has a third stroke: a short horizontal line through the middle. This, of course, is less important, since the two-stroke version is still recognized everywhere.
If you are displaying the numbers for math purposes (for example, showing 5 + 3 = 8), then use the standard digits 0-9. These are used nearly universally in mathematics.
If you are displaying something that is highly localized (e.g. pricing on a street vendor's point-of-sale system in Saudi Arabia), there are a handful of countries that use different digits that are localized to their respective languages.
Most people in the world will have no trouble understanding 0-9, though.
I found this website to be a good starting guide: https://phrase.com/blog/posts/number-localization/
Some examples:
Bengali, for example, uses the Bengali–Assamese numeral system, whose
digits differ from the Western Arabic system: ০, ১, ২, ৩, ৪, ৫, ৬, ৭,
৮, ৯.
In some locales like Saudi Arabia, for example, it’s common to
represent numbers in the local numeral system, Eastern Arabic, and not
the Western Arabic system.
Keep in mind that we are just talking about digits here. When it comes to fractions (/), decimals (.), percentages (%), large number separators (,), number symbols (#), etc. most regions have specific rules and that's a whole other topic. They are not universal.
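Since this is purely about digits, a one-to-one character mapping is enough to illustrate it. The Python sketch below builds the Bengali and Eastern Arabic digit sets from their Unicode code points (U+09E6 to U+09EF and U+0660 to U+0669). The table keys and the helper name are made up for this example, and separators or other symbols are deliberately left untouched.

```python
WESTERN_DIGITS = "0123456789"

# Digit sets built from consecutive Unicode code points.
DIGIT_SETS = {
    "bengali": "".join(chr(0x09E6 + i) for i in range(10)),
    "eastern_arabic": "".join(chr(0x0660 + i) for i in range(10)),
}


def localize_digits(text, system):
    """Replace 0-9 in text with the digits of the given numeral system."""
    table = str.maketrans(WESTERN_DIGITS, DIGIT_SETS[system])
    return text.translate(table)


bengali = localize_digits("2024", "bengali")
print(bengali)       # ২০২৪
# Python's int() accepts any Unicode decimal digit, so this round-trips.
print(int(bengali))  # 2024
```

Whether a given user actually expects local digits is a separate question that the target locale's conventions, or the translators, should settle.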

Resources