iOS Internationalize Plain Number - ios

I'm very new to the process of internationalization, so hopefully the answers to these questions aren't stupidly obvious.
In my app, I have some user-facing numbers that have no qualifiers or separators (i.e. they don't represent currency, time, or anything else; they're just numbers). I want, for example, 1000 to display as 1000, not 1,000 or 1.000. My questions are as follows:
If I choose to use a NumberFormatter to internationalize these numbers, what changes (if any) might it make to the format of the resulting String based on the user's locale? For example, would a right-to-left locale reverse the order of the digits in a multi-digit number?
My app's functionality depends on the user seeing all numbers in base 10 and each number 0-9 being representable as a single character... would internationalizing the numbers invalidate these assumptions?
For the record, this is the only code I'm using to try to internationalize the numbers:
NumberFormatter.localizedString(from: number as NSNumber, number: .none)

Related

Why is it best to store a telephone number as a string vs. integer?

As the question states, why is it considered best practice to store telephone numbers as strings rather than integers in the telephone_number column?
Not sure I understand the rationale for this. Please help clear this up!
Thanks!
Telephone numbers are strings of digit characters, they are not integers.
Consider for example:
Expressing a telephone number in a different base would render it meaningless
Adding or multiplying two telephone numbers together, or performing any other math operation on a phone number, is meaningless. The result is not another telephone number (except by coincidence).
Telephone numbers are intended to be entered "as-is" into a connected device.
Telephone numbers may have leading zeroes.
Manipulations of telephone numbers, such as adding an area code, are String operations.
Storing the string version of the telephone number makes this clear and unambiguous.
History: On old pulse-encoded dial systems, the code for each digit in a telephone number was sent as the same number of pulses as the digit (or 10 pulses for "0"). That may be why we still use digits to represent the parts of a phone number. See http://en.wikipedia.org/wiki/Pulse_dialing
What Neil Slater said is correct. I would add that there are lots of edge cases where you can't express a telephone number as a number value consistently.
For example, consider these numbers:
011-123-555-1212
+11-123-555-1212
+1 (112) 355-5121 x2
These are all potentially valid phone numbers, but they mean very different things. Yet, in integer form, they are all 111235551212.
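The collapse described above can be reproduced with a short Python sketch. The helper name `phone_to_int` is made up for illustration; it simply drops every non-digit character and reads what is left as an integer, which is roughly what an integer column would force you to do.

```python
import re


def phone_to_int(phone: str) -> int:
    """Drop every non-digit character and read the rest as an integer."""
    digits = re.sub(r"[^0-9]", "", phone)
    return int(digits)  # int() also discards any leading zeros


numbers = ["011-123-555-1212", "+11-123-555-1212", "+1 (112) 355-5121 x2"]
as_ints = [phone_to_int(n) for n in numbers]

# Three different strings, one integer value.
print(as_ints)
```

All three inputs end up as 111235551212, so the prefix, the grouping, and the extension are no longer recoverable.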
If you are going to store the number for display from input, then you must use a string.
However, while it is true that no meaningful mathematical operations can be performed on a phone number, using a number in hash sets and for indexing is quicker than using a string. So provided you can guarantee or homogenise your set of numbers so that they are all consistent, you may see better performance operating on a number.
For example, in the Telco world, rating calls for a given customer includes a lot of searching on their CLI, and in this situation it is faster and cheaper to search by integer. Generally, though, strings will be fine performance-wise; the difference only matters when you have multiple searches to perform over a huge range of numbers, e.g. rating 250 million calls across 2 million lines and 2000 tariffs. In-memory rating also gets expensive, so being able to use a 64-bit int or uint is cheaper when dealing with these volumes.
Consider these phone numbers for example
099-1234-56789 or +91-8907-687665.
In this case, if the phone_number attribute is of type integer, it can't accept these values. It has to be a string to hold values like these, so a string is preferred over an integer.
There are several reasons for this:
Phone numbers often start with a "0": an integer will remove all leading "0"s
Phone numbers can contain special characters: +, (, -, etc. (for example: +33 (0)6 12 23 34)
You cannot perform operations on phones : adding phones, for instance, would be meaningless
Phone number may be internationalised, i.e. different format for different people, thus not possible with integers
There might be other reasons, but I guess that's already a fair amount of those :)

How do Europeans write a list of numbers with decimals?

As I understand it, Europeans(*) write numbers with a comma for a decimal separator, so one-and-a-quarter is written as 1,25
Europeans also use commas to separate lists, so how do you write a list of decimal numbers? I, as an Englishman, would write one-and-a-quarter, one-and-a-half, one-and-three-quarters like this:
1.25, 1.5, 1.75
How do you do that in Europe?
(Why is this a programming question? Because I'm writing a program that will ask European users for a list of numbers!)
* For the purposes of this question, there are no English-speaking countries in Europe. :-)
I'm European (French), and in almost all programs here we have to use the semicolon ';' as a separator, even if the numbers are only integers, because the comma doesn't look like a separator to us. In mathematics, the semicolon is the only right way here to separate a list of numbers.
The most common example is when we have to enter the page numbers we want to print on a PDF, all programs ask for a semicolon-separated list and I clearly found it intuitive. I think they would have changed it if it was uncomfortable for some.
This varies by culture, and within a culture. The CLDR data contains the “list” element that specifies the list separator character, and it is the semicolon for most cultures, see the chart of number symbols (element “list”). The definition is very implicit though, and there is variation inside locales. Some people regard 1,25, 1,5, 1,75 as acceptable, while others prefer 1,25; 1,5; 1,75. There are also people who seriously think that in a strongly mathematical or numeric context, one should deviate from the locale practices and use the Anglo-Saxon notation with decimal point, hence with comma as separator.
On the practical side, I think it would not be very wrong to use ”;” as number list separator when decimal comma is used, or even when decimal point is used. So you might even consider using ”;” in all locales.
But when it comes to user input, it’s trickier. In principle, you should be liberal in what you accept, but since the comma can be meant to be a decimal comma, a thousands separator, or a list item separator, there is such a thing as being too liberal.
If possible, prompt for each number separately, avoiding the separator issue. If this is not possible, the crucial thing is to make it very, very clear to the user which separator is expected. I would go as far as saying that requiring the semicolon ”;” is the most reliable thing to do.
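A minimal Python sketch of the "require the semicolon" approach, assuming the decimal mark is already known (for example from the user's locale). The function name `parse_number_list` and its parameters are illustrative only; the parser is deliberately strict and rejects anything ambiguous instead of guessing.

```python
def parse_number_list(text: str, decimal_mark: str = ",") -> list:
    """Parse a semicolon-separated list such as '1,25; 1,5; 1,75'."""
    values = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        if decimal_mark != ".":
            if "." in item:
                # A point is ambiguous when a decimal comma is expected.
                raise ValueError(f"unexpected '.' in {item!r}")
            item = item.replace(decimal_mark, ".")
        # float() raises ValueError for leftovers such as '1,234,5'.
        values.append(float(item))
    return values


print(parse_number_list("1,25; 1,5; 1,75"))
```

Because the list separator is never a comma here, the comma can only be a decimal mark, which removes the ambiguity discussed above.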
Why ask about Europeans in general? I don't think there is one European way of doing so, and if there happened to be one, it would be sheer luck. Europe is made up of different cultures and each has its own rules.
You don't mention what platform you are using, but you might be able to rely on your platform to get this information. In the case of .NET, you can get it through TextInfo.ListSeparator. For example, this would give you the French one (result: a semicolon):
string listSeparator = new CultureInfo("fr-FR").TextInfo.ListSeparator;
I don't think there is one way to do it. White space separating the numbers would work just the same, or you could use a semicolon (';') to separate the numbers.

How many chars can numeric EDIFACT data elements be long?

In EDIFACT there are numeric data elements, specified e.g. as format n..5 -- we want to store those fields in a database table (with alphanumeric fields, so we can check them). How long must the db-fields be, so we can for sure store every possible valid value? I know it's at least two additional chars (for decimal point (or comma or whatever) and possibly a leading minus sign).
We are building our tables after the UN/EDIFACT standard we use in our message, not the specific guide involved, so we want to be able to store everything matching that standard. But documentation on the numeric data elements isn't really straightforward (or at least I could not find that part).
Thanks for any help
I finally found the information on the UNECE web site in the documentation on UN/EDIFACT rules: Part 4, UN/EDIFACT rules, Chapter 2.2 Syntax Rules. They don't say it directly, but when you put all the parts together, you get it. See TOC entry 10: REPRESENTATION OF NUMERIC DATA ELEMENT VALUES.
Here's what it basically says:
10.1: Decimal Mark
Decimal mark must be transmitted (if needed) as specified in UNA (comma or point, but always one character). It shall not be counted as a character of the value when computing the maximum field length of a data element.
10.2: Triad Separator
Triad separators shall not be used in interchange.
10.3: Sign
[...] If a value is to be indicated to be negative, it shall in transmission be immediately preceded by a minus sign e.g. -112. The minus sign shall not be counted as a character of the value when computing the maximum field length of a data element. However, allowance has to be made for the character in transmission and reception.
To put it together:
Other than the digits themselves, there are only two (optional) chars allowed in a numeric field: the decimal separator and a minus sign (no blanks are permitted between any of the characters). These two extra chars are not counted against the maximum length of the value in the field.
So the maximum number of characters in a numeric field is the maximal length of the numeric field plus 2. If you want your database to be able to store every syntactically correct value transmitted in a field specified as n..17, your column would have to be 19 chars long (something like varchar(19)). Every EDIFACT-message that has a value longer than 19 chars in a field specified as n..17 does not need to be stored in the DB for semantic checking, because it is already syntactically wrong and can be rejected.
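The "plus 2" rule can be written down as a small Python sketch. The function names are hypothetical, and the check covers only the points quoted above (optional leading minus, at most one decimal mark, no triad separators or blanks, digits counted against the limit); it is not a complete EDIFACT syntax validator.

```python
import re


def max_field_chars(max_digits: int) -> int:
    """Column width needed for a field specified as n..max_digits."""
    return max_digits + 2  # one char for the sign, one for the decimal mark


def fits_numeric_format(value: str, max_digits: int, decimal_mark: str = ".") -> bool:
    """Check a transmitted value against n..max_digits (sketch only)."""
    mark = re.escape(decimal_mark)
    # Optional minus, digits, at most one decimal mark, nothing else.
    if not re.fullmatch(rf"-?[0-9]*(?:{mark}[0-9]*)?", value):
        return False
    digits = sum(ch in "0123456789" for ch in value)
    return 1 <= digits <= max_digits


print(max_field_chars(17))
print(fits_numeric_format("-123.45", 5))
```

With this rule, "-123.45" is a valid n..5 value even though it is 7 characters long, which is why the column needs the two extra characters.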
I used EDI Notepad from Liaison to solve a similar challenge. https://liaison.com/products/integrate/edi/edi-notepad
I recommend anyone looking at EDI to at least get their free (express) version of EDI Notepad.
The "high end" version of their product (EDI Notepad Productivity Suite) comes with a "Dictionary Viewer" tool from which you can export the min / max lengths of the elements, as well as their types. You can export the document to HTML from the Viewer tool. It also handles ANSI X12.

Is it possible to create a mask to handle non-North American phone numbers?

For North American phone numbers, (999) 999-9999 works pretty well for an input mask.
However, I can't find a good example that will handle non-North American numbers. I know that the number of digits can vary, so other than restricting it to digits only, is there a good example anywhere?
There is no generic mask, really: There are too many combinations.
The only thing that is fixed is the international country code, usually prefixed by +.
According to the Wikipedia Article on telephone numbering plans, most countries conform with the E.164 numbering plan.
If I read E.164 correctly, you can safely make the following assumptions:
Country code: 1-3 digits
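Network / Area code and Number: the remaining digits; E.164 caps the full number at 15 digits including the country code, so at most 14 digits
I would ask for the country code, and have the "area code + number" field as an input of up to 14 digits.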
You can deduce the country code with a simple RegEx such as:
^(?:(?:0(?:0|11)\s?)|\+)([17]|2([07]|[1-689]\d)|3([0-469]|[578]\d)|4([013-9]|2\d)|5([1-8]|[09]\d)|6([0-6]|[789]\d)|8([12469]|[03578]\d)|9([0-58]|[679]\d))
Followed by
(([\s\(\).-]{0,2}\d){4,13})$
to extract the national number.
For validating the national number length and validity, you'd need libphonenumber or similar.
The long RegEx above allows +, 00 or 011 before the country code and a selection of punctuation in the number which will also have to be stripped.
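A hedged Python sketch of how the country-code pattern might be applied. The pattern is the one from this answer, written with the plus sign escaped as `\+` so that it compiles. The function name `split_country_code` is made up for illustration, and the rest of the input is reduced to digits only rather than matched with the second expression.

```python
import re

# Country-code pattern from the answer above ('+' escaped so it compiles).
COUNTRY_CODE = re.compile(
    r"^(?:(?:0(?:0|11)\s?)|\+)"
    r"([17]|2([07]|[1-689]\d)|3([0-469]|[578]\d)|4([013-9]|2\d)"
    r"|5([1-8]|[09]\d)|6([0-6]|[789]\d)|8([12469]|[03578]\d)|9([0-58]|[679]\d))"
)


def split_country_code(raw: str):
    """Return (country_code, national_digits), or None without a prefix."""
    match = COUNTRY_CODE.match(raw.strip())
    if match is None:
        return None  # no +, 00 or 011 prefix found
    rest = raw.strip()[match.end():]
    national = re.sub(r"[^0-9]", "", rest)  # strip spaces and punctuation
    return match.group(1), national


print(split_country_code("+49 30 123456"))
print(split_country_code("0044 20 7946 0958"))
```

This only separates the two parts. Whether the national part has a valid length for that country still needs libphonenumber or similar, as noted above.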
You don't mention your application but this is certainly possible using regular expressions. You might want to take a look here.
Not easily. Take a look at this page for an example why: if you only look at the German phone numbers, you'll note that there are different formats depending on where you're calling the number from. Which one do you pick? And that's just for German phone numbers; they differ from continent to continent, and from country to country.
Going with "numbers-only" is probably your safest bet.
I would allow for spaces, dashes, slashes and all that, but actually only care for numbers and the optional leading + sign. Everything else, such as assuming certain blocks of a certain length is just asking for trouble.
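One way to sketch this "accept anything, keep only what matters" approach in Python. The function name `normalize_phone` is illustrative; it keeps an optional leading + and the digits, and discards all other formatting.

```python
import re


def normalize_phone(raw: str) -> str:
    """Keep an optional leading '+' and the digits; drop everything else."""
    raw = raw.strip()
    prefix = "+" if raw.startswith("+") else ""
    return prefix + re.sub(r"[^0-9]", "", raw)


print(normalize_phone("+49 (30) 1234-567"))
print(normalize_phone("030/1234 567"))
```

The result stays a string, so leading zeros and the + sign survive.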
Maybe it is bad form to answer an old question, but libphonenumber seems like a good solution to your problem.

Do numbers need to be localized?

This seems like a stupid question. Is the number "10" written as "10" in Hebrew, Arabic, and all other languages? I'm not seeing anything that says you need to do anything special with numbers when dealing with localization. Maybe the number format, but what about the number itself? I would think that numbers would read differently in right-to-left languages, but translate.google.com is giving me the same number back. Can anyone confirm this?
Arabic and Japanese (?) do have different glyphs for numbers, but the standard system is so commonplace, that usually numbers are not converted.
If you're using the .NET formatting functions, then the numbers will be formatted according to the system preferences (I'm talking commas and decimal points here)
Different languages can use different digit sigils;
Number representation is different, e.g. 1,234.56 in English is represented as 1.234,56 in German.
So the answer is yes.
The digits 0-9 usually don't require any localization, except minor tweaks like AndreyT said, but those are more "fonts" related than anything.
The only important thing to take into account is large number representation.
For example, take 1mio$
In Switzerland, it will be:
$1'000'000.-
in US
$1,000,000
In Japan it will be
$100万
I don't know about other places, but you get the idea.
For Japan, it's very uncommon to see numbers greater than 10'000 without using a kanji.
But I think you should see with the person doing the localization.
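As a rough illustration of the Japanese example, here is a Python sketch that groups a value by units of 10,000 (万). The function name `format_with_man` is hypothetical, and the sketch assumes a non-negative integer below 100,000,000; larger units and real-world style conventions are exactly what the person doing the localization should decide.

```python
def format_with_man(n: int) -> str:
    """Format a non-negative integer below 10**8 using the 10,000 unit."""
    if not 0 <= n < 10**8:
        raise ValueError("sketch only handles 0 <= n < 100,000,000")
    man, rest = divmod(n, 10_000)
    if man == 0:
        return str(rest)      # below 10,000: plain digits
    if rest == 0:
        return f"{man}万"      # exact multiple of 10,000
    return f"{man}万{rest}"


print(format_with_man(1_000_000))
```

For one million this gives 100万, matching the example above.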
For the actual numbers themselves (and not the decimal point, thousands separator, etc.) there are in fact differences between languages.
Hebrew numerals actually use the Hebrew letters as a number system, though it is used only for "traditional" numbers, such as the year in the Jewish calendar, the chapter, verse and page numbers in the Hebrew Bible, in lists (similar to using roman numerals instead of numbers), etc. But for all other cases, Hindu-Arabic numerals are used (e.g. 1, 2, 3, 4...) and are written left-to-right, even while the rest of the Hebrew text is written right-to-left (i.e. NML KJIHG 123 FEDC BA).
In Arabic, most countries use the Arabic-Indic numerals, but the Hindu-Arabic numerals are also understood.
In any case, .NET localization should take care of all conversions and display issues, and there's nothing special you need to do unless you render your own GUI.
There are quite a few things that can be localized in numbers. For example, in the USA the fractional part of a number (if it has one) is separated by a dot, while in Russia a comma is normally used. In the USA, commas are used to separate three-digit groups in the number, while in Russia it is not customary to separate them at all, or a space is used for that purpose (or maybe some other character, but not a comma). And so on (although most of the formatting options apply to monetary quantities).
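To make the separator differences concrete, here is a small Python sketch that takes the separators as explicit parameters instead of reading them from the system locale. The function name `format_number` is made up for illustration; a real application would normally let the platform's locale support choose the separators.

```python
def format_number(value: float, group_sep: str, decimal_sep: str, decimals: int = 2) -> str:
    """Format with explicit grouping and decimal separators."""
    us_style = f"{value:,.{decimals}f}"  # e.g. '1,234.56'
    # translate() swaps both characters in one pass, so they cannot clobber
    # each other the way two chained replace() calls would.
    return us_style.translate(str.maketrans({",": group_sep, ".": decimal_sep}))


print(format_number(1234.56, ",", "."))   # dot decimal, comma groups
print(format_number(1234.56, " ", ","))   # comma decimal, space groups
print(format_number(1234.56, "", ","))    # comma decimal, no grouping
```

Passing an empty grouping separator covers the "no separators at all" case, which is also what a plain, ungrouped 1000 requires.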
Even the preferred way to write characters themselves can depend on locale. In USA the character for '7' is usually written in two strokes, while in Europe it quite often has a third stroke - a short horizontal line through the middle. This, of course, is less important, since the two-stroke version is still recognized everywhere.
If you are displaying the numbers for math purposes (for example, showing 5 + 3 = 8), then use the standard digits 0-9. These are used nearly universally in mathematics.
If you are displaying something that is highly localized (e.g. pricing on a street vendor's point-of-sale system in Saudi Arabia), there are a handful of countries that use different digits that are localized to their respective languages.
Most regions of people in the world will be fine with understanding 0-9 though.
I found this website to be a good starting guide: https://phrase.com/blog/posts/number-localization/
Some examples:
Bengali, for example, uses the Bengali–Assamese numeral system, whose digits differ from the Western Arabic system: ০, ১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯.
In some locales like Saudi Arabia, for example, it’s common to represent numbers in the local numeral system, Eastern Arabic, and not the Western Arabic system.
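A short Python sketch shows what swapping digit sets does to a plain number. The digit strings are built from Unicode code points (U+0660 onward for Eastern Arabic, U+09E6 onward for Bengali), and the helper name `to_digits` is illustrative. In both systems each digit is a single character, the number is still base 10, and the most significant digit still comes first in the string.

```python
ASCII_DIGITS = "0123456789"
# Built from code points to avoid editor problems with right-to-left text.
EASTERN_ARABIC = "".join(chr(0x0660 + i) for i in range(10))
BENGALI = "".join(chr(0x09E6 + i) for i in range(10))


def to_digits(number: int, digits: str) -> str:
    """Render an integer using another set of ten decimal digits."""
    return str(number).translate(str.maketrans(ASCII_DIGITS, digits))


bengali_1000 = to_digits(1000, BENGALI)
arabic_1000 = to_digits(1000, EASTERN_ARABIC)

print(bengali_1000, len(bengali_1000))
# Python's int() accepts any Unicode decimal digits, so this round-trips.
print(int(arabic_1000))
```

Only the glyphs change. The length of the string and the order of the digits stay the same as in the 0-9 form.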
Keep in mind that we are just talking about digits here. When it comes to fractions (/), decimals (.), percentages (%), large number separators (,), number symbols (#), etc. most regions have specific rules and that's a whole other topic. They are not universal.

Resources