Is it possible to create a mask to handle non-North American phone numbers? - phone-number

For North American phone numbers, (999) 999-9999 works pretty well as an input mask.
However, I can't find a good example that will handle non-North American numbers. I know that the number of digits can vary, so other than restricting the input to digits only, is there a good example anywhere?

There is no generic mask, really; there are too many combinations.
The only fixed part is the international country code, usually prefixed by +.
According to the Wikipedia article on telephone numbering plans, most countries conform to the E.164 numbering plan.
If I read E.164 correctly, you can safely make the following assumptions:
Country code: 1-3 digits
Network / area code and number: up to 14 digits (E.164 caps the whole number, country code included, at 15 digits)
I would ask for the country code, and have the "area code + number" field as a digits-only input of up to 14 digits.
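A minimal Python sketch of that split-field check, assuming both fields arrive as separate strings of ASCII digits. `check_split_fields` is a hypothetical helper name; it only enforces the E.164 length limits (a 1-3 digit country code, at most 15 digits in total) and says nothing about whether the number is actually assigned.

```python
DIGITS = set("0123456789")


def check_split_fields(country_code: str, national: str) -> bool:
    """Length check only, per E.164: 1-3 digit country code, at most 15 digits in total."""
    # Both fields are expected to hold ASCII digits only (no spaces or punctuation).
    if not country_code or not national:
        return False
    if not set(country_code) <= DIGITS or not set(national) <= DIGITS:
        return False
    # No E.164 country code starts with 0, and none is longer than 3 digits.
    if country_code[0] == "0" or len(country_code) > 3:
        return False
    return len(country_code) + len(national) <= 15
```

For example, `check_split_fields("44", "2079460018")` passes, while a national part typed with spaces is rejected, so the form would need to strip punctuation before calling it.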

You can deduce the country code with a simple RegEx such as:
^(?:(?:0(?:0|11)\s?)|\+)([17]|2([07]|[1-689]\d)|3([0-469]|[578]\d)|4([013-9]|2\d)|5([1-8]|[09]\d)|6([0-6]|[789]\d)|8([12469]|[03578]\d)|9([0-58]|[679]\d))
Followed by
(([\s\(\).-]{0,2}\d){4,13})$
to extract the national number.
For validating the national number's length and validity, you'd need libphonenumber or similar.
The long regex above allows +, 00 or 011 before the country code, and a selection of punctuation in the number, which will also have to be stripped.
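A minimal Python sketch that applies the two patterns from this answer, with the leading + escaped so that Python's `re` module can compile it. The helper name `split_number` is hypothetical; it returns the country code and the national number with the punctuation stripped, and it still does no per-country length or validity check (that is the job of libphonenumber or similar).

```python
import re

# Country-code part from the answer above ("+" escaped). Group 1 is the country code.
COUNTRY_CODE = (
    r"^(?:(?:0(?:0|11)\s?)|\+)"
    r"([17]|2([07]|[1-689]\d)|3([0-469]|[578]\d)|4([013-9]|2\d)"
    r"|5([1-8]|[09]\d)|6([0-6]|[789]\d)|8([12469]|[03578]\d)|9([0-58]|[679]\d))"
)
# National part: 4-13 digits, each optionally preceded by up to two
# separators (whitespace, parentheses, dot or dash).
NATIONAL = r"(([\s\(\).-]{0,2}\d){4,13})$"

PHONE_RE = re.compile(COUNTRY_CODE + NATIONAL)


def split_number(text: str):
    """Return (country_code, national_digits), or None if the text does not match."""
    text = text.strip()
    m = PHONE_RE.match(text)
    if m is None:
        return None
    country_code = m.group(1)
    # Everything after the country code is the national number; drop the
    # punctuation the pattern allowed so that only digits remain.
    national_digits = re.sub(r"\D", "", text[m.end(1):])
    return country_code, national_digits
```

For instance, `split_number("011 49 30 123456")` gives `("49", "30123456")`, and a number without a `+`, `00` or `011` prefix returns `None`.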

You don't mention your application, but this is certainly possible using regular expressions. You might want to take a look here.

Not easily. Take a look at this page for an example of why: if you only look at the German phone numbers, you'll note that there are different formats depending on where you're calling the number from. Which one do you pick? And that's just for German phone numbers; they differ from continent to continent, and from country to country.
Going with "numbers-only" is probably your safest bet.

I would allow for spaces, dashes, slashes and all that, but actually only care about the digits and the optional leading + sign. Everything else, such as assuming blocks of a certain length, is just asking for trouble.
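A minimal Python sketch of that "accept anything, keep only digits and a leading +" approach. `normalize_phone` is a hypothetical name, and the sketch deliberately keeps only ASCII digits.

```python
ASCII_DIGITS = "0123456789"


def normalize_phone(text: str) -> str:
    """Keep only a leading '+' and the ASCII digits; drop spaces, dashes, slashes, etc."""
    text = text.strip()
    prefix = "+" if text.startswith("+") else ""
    return prefix + "".join(ch for ch in text if ch in ASCII_DIGITS)
```

One limitation worth knowing: a written trunk prefix such as the "(0)" in "+44 (0)20 ..." survives as an ordinary digit, so input in that style needs extra handling.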

Maybe it is bad form to answer an old question, but libphonenumber seems like a good solution to your question.

Related

Need Code for US Phone numbers using Dashes or Parentheses but not spaces

I am new to RegEx and need some guidance. Right now I have the following validation for phone numbers:
\(\d{3}\) ?\d{3}( |-)?\d{4}|^\d{3}( |-)?\d{3}( |-)?\d{4}
Unfortunately, the system I am importing the results into does not think favorably of the numbers being separated solely by spaces or not separated at all. What would the formula look like that requires either dashes or parentheses and accepts only the following formats: XXX-XXX-XXXX or (XXX) XXX-XXXX?
Thank you for your assistance.
Start simple:
\d{3}-\d{3}-\d{4} works beautifully for numbers like 212-867-5309.
As for others, I'd say you and your users would be better off if you kept it simple. No switching, no choices. Pick a standard. Simple is good.
If you must persist, look at this web site for help. You aren't the first.
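If you do need exactly the two formats from the question, one anchored alternation is enough. A minimal Python sketch (the names `US_PHONE` and `is_us_phone` are illustrative); `re.ASCII` keeps `\d` to the digits 0-9.

```python
import re

# Exactly XXX-XXX-XXXX or (XXX) XXX-XXXX; nothing else.
US_PHONE = re.compile(r"\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4}", re.ASCII)


def is_us_phone(text: str) -> bool:
    # fullmatch anchors the whole alternation at both ends.
    return US_PHONE.fullmatch(text) is not None
```

In flavours without `fullmatch`, write `^(?:\d{3}-\d{3}-\d{4}|\(\d{3}\) \d{3}-\d{4})$`. The group matters because in `^a|b$` the anchors bind to only one alternative each.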

Why is it best to store a telephone number as a string vs. integer?

As the question states, why is it considered best practice to store telephone numbers as strings rather than integers in the telephone_number column?
Not sure I understand the rationale for this. Please help clear this up!
Thanks!
Telephone numbers are strings of digit characters; they are not integers.
Consider for example:
Expressing a telephone number in a different base would render it meaningless
Adding or multiplying two telephone numbers together, or any math operation on a phone number, is meaningless. The result is not another telephone number (except by coincidence)
Telephone numbers are intended to be entered "as-is" into a connected device.
Telephone numbers may have leading zeroes.
Manipulations of telephone numbers, such as adding an area code, are String operations.
Storing the string version of the telephone number makes this clear and unambiguous.
History: On old pulse-encoded dial systems, the code for each digit in a telephone number was sent as the same number of pulses as the digit (or 10 pulses for "0"). That may be why we still use digits to represent the parts of a phone number. See http://en.wikipedia.org/wiki/Pulse_dialing
What Neil Slater said is correct. I would add that there are lots of edge cases where you can't express a telephone number as a number value consistently.
For example, consider these numbers:
011-123-555-1212
+11-123-555-1212
+1 (112) 355-5121 x2
These are all potentially valid phone numbers, but they mean very different things. Yet, in integer form, they are all 111235551212.
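A short Python illustration of that collapse. `digits_as_integer` is a hypothetical helper that mimics squeezing a phone number into an integer column: keep the digits, then let `int()` drop any leading zeros.

```python
def digits_as_integer(text: str) -> int:
    # Keep only the ASCII digits, then convert; int() silently drops leading zeros.
    return int("".join(ch for ch in text if ch in "0123456789"))


numbers = ["011-123-555-1212", "+11-123-555-1212", "+1 (112) 355-5121 x2"]
collapsed = {digits_as_integer(n) for n in numbers}
print(collapsed)  # {111235551212}
```

All three inputs end up as the same single value, and the leading "0" of the first one is gone.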
If you are going to store the number for display from input, then you must use a string.
However, while it is true that no meaningful mathematical operation can be performed on a phone number, using a number in hash sets and for indexing is quicker than using a string. So, provided you can guarantee or homogenise your set of numbers so that they are all consistent, you may see better performance operating on a number.
For example, in the telco world, rating calls for a given customer involves a lot of searching on their CLI (calling line identification), and in this situation it is faster and cheaper to search by integer. Generally, though, strings will be fine performance-wise; it only matters where performance is critical and you have many searches to perform over a huge range of numbers, e.g. rating 250 million calls across 2 million lines and 2000 tariffs. In-memory rating also gets expensive, so being able to use a 64-bit int or uint is cheaper when dealing with these volumes.
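A minimal Python sketch of the "homogenise first" idea, assuming every number has already been normalised to E.164 ("+" followed by digits, no extension). Because E.164 numbers have at most 15 digits and no country code starts with 0, the digits can be stored as an integer key without loss and fit in a signed 64-bit integer. `e164_key` and `key_to_e164` are hypothetical helpers. Extensions such as the "x2" above are not representable this way.

```python
ASCII_DIGITS = set("0123456789")


def e164_key(number: str) -> int:
    """Turn an already-normalised E.164 string such as '+442079460018' into an int key."""
    digits = number[1:]
    if not number.startswith("+") or not digits or not set(digits) <= ASCII_DIGITS:
        raise ValueError(f"not in +<digits> form: {number!r}")
    # At most 15 digits and no leading 0, so the conversion is reversible
    # and the value always fits in a signed 64-bit integer.
    if len(digits) > 15 or digits[0] == "0":
        raise ValueError(f"not a valid E.164 length/prefix: {number!r}")
    return int(digits)


def key_to_e164(key: int) -> str:
    # Reverse mapping, e.g. for display.
    return "+" + str(key)
```

Keys produced this way can index a dict or a database column of 64-bit integers, and `key_to_e164` recovers the original string for display.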
Consider these phone numbers, for example:
099-1234-56789 or +91-8907-687665.
In this case, if the phone_number attribute is of type integer, it can't accept these values. It has to be a string to hold values like these, so a string is always preferable to an integer.
There are several reasons for this:
Phone numbers often start with a "0": an integer will drop any leading zeros
Phone numbers can contain special characters: +, (, -, etc. (for example: +33 (0)6 12 23 34)
You cannot perform arithmetic on phone numbers: adding phone numbers, for instance, would be meaningless
Phone numbers may be internationalised, i.e. formatted differently for different people, which is not possible with integers
There might be other reasons, but I guess that's already a fair amount of those :)

Regexp for a name

I need to make sure people enter their first, middle and last names correctly for a form in Rails. So the first thought for a regular expression is:
\A[[:upper:]][[:alpha:]'-]+( [[:upper:]][[:alpha:]'-]*)*\z
That'll make sure every word in the name starts with an uppercase letter followed by letters, hyphens or apostrophes.
My first question I guess doesn't have much to do with regular expressions, though I'm hoping there's a regular expression I can copy for this. Are letters, hyphens and apostrophes the only characters I should be checking in a name?
My second question is whether it's important to make sure each name has at least 1 uppercase letter. So many people enter all-lowercase names and I really want to avoid that, but is it sometimes legitimate?
Here's what I have so far that makes sure there's at least 1 uppercase letter somewhere in the name:
\A([[:alpha:]'-]+ )*[[:alpha:]'-]*[[:upper:]][[:alpha:]'-]*( [[:alpha:]'-]+)*\z
Isn't there a [:name:] bracket expression? :)
UPDATE: I added . and , to the characters allowed, surprised I didn't think of them originally. So many people must have to deal with this kind of regular expression! Nobody has any pre-made regular expressions for this sort of thing?
A good start would be to allow letters, marks, punctuation and whitespace, to allow for a given name like "María-Jose" and a last name like "van Rossum" (note the whitespace).
So that boils down to something like:
[\p{Letter}\p{Mark}\p{Punctuation}\p{Separator}]+
If you want to restrict that a bit you could have a look at classes like \p{Lowercase_Letter}, \p{Uppercase_Letter}, \p{Titlecase_Letter}, but there may be scripts that don't have casing. \p{Space_Separator} and \p{Dash_Punctuation} can narrow it down to names that I know. But names I don't...I don't know...
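Python's built-in `re` module does not support `\p{...}` classes, but the same four Unicode categories can be checked with the standard `unicodedata` module. A minimal sketch (`plausible_name` is a hypothetical name); like the regex, it also lets through any punctuation, including characters such as "!".

```python
import unicodedata

# First letter of the Unicode general category: L = Letter, M = Mark,
# P = Punctuation, Z = Separator (the same four classes as the regex above).
ALLOWED_CATEGORIES = {"L", "M", "P", "Z"}


def plausible_name(name: str) -> bool:
    return bool(name) and all(
        unicodedata.category(ch)[0] in ALLOWED_CATEGORIES for ch in name
    )
```

Decomposed input such as "Mari" followed by a combining acute accent and "a" passes, because the combining character is a Mark. Digits and control characters such as tabs are rejected.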
But before you start constructing your regex for "validating" a name, please read this excellent piece on names by the W3C. It will shake even your concepts of first, middle and last names.
For example:
In some cultures you are given a name (Björk, Osama) and an indication of who your father (or mother) was (Guðmundsdóttir, bin Mohammed). So the "first name" could be "Björk" but:
Björk wouldn’t normally expect to be called Ms. Guðmundsdóttir. Telephone directories in Iceland are sorted by given name.
But in other cultures, the first name is not the given name but the family name. In "Zhāng Mànyù", "Zhāng" is the family name. How to address her would depend on how well you know her, but again "Ms. Zhāng" would be strange.
The list of examples goes on and ends with some 30+ links to Wikipedia for more examples.
The article does end with suggestions for field design and some pointers on what characters to allow:
Don't forget to allow people to use punctuation such as hyphens, apostrophes, etc. in names. Don't require names to be entered all in upper case – this can be difficult on a mobile device. Allow the user to enter a name with spaces, eg. to support prefixes and suffixes such as de in French, von in German, and Jnr/Jr in American names, and also because some people consider a space-separated sequence of characters to be a single name, eg. Rose Marie.
To answer your question about capital letters: in many areas of the world, names do not necessarily start with a capital letter. In Dutch for instance, you have surnames like "van der Vliet" where words like "van", "de", "den" and "der" are not capitalised. Additionally, you have special cases like "De fauw" and "Van pellicom" where an administrative error never got rectified, and the correct capitalisation is fairly illogical. Please do not make the mistake of rejecting such names.
I also know about town names in South Africa such as eThekwini, where the capital letter is not necessarily the first letter of the word. This could very well appear in surnames or given names as well.

How do Europeans write a list of numbers with decimals?

As I understand it, Europeans(*) write numbers with a comma for a decimal separator, so one-and-a-quarter is written as 1,25
Europeans also use commas to separate lists, so how do you write a list of decimal numbers? I, as an Englishman, would write one-and-a-quarter, one-and-a-half, one-and-three-quarters like this:
1.25, 1.5, 1.75
How do you do that in Europe?
(Why is this a programming question? Because I'm writing a program that will ask European users for a list of numbers!)
* For the purposes of this question, there are no English-speaking countries in Europe. :-)
I'm European (French), and in almost all programs here we have to use semicolons ';' as a separator, even if the numbers are only integers, because the comma doesn't look like a separator to us. In mathematics, semicolons are the only right way here to separate a list of numbers.
The most common example is when we have to enter the page numbers we want to print from a PDF: all programs ask for a semicolon-separated list, and I find it intuitive. I think they would have changed it if it were uncomfortable for some.
This varies by culture, and within a culture. The CLDR data contains the “list” element that specifies the list separator character, and it is the semicolon for most cultures, see the chart of number symbols (element “list”). The definition is very implicit though, and there is variation inside locales. Some people regard 1,25, 1,5, 1,75 as acceptable, while others prefer 1,25; 1,5; 1,75. There are also people who seriously think that in a strongly mathematical or numeric context, one should deviate from the locale practices and use the Anglo-Saxon notation with decimal point, hence with comma as separator.
On the practical side, I think it would not be very wrong to use “;” as the number list separator when a decimal comma is used, or even when a decimal point is used. So you might even consider using “;” in all locales.
But when it comes to user input, it’s trickier. In principle, you should be liberal in what you accept, but since the comma can be meant as a decimal comma, a thousands separator, or a list item separator, there is such a thing as being too liberal.
If possible, prompt for each number separately, avoiding the separator issue. If this is not possible, the crucial thing is to make it very, very clear to the user which separator is expected. I would go as far as saying that requiring the semicolon “;” is the most reliable thing to do.
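A minimal Python sketch of the strict "semicolon between items, comma as decimal mark" rule. `parse_number_list` and its parameters are hypothetical. It rejects anything that is not a plain number, so ambiguous input fails loudly instead of being guessed at.

```python
import re
from decimal import Decimal


def parse_number_list(text: str, decimal_sep: str = ",", list_sep: str = ";") -> list[Decimal]:
    # A plain number: optional minus, digits, optional decimal part.
    # No thousands separators are accepted, on purpose.
    number_re = re.compile(r"-?[0-9]+(?:" + re.escape(decimal_sep) + r"[0-9]+)?")
    values = []
    for raw in text.split(list_sep):
        item = raw.strip()
        if not item:
            continue  # tolerate a trailing separator
        if not number_re.fullmatch(item):
            raise ValueError(f"not a number: {item!r}")
        values.append(Decimal(item.replace(decimal_sep, ".")))
    return values
```

`parse_number_list("1,25; 1,5; 1,75")` yields 1.25, 1.5 and 1.75. The comma-separated "1,25, 1,5" raises `ValueError` rather than being misread.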
Why ask about Europeans in general? I don't think there is one European way of doing it, and if there happens to be one, it would be sheer luck. Europe is made up of different cultures and each has its own rules.
You don't mention what platform you are using, but you might be able to rely on your platform to get this information. In the case of .NET, you can get it through TextInfo.ListSeparator. For example, this would give you the French one (result: a semicolon):
string listSeparator = new CultureInfo("fr-FR").TextInfo.ListSeparator;
I don't think there is one way to do it. White space separating the numbers would work just the same, or you could use a semicolon (';') to separate the numbers.

Do numbers need to be localized?

This seems like a stupid question. Is the number "10" written as "10" in Hebrew, Arabic, and all languages? I'm not seeing anywhere that says you need to do anything special with numbers when dealing with localization. Maybe the number format, but what about the number itself? I would think that numbers would read differently in right-to-left languages, but translate.google.com is giving me the same number back. Can anyone confirm this?
Arabic and Japanese (?) do have different glyphs for numbers, but the standard system is so commonplace that numbers are usually not converted.
If you're using the .NET formatting functions, then the numbers will be formatted according to the system preferences (I'm talking about commas and decimal points here).
Different languages can use different digit glyphs.
Number representation is different, e.g. 1,234.56 in English is written as 1.234,56 in German.
So the answer is yes.
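A minimal Python sketch of the separator difference, assuming you already know which decimal and grouping characters a locale uses (in practice they would come from locale data such as CLDR, or from the platform, not be hard-coded). `format_number` is a hypothetical helper that only handles plain 3-digit grouping, so styles like the Indian 12,34,567 are out of scope.

```python
def format_number(value: float, decimal_sep: str, group_sep: str, places: int = 2) -> str:
    # Format with Python's own "," grouping and "." decimal point first...
    text = f"{value:,.{places}f}"
    # ...then swap both characters, going through a placeholder so the two
    # replacements don't interfere with each other.
    return text.replace(",", "\0").replace(".", decimal_sep).replace("\0", group_sep)
```

`format_number(1234.56, ",", ".")` gives the German "1.234,56", and `format_number(1_000_000, ".", "'", places=0)` gives the Swiss-style "1'000'000".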
The digits 0-9 usually don't require any localization, except minor tweaks like AndreyT said, but those are more "fonts" related than anything.
The only important thing to take into account is large number representation.
For example, take one million dollars.
In Switzerland, it will be:
$1'000'000.-
in US
$1,000,000
In Japan it will be
$100万
I don't know about other places, but you get the idea.
For Japan, it's very uncommon to see numbers greater than 10'000 without using a kanji.
But I think you should check with the person doing the localization.
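The Japanese example groups digits by four rather than three. A minimal Python sketch of that idea using 万 (10^4), 億 (10^8) and 兆 (10^12). `to_myriad` is a hypothetical helper. Real Japanese formatting has further conventions it ignores, such as commas inside the groups or kanji digits.

```python
# Japanese large-number units, largest first: 兆 = 10^12, 億 = 10^8, 万 = 10^4.
UNITS = [(10**12, "兆"), (10**8, "億"), (10**4, "万")]


def to_myriad(n: int) -> str:
    if n < 0:
        return "-" + to_myriad(-n)
    parts = []
    for size, unit in UNITS:
        count, n = divmod(n, size)
        if count:  # skip units that would be zero
            parts.append(f"{count}{unit}")
    if n or not parts:  # remainder below 10^4, or the number was 0
        parts.append(str(n))
    return "".join(parts)
```

`to_myriad(1_000_000)` gives "100万", as in the dollar example above, and `to_myriad(123_456_789)` gives "1億2345万6789".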
For the actual numbers themselves (and not floating point, thousands separator, etc.) there are in fact differences between languages.
Hebrew numerals actually use the Hebrew letters as a number system, though it is used only for "traditional" numbers, such as the year in the Jewish calendar, the chapter, verse and page numbers in the Hebrew Bible, in lists (similar to using roman numerals instead of numbers), etc. But for all other cases, Hindu-Arabic numerals are used (e.g. 1, 2, 3, 4...) and are written left-to-right, even while the rest of the Hebrew text is written right-to-left (i.e. NML KJIHG 123 FEDC BA).
In Arabic, most countries use the Arabic-Indic numerals, but the Hindu-Arabic numerals are also understood.
In any case, .NET localization should take care of all conversions and display issues, and there's nothing special you need to do unless you render your own GUI.
There are quite a few things that can be localized in numbers. For example, in the USA the fractional part of a number (if it has a fractional part) is separated by a dot, while in Russia a comma is normally used. In the USA commas are used to separate three-digit groups in the number, while in Russia it is not customary to separate them at all, or a space is used for that purpose (or maybe some other character, but not a comma). And so on (although most of the formatting options apply to monetary quantities).
Even the preferred way to write the digits themselves can depend on locale. In the USA the character '7' is usually written with two strokes, while in Europe it quite often has a third stroke: a short horizontal line through the middle. This, of course, is less important, since the two-stroke version is still recognized everywhere.
If you are displaying the numbers for math purposes (for example, showing 5 + 3 = 8), then use the standard digits 0-9. These are used nearly universally in mathematics.
If you are displaying something that is highly localized (e.g. pricing on a street vendor's point-of-sale system in Saudi Arabia), keep in mind that a handful of countries use different digits that are localized to their respective languages.
Most regions of people in the world will be fine with understanding 0-9 though.
I found this website to be a good starting guide: https://phrase.com/blog/posts/number-localization/
Some examples:
Bengali, for example, uses the Bengali–Assamese numeral system, whose digits differ from the Western Arabic system: ০, ১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯.
In some locales like Saudi Arabia, for example, it’s common to represent numbers in the local numeral system, Eastern Arabic, and not the Western Arabic system.
Keep in mind that we are just talking about digits here. When it comes to fractions (/), decimals (.), percentages (%), large number separators (,), number symbols (#), etc. most regions have specific rules and that's a whole other topic. They are not universal.
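For the digits alone, the mapping is a simple character substitution. A minimal Python sketch using the Unicode Eastern Arabic-Indic digits (U+0660 to U+0669) and Bengali digits (U+09E6 to U+09EF). `localize_digits` is a hypothetical helper, and, as noted above, decimal marks and grouping are a separate problem it does not touch.

```python
# Map Western digits 0-9 to Eastern Arabic-Indic (U+0660..U+0669) and
# Bengali (U+09E6..U+09EF) digits.
EASTERN_ARABIC = str.maketrans("0123456789", "".join(chr(0x0660 + i) for i in range(10)))
BENGALI = str.maketrans("0123456789", "".join(chr(0x09E6 + i) for i in range(10)))


def localize_digits(text: str, table: dict) -> str:
    # Only the digits change; every other character is left as-is.
    return text.translate(table)
```

Going the other way needs no table in Python, because `int()` accepts any Unicode decimal digits. For example, `int(localize_digits("1234", BENGALI))` is 1234 again.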

Resources