How to spell numbers in TCL? - localization

I thought this might be possible in TCL with format, but I'm not seeing how. My requirement is a spell_number function, i.e. given the number 456,789, output the string four hundred and fifty six thousand, seven hundred and eighty nine.
Ideally I'd like to be able to pass this function a locale, and have the output localised, but English will do for now!
Any suggestions?
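The grouping logic behind such a function is language-independent, so here is a minimal sketch in Python rather than Tcl. It assumes English only, non-negative integers below 10**12, and the unhyphenated "hundred and" style used in the example above; the word tables and helper names are made up for illustration. A locale-aware version would need per-language rules, not just translated tables.

```python
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
SCALES = ["", "thousand", "million", "billion"]


def _below_thousand(n):
    """Spell 1..999, joining hundreds and the remainder with 'and'."""
    parts = []
    hundreds, rest = divmod(n, 100)
    if hundreds:
        parts.append(f"{ONES[hundreds]} hundred")
    if rest:
        if rest < 20:
            parts.append(ONES[rest])
        else:
            tens, ones = divmod(rest, 10)
            parts.append(TENS[tens] + (f" {ONES[ones]}" if ones else ""))
    return " and ".join(parts)


def spell_number(n):
    """Spell a non-negative integer below 10**12 in English words."""
    if not 0 <= n < 10**12:
        raise ValueError("only 0 <= n < 10**12 is supported in this sketch")
    if n == 0:
        return ONES[0]
    groups = []
    scale = 0
    while n:
        n, chunk = divmod(n, 1000)  # peel off three digits at a time
        if chunk:
            words = _below_thousand(chunk)
            if SCALES[scale]:
                words += " " + SCALES[scale]
            groups.append(words)
        scale += 1
    return ", ".join(reversed(groups))


print(spell_number(456789))
```

The same table-and-groups structure ports directly to Tcl lists and `expr`. Note that this sketch spells 1001 as "one thousand, one", which may not match every style guide.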

Related

Why is this Google Sheets Concatenate Formula giving me weird results?

I'm trying to use Google Sheets to concatenate a bit of data. It works 90% of the time, however on certain numbers, I get an odd result. I have to copy the result of this data and paste it into a financial program in a specific format and am using the concatenate formula to do this. The format the program requires is that each field be separated by one period, even if it is a dollar amount as the program will automatically move the decimal point two places to the left while it is evaluating the information. The issue is that on some numbers the formula adds two periods between the fields, which stops the evaluation of the data in our financial program.
Here is a screenshot including the formula
You can see that it works with most numbers in the amount column, but with two of the amounts and several others, it adds two periods after the amount.
Would you please take a look at this and see if you can help me find the issue?
Thank you!!!!
This looks like a known floating-point rounding error in Google Sheets: for certain numbers, multiplying by 100 does not return the exact value but one with a very small extra fractional part. That's why there's an additional period in the result.
As a workaround, wrap the multiplication by 100 in ROUND() to "snap" it to an integer.
Sample:
References:
Floating Point Calculation Error
use just:
=B2&"."&ROUND(C2*100)&"."&D2
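The same effect can be reproduced outside Sheets. The Python sketch below is only an illustration of the rounding workaround; the function and field names are hypothetical and not taken from the spreadsheet.

```python
def join_fields(account, amount, code):
    """Join three fields with single periods, sending the amount as whole cents."""
    cents = round(amount * 100)  # snaps e.g. 110.00000000000001 to 110
    return f"{account}.{cents}.{code}"


# 1.1 cannot be represented exactly in binary floating point,
# so the product is not exactly 110.
print(1.1 * 100 == 110)
print(join_fields("1001", 1.1, "A"))
```

Without the `round()` call the fractional digits would end up in the joined string, which is the same kind of stray output the formula produced.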

Find a time in some text, allowing for multiple formats

I have the following formula.
=INDEX(Lookups!$L$1:$L$726,MAX(IF(ISERROR(FIND(Lookups!$L$1:$L$726,$A1)),-1,1)*(ROW(Lookups!$L$1:$L$726)-ROW(Lookups!$L$1)+1)))
The idea is to pick up the time for a certain item from an email (already parsed into Google Sheets). The emails come in various formats, so I'm unable to specify where in the text string to look.
The times are not always written in a conventional time format either so as you can see from the formula there are 726 possibilities that I work with. For example, sometimes the time could be written as 13:15 and others as 1:15 or even 1.15 or 1-15 etc etc.
The issue I have is that the above formula seems to start with the smallest string possible and work 'upwards', therefore picking up 3:15 from the email string rather than the full time string which is 13:15. Is there a way I can amend the formula to search for the longest string first, in that example looking for 13:15 and then only searching for 3:15 if the prior is not found.
Hope that makes sense. Thanks in advance for any assistance.
One way is to reorder those 726 possibilities so that you have the longer ones first. You can do it by creating another column with =len(L1), copying that formula down, and sorting the range by this new column in descending order.
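The longest-first idea can be sketched in Python as follows; the function name and sample candidates are assumptions for illustration, standing in for the 726-entry lookup column.

```python
def find_longest(candidates, text):
    """Return the longest candidate that occurs in text, or None."""
    # Sorting by length (descending) means "13:15" is tried before "3:15".
    for candidate in sorted(candidates, key=len, reverse=True):
        if candidate in text:
            return candidate
    return None


print(find_longest(["3:15", "13:15", "1.15"], "Collection at 13:15 from depot"))
```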
But it would be easier to use regexextract instead, because regular expressions are designed to solve the problem you are facing. For example,
=regexextract(L1, "\b\d{1,2}[:.-]\d{1,2}\b")
picks up all of the variants 1:15, 13:15, 1-15 or 13.15. (It looks for the following sequence: word boundary, 1-2 digits, one of characters :, ., -, then 1-2 digits, and another word boundary.) The match is greedy, so it will find 13:15 when it's there, not just 3:15.
A more complex form
=regexextract(L1, "(?i)\b\d{1,2}[:.-]\d{1,2} ?(?:am|pm)?\b")
also supports "am" or "pm" after the time, case-insensitive and possibly separated by a space from the digits.
This can be refined further, for example the hours part would be more precisely stated as [0-2]?\d instead of \d{1,2}, and the minutes part as [0-5]?\d.
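For testing such patterns outside Sheets, here is a rough Python equivalent. One deliberate difference, and an assumption of this sketch: the optional space is grouped together with the am/pm suffix, so a match on plain "13:15 " cannot end in a stray space. The function name is hypothetical.

```python
import re

# Word boundary, 1-2 digits, one of : . -, 1-2 digits,
# then optionally a space plus am/pm, then a word boundary.
TIME_RE = re.compile(r"\b\d{1,2}[:.-]\d{1,2}(?: ?(?:am|pm))?\b", re.IGNORECASE)


def find_time(text):
    """Return the first time-like substring in text, or None."""
    match = TIME_RE.search(text)
    return match.group(0) if match else None


print(find_time("Pickup at 13:15 from depot"))
print(find_time("Arrive 1.15 PM sharp"))
```

Because the search reports the leftmost match and the leading word boundary cannot fall between two digits, "13:15" is found whole instead of "3:15".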

Internationalization of number abbreviations/truncation

I have a section of code that abbreviates large numbers to shorter forms, truncating the less significant digits.
Example:
3732 -> 3.7k
432761 -> 432k
3786532 -> 3.8m
However, now I am looking to find a way of doing the same thing in non-English-speaking locales, where the character used to denote thousand and million might be different, and billion might refer to a different number entirely (long scale). Is there a simple way to do this?
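One possible shape for this, sketched in Python, is a per-locale table of thresholds and suffixes. Everything in the table below is an illustrative assumption (including the German abbreviations and the rule of showing one decimal only for single-digit values), not authoritative locale data; for production use, the compact number formats in Unicode CLDR are the usual data source.

```python
# Hypothetical per-locale tables: (threshold, suffix) pairs, largest first,
# plus the decimal separator. Entries are illustrative only.
LOCALES = {
    "en": {"units": [(10**6, "m"), (10**3, "k")], "decimal": "."},
    "de": {"units": [(10**6, " Mio."), (10**3, " Tsd.")], "decimal": ","},
}


def abbreviate(n, locale="en"):
    """Truncate a non-negative integer to a short form such as 3.7k."""
    table = LOCALES[locale]
    for threshold, suffix in table["units"]:
        if n >= threshold:
            whole, rest = divmod(n, threshold)
            if whole >= 10:
                return f"{whole}{suffix}"
            tenth = rest * 10 // threshold  # first decimal digit, truncated
            return f"{whole}{table['decimal']}{tenth}{suffix}"
    return str(n)


print(abbreviate(3732))
print(abbreviate(432761))
print(abbreviate(3732, "de"))
```

This sketch truncates throughout, so 3786532 comes out as 3.7m; the 3.8m in the example above implies rounding at that step instead. A long-scale locale would simply get different thresholds for its words in the table.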

Why is it best to store a telephone number as a string vs. integer?

As the question states, why is it considered best practice to store telephone numbers as strings rather than integers in the telephone_number column?
Not sure I understand the rationale for this. Please help clear this up!
Thanks!
Telephone numbers are strings of digit characters; they are not integers.
Consider for example:
Expressing a telephone number in a different base would render it meaningless
Adding or multiplying two telephone numbers together, or any math operation on a phone number, is meaningless. The result is not another telephone number (except by coincidence)
Telephone numbers are intended to be entered "as-is" into a connected device.
Telephone numbers may have leading zeroes.
Manipulations of telephone numbers, such as adding an area code, are String operations.
Storing the string version of the telephone number makes this clear and unambiguous.
History: On old pulse-encoded dial systems, the code for each digit in a telephone number was sent as the same number of pulses as the digit (or 10 pulses for "0"). That may be why we still use digits to represent the parts of a phone number. See http://en.wikipedia.org/wiki/Pulse_dialing
What Neil Slater said is correct. I would add that there are lots of edge cases where you can't express a telephone number as a number value consistently.
For example, consider these numbers:
011-123-555-1212
+11-123-555-1212
+1 (112) 355-5121 x2
These are all potentially valid phone numbers, but they mean very different things. Yet, in integer form, they are all 111235551212.
If you are going to store the number for display from input, then you must use a string.
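The collapse described above is easy to check. A small Python sketch (the helper name is made up for illustration) that strips everything except digits and parses the rest as an integer:

```python
import re


def digits_as_int(raw):
    """Drop every non-digit character and parse what is left as an integer."""
    return int(re.sub(r"\D", "", raw))


samples = ["011-123-555-1212", "+11-123-555-1212", "+1 (112) 355-5121 x2"]
values = {digits_as_int(s) for s in samples}
print(values)
```

All three inputs land on the same integer, so the leading zero, the "+" prefix and the extension are unrecoverable once the value has been stored as a number.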
However, while it is true that no meaningful mathematical operations can be performed on a phone number, using a number in hash sets and for indexing is quicker than using a string. So provided you can guarantee or homogenise your set of numbers so that they are all consistent, you may see better performance operating on a number.
For example, in the telco world, rating calls for a given customer involves a lot of searching on their CLI, and in this situation it is faster and cheaper to search by integer. Generally, though, strings will be fine performance-wise; integers only pay off where performance matters and you have multiple searches to perform over a huge range of numbers, e.g. rating 250 million calls across 2 million lines and 2000 tariffs. In-memory rating also gets expensive, so being able to use a 64-bit int or uint is cheaper when dealing with these volumes.
Consider these phone numbers for example
099-1234-56789 or +91-8907-687665.
In this case, if the phone_number attribute is of type integer, it can't accept these values. It has to be a string to hold values of this kind, so a string is always preferred over an integer.
There are several reasons for this:
Phone numbers often start with a "0": an integer will remove all leading "0"s
Phone numbers can contain special characters: +, (, -, etc. (for example: +33 (0)6 12 23 34)
You cannot perform operations on phone numbers: adding phone numbers, for instance, would be meaningless
Phone numbers may be internationalised, i.e. formatted differently for different people, which is not possible with integers
There might be other reasons, but I guess that's already a fair amount of those :)

Do numbers need to be localized?

This seems like a stupid question. Is the number "10" referred to as "10" in Hebrew, Arabic, and all languages? I'm not seeing anywhere where it says you need to do anything special with numbers when dealing with localization. Maybe number format, but what about the number itself? I would think that numbers would read differently in right-to-left languages, but translate.google.com is giving me the same number back. Can anyone confirm this?
Arabic and Japanese (?) do have different glyphs for numbers, but the standard system is so commonplace, that usually numbers are not converted.
If you're using the .NET formatting functions, then the numbers will be formatted according to the system preferences (I'm talking commas and decimal points here)
Different languages can use different digit sigils;
Number representation is different, e.g. 1,234.56 in English is represented as 1.234,56 in German.
So the answer is yes.
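To make the separator difference concrete, here is a minimal Python sketch that formats a number with caller-supplied separators. The function name and parameters are assumptions for illustration; real applications would normally rely on the platform's locale data (for example the .NET formatting functions mentioned above) instead of hard-coding separators.

```python
def format_number(value, group_sep, decimal_sep, places=2):
    """Format value with the given grouping and decimal separators."""
    text = f"{value:,.{places}f}"          # e.g. "1,234.56"
    whole, _, frac = text.partition(".")   # split before swapping separators
    whole = whole.replace(",", group_sep)
    return whole + decimal_sep + frac if frac else whole


print(format_number(1234.56, ",", "."))   # English style
print(format_number(1234.56, ".", ","))   # German style
print(format_number(1000000, "'", ".", places=0))
```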
The digits 0-9 usually don't require any localization, except minor tweaks like AndreyT said, but those are more "fonts" related than anything.
The only important thing to take into account is large number representation.
For example, take 1mio$
In Switzerland, it will be:
$1'000'000.-
in US
$1,000,000
In Japan it will be
$100万
I don't know about other places, but you get the idea.
For Japan, it's very uncommon to see numbers greater than 10'000 without using a kanji.
But I think you should check with the person doing the localization.
For the actual numbers themselves (and not the decimal point, thousands separator, etc.) there are in fact differences between languages.
Hebrew numerals actually use the Hebrew letters as a number system, though it is used only for "traditional" numbers, such as the year in the Jewish calendar, the chapter, verse and page numbers in the Hebrew Bible, in lists (similar to using roman numerals instead of numbers), etc. But for all other cases, Hindu-Arabic numerals are used (e.g. 1, 2, 3, 4...) and are written left-to-right, even while the rest of the Hebrew text is written right-to-left (i.e. NML KJIHG 123 FEDC BA).
In Arabic, most countries use the Arabic-Indic numerals, but the Hindu-Arabic numerals are also understood.
In any case, .NET localization should take care of all conversions and display issues, and there's nothing special you need to do unless you render your own GUI.
There are quite a few things that can be localized in numbers. For example, in the USA the fractional part of a number (if it has one) is separated by a dot, while in Russia a comma is normally used. In the USA commas are used to separate three-digit groups in the number, while in Russia it is not customary to separate them at all, or a space is used for that purpose (or maybe some other character, but not a comma). And so on (although most of the formatting options apply to monetary quantities).
Even the preferred way to write characters themselves can depend on locale. In USA the character for '7' is usually written in two strokes, while in Europe it quite often has a third stroke - a short horizontal line through the middle. This, of course, is less important, since the two-stroke version is still recognized everywhere.
If you are displaying the numbers for math purposes (for example, showing 5 + 3 = 8), then use the standard digits 0-9. These are used nearly universally in mathematics.
If you are displaying something that is highly localized (e.g. pricing on a street vendor's point-of-sale system in Saudi Arabia), there are a handful of countries that use different digits that are localized to their respective languages.
Most regions of people in the world will be fine with understanding 0-9 though.
I found this website to be a good starting guide: https://phrase.com/blog/posts/number-localization/
Some examples:
Bengali, for example, uses the Bengali–Assamese numeral system, whose digits differ from the Western Arabic system: ০, ১, ২, ৩, ৪, ৫, ৬, ৭, ৮, ৯.
In some locales like Saudi Arabia, for example, it’s common to represent numbers in the local numeral system, Eastern Arabic, and not the Western Arabic system.
Keep in mind that we are just talking about digits here. When it comes to fractions (/), decimals (.), percentages (%), large number separators (,), number symbols (#), etc. most regions have specific rules and that's a whole other topic. They are not universal.
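Swapping the digits themselves is mechanical, because each of these numeral systems occupies ten consecutive Unicode code points. A minimal Python sketch, with a hypothetical helper that takes the target system's zero digit:

```python
def localize_digits(text, zero):
    """Replace ASCII digits 0-9 with the ten digits starting at `zero`."""
    table = {ord("0") + i: chr(ord(zero) + i) for i in range(10)}
    return text.translate(table)


BENGALI_ZERO = "\u09e6"         # ০
EASTERN_ARABIC_ZERO = "\u0660"  # ٠

print(localize_digits("2024", BENGALI_ZERO))
print(localize_digits("10", EASTERN_ARABIC_ZERO))

# Python's int() accepts any Unicode decimal digits, so the round trip works.
print(int(localize_digits("2024", BENGALI_ZERO)))
```

This only changes glyphs; separators, grouping and text direction still need locale-specific handling.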
