Convert spreadsheet decimal number format: period- and comma-separated decimal positions - google-sheets

In OpenOffice Calc, one can change number format between standard United States, with decimal separator being a period ., and the SI (International System) format with decimal separator being a comma ,. This can be done by setting the language to US-English in the first case, and to Canada-French in the latter case (in the Format menu).
13.3 13,3
2.2 2,2
How can the same be done in Google Spreadsheets?
It does not seem that language can be changed within a same sheet (or maybe a same Google account ...?).
Is there another way?
Copy-paste from OO to Google does not work:
13/03/2014 13,3
02/02/2014 2,2

I dispute that comma as the decimal separator is " the SI (International System) format " - rather than one of ;-) - but think what you want is:
File > Spreadsheet settings ... and for Locale: choose somewhere like Canada (French).
Though beware, this could have ramifications beyond , for ..

The 22nd General Conference on Weights and Measures declared in 2003 that "the symbol for the decimal marker shall be either the point on the line or the comma on the line". It further reaffirmed that "numbers may be divided in groups of three in order to facilitate reading; neither dots nor commas are ever inserted in the spaces between groups"[20] (e.g. 1000000000). This usage has therefore been recommended by technical organizations, such as the United States' National Institute of Standards and Technology.[21]
ISO-8601 also stipulates normative notation based on SI conventions, adding that the comma is preferred over the full stop.

Related

Why are hyphens screwing up my duplicate finding conditional formatting?

I have a sheet I use as a database of scientific papers. I copy journal article titles from different sources (some could be from an email, others are links on a web page, or just the title from the article page). I have conditional formatting set to let me know if I'm adding a title that is already in the list. I've noticed that there are some titles that are "ignoring" the conditional formatting, and it looks like there are hyphens in all of the offenders. If I remove the hyphens, the conditional formatting works. So there is some 'difference' in the hyphens originating from the same title that is preventing the conditional formatting from viewing them as identical.
Shared sheet
Examples of offending titles:
End-to-end continuous bioprocessing: impact on facility design, cost of goods and cost of development for monoclonal antibodies
End‐to‐end continuous bioprocessing: impact on facility design, cost of goods and cost of development for monoclonal antibodies
End‐to‐end continuous bioprocessing: Impact on facility design, cost of goods, and cost of development for monoclonal antibodies
What is this difference, and is there a way to fix it? Do I need to write a script to find/replace the hyphens to get this to work?
TIA
Just because characters appear identical, it does not mean that they are identical. You have fallen foul of the similarity between the hyphen and dashes. Visually, they are almost identical - dashes are slightly widest than the hyphen.
Dashes are regarded as "special characters" (i.e. they aren't keys on the keyboard) but they are used widely in html. So if, for instance, you copied an item from a website then you might unwittingly have copied dashes rather than hyphens.
You can identify the exact nature of a character by using the CODE function.
You ask "What is this difference, and is there a way to fix it? Do I need to write a script to find/replace the hyphens to get this to work?"
WHAT IS THIS DIFFERENCE?
It's important to recognise that though these examples appear identical, there are other differences that are more than just hyphens vs dashes.
Example#1 - Hyphen - CODE returns "45"
Example#2 - Dash - CODE returns "8208"
Example#3 - Dash - CODE returns "8208".
But there are other factors that contribute to fail to trigger the conditional formatting rule:
Length = 128 (vs 127 for the other examples). There is an additional comma (after "cost of sales")
the word "Impact" is spelled with an upper case "I" (lower case for the other examples)
MOVING FORWARD
Do you need a script? No (IMHO)
Is there a way to fix it? As outlined above, there are more differences that just hyphens and dashes. And, as time goes by, the number & type of difference might increase. However, there is a solution to the "Hyphen Vs Dash" problem which is the focus of this question.
FORMULA AND FORMATTING
Your data is currently in Column A and Column A is also subject to conditional formatting.
Remove the conditional formatting rules from Column A
Insert this formula in cell B2
=arrayformula(if(LEN($A2:A)-LEN(SUBSTITUTE($A2:A, char(8208), ""))=0,A2:A,arrayformula(substitute(A2:A,char(8208),char(45)))))
Conditional Formatting for Column B
select the range in Column B
select, Format, Conditional Formatting.
Select "Custom Formula is" and enter this formula: =countif($B$2:$B2,B2)>1
Select a preferred Formatting Style and then click Done.
FORMULA LOGIC
arrayformula enables the formula to automatically populate all the relevant cell in the column.
LEN($A2:A)-LEN(SUBSTITUTE($A2:A, char(8208), ""))=0
a test for dashes in a string. It substitutes a nil value for any/all instances of a dash (char(8208)), then compares the length to the adjusted length. If the value is zero, then there are no dashes in the string.
IF: Test for any dashes,
if the string doesn't contain any dashes then use that value
else, the string must contains dashes so substitute any dashes for hyphens, and use the substituted value
arrayformula(substitute(A2:A,char(8208),char(45)))
The conditional formatting rule then looks for duplicate values in the column, and formats any/all duplicate values.
You'll note that Example#3 is not flagged as a duplicate despite containing dashes. This is because of the spelling of "Impact" and the extra comma after "cost of sales".
Sample

How many chars can numeric EDIFACT data elements be long?

In EDIFACT there are numeric data elements, specified e.g. as format n..5 -- we want to store those fields in a database table (with alphanumeric fields, so we can check them). How long must the db-fields be, so we can for sure store every possible valid value? I know it's at least two additional chars (for decimal point (or comma or whatever) and possibly a leading minus sign).
We are building our tables after the UN/EDIFACT standard we use in our message, not the specific guide involved, so we want to be able to store everything matching that standard. But documentation on the numeric data elements isn't really straightforward (or at least I could not find that part).
Thanks for any help
I finally found the information on the UNECE web site in the documentation on UN/EDIFACT rules Part 4. UN/EDIFACT rules Chapter 2.2 Syntax Rules . They don't say it directly, but when you put all the parts together, you get it. See TOC-entry 10: REPRESENTATION OF NUMERIC DATA ELEMENT VALUES.
Here's what it basically says:
10.1: Decimal Mark
Decimal mark must be transmitted (if needed) as specified in UNA (comma or point, put always one character). It shall not be counted as a character of the value when computing the maximum field length of a data element.
10.2: Triad Seperator
Triad separators shall not be used in interchange.
10.3: Sign
[...] If a value is to be indicated to be negative, it shall in transmission be immediately preceded by a minus sign e.g. -112. The minus sign shall not be counted as a character of the value when computing the maximum field length of a data element. However, allowance has to be made for the character in transmission and reception.
To put it together:
Other than the digits themselves there are only two (optional) chars allowed in a numeric field: the decimal seperator and a minus sign (no blanks are permitted in between any of the characters). These two extra chars are not counted against the maximum length of the value in the field.
So the maximum number of characters in a numeric field is the maximal length of the numeric field plus 2. If you want your database to be able to store every syntactically correct value transmitted in a field specified as n..17, your column would have to be 19 chars long (something like varchar(19)). Every EDIFACT-message that has a value longer than 19 chars in a field specified as n..17 does not need to be stored in the DB for semantic checking, because it is already syntactically wrong and can be rejected.
I used EDI Notepad from Liaison to solve a similar challenge. https://liaison.com/products/integrate/edi/edi-notepad
I recommend anyone looking at EDI to at least get their free (express) version of EDI Notepad.
The "high end" version (EDI Notepad Productivity Suite) of their product comes with a "Dictionary Viewer" tool that you can export the min / max lengths of the elements, as well as type. You can export the document to HTML from the Viewer tool. It would also handle ANSI X12 too.

Localization and Lists of Decimal Numbers

I'm working on localizing some strings in our application and we have text that looks something like:
Factor f (1.0, 1.2, or 1.5)
In a locale that uses a comma for the decimal point, would this be written as:
Factor f (1,0, 1,2, or 1,5)
Maybe it's just not what I'm accustomed to, but that looks crazy hard to read quickly.
I'm also wondering about text like version numbers. Would Firefox 3.5.1 be Firefox 3,5,1?
If I understand what you are looking for, there are two things in regards to Internationalization here:
Decimal separator
List separator
Obviously these separators are quite tightly coupled, so in Locale that uses comma as decimal separator, list separator must be something else. Usually this is a semicolon and there just a few Locales that uses something different than comma or semicolon for list separator.
To summarize:
In Locales that uses dot as a decimal separator, comma is usually used as a list separator, so in some free-form text you might expect something like Factor f (1.0, 1.2, or 1.5).
In Locales that uses comma as a decimal separator, semicolon is typically used as a list separator – Faktor f (1,0; 1,2; oder 1,5) is something you should expect.
I am not sure what you are up to (the technology does matter in the advice) but you can leave the format as well as list separator to the translators to decide. In .Net list separator is given, though (no need to ask translators for input, just use appropriate property of CultureInfo class).
Sorry to say, but I don't know about your first question. However, as far as version numbers go, they are generally left untranslated. End users typically attribute little meaning to the version's numeric value (they are infact NOT numeric in nature. 3.90 < 3.100). They are simply discrete numbers with a universally-accepted separator, and not natural numbers with natural "grouping/decimal" separators.
In addition to end-user experience with version numbers. Developers are often known to parse version numbers in the standard format of {major}.{minor}.{revision}, using . as the well-known seperator character.
I did find this link that talks about your first question (sort of). I don't know how authoritative or credible it is; but it doesn't look dubious.

Is it possible to create a mask to handle non-north american phone numbers?

For north american phone numbers, (999) 999-9999 works pretty well for an input mask.
However, I can't find a good example that will handle non-north american numbers. I know that the number of digits can vary, so other than restricting it to digits only, is there a good example anywhere?
There is no generic mask, really: There are too many combinations.
The only thing that is fixed is the international country code, usually prefixed by +.
According to the Wikipedia Article on telephone numbering plans, most countries conform with the E.164 numbering plan.
If I read E.164 correctly, you can safely make the following assumptions:
Country code: 1-3 digits
Network / Area code and Number: Up to 19 digits
I would ask for the country code, and have the "area code + number" field as a 19-digit input.
You can deduce the country code with a simple RegEx such as:
^(?:(?:0(?:0|11)\s?)|+)([17]|2([07]|[1-689]\d)|3([0-469]|[578]\d)|4([013-9]|2\d)|5([1-8]|[09]\d)|6([0-6]|[789]\d)|8([12469]|[03578]\d)|9([0-58]|[679]\d))
Followed by
(([\s\(\).-]{0,2}\d){4,13})$
to extract the national number.
For validating the national number length and validity, you'd need libphonenumber or similar.
The long RegEx above allows +, 00 or 011 before the country code and a selection of punctuation in the number which will also have to be stripped.
You don't mention your application but this is certainly possible using regular expressions. You might want to take a look here.
Not easily. Take a look at this page for an example why: if you only look at the German phone numbers, you'll note that there are different formats depending on where you're calling the number from. Which one do you pick? And that's just for German phone numbers; they differ from continent to continent, and from country to country.
Going with "numbers-only" is probably your safest bet.
I would allow for spaces, dashes, slashes and all that, but actually only care for numbers and the optional leading + sign. Everything else, such as assuming certain blocks of a certain length is just asking for trouble.
May be it is bad to answer an old question. But libphonenumber seems like a good solution to your question.

Do numbers need to be localized?

This seems like a stupid question. Is the number "10" refered to "10" in Hebrew, Arabic, and all languages? I'm not seeing anywhere where it says you need to do anything special with numbers when dealing with localization. Maybe number format but what about the number itself? I would think that numbers would read differently in right-to-left languages but translate.google.com is giving me the same number back. Can anyone confirm this?
Arabic and Japanese (?) do have different glyphs for numbers, but the standard system is so commonplace, that usually numbers are not converted.
If you're using the .NET formatting functions, then the numbers will be formatted according to the system preferences (I'm talking commas and decimal points here)
Different languages can use different digit sigils;
Number representation is different. eg 1,234.56 in English is represented as 1.234'56 in German.
So the answer is yes.
The digits 0-9 usually don't require any localization, except minor tweaks like AndreyT said, but those are more "fonts" related than anything.
The only important thing to take into account is large number representation.
For example, take 1mio$
In Switzerland, it will be:
$1'000'000.-
in US
$1,000,000
In Japan it will be
$100万
I don't know other place, but you got the idea.
For Japan, it's very uncommon to see numbers greater than 10'000 without using a kanji.
But I think you should see with the person doing the localization.
For the actual numbers themselves (and not floating poing, thousands seperator, etc) there are in fact differences between languages.
Hebrew numerals actually use the Hebrew letters as a number system, though it is used only for "traditional" numbers, such as the year in the Jewish calendar, the chapter, verse and page numbers in the Hebrew Bible, in lists (similar to using roman numerals instead of numbers), etc. But for all other cases, Hindu-Arabic numerals are used (e.g. 1, 2, 3, 4...) and are written left-to-right, even while the rest of the Hebrew text is written right-to-left (i.e. NML KJIHG 123 FEDC BA).
In Arabic, most countries use the Arabic-Indic numerals, but the Hindu-Arabic numerals are also understood.
In any case, .NET localization should take care of all conversions and display issues, and there's nothing special you need to do unless you render your own GUI.
There quite a few tings that can be localized in numbers. For example, in USA the fractional part of a number (if it has a fractional part) is separated by a dot, while in Russia a comma is normally used. In USA commas would be used to separate three-digit groups in the number, while in Russia it is not customary to separate them at all, or space is used for that purpose (or maybe some other character, but not a comma). And so on (although most of the formatting options apply to monetary quantities).
Even the preferred way to write characters themselves can depend on locale. In USA the character for '7' is usually written in two strokes, while in Europe it quite often has a third stroke - a short horizontal line through the middle. This, of course, is less important, since the two-stroke version is still recognized everywhere.
If you are displaying the numbers for math purposes (for example, showing 5 + 3 = 8), then use the standard digits 0-9. These are used nearly universally in mathematics.
If you are displaying something that is highly localized
(i.e. pricing on a street vendor's point-of-sale system in Saudi Arabia), there are a handful of countries that use different digits that are localized to their respective languages.
Most regions of people in the world will be fine with understanding 0-9 though.
I found this website to be a good starting guide: https://phrase.com/blog/posts/number-localization/
Some examples:
Bengali, for example, uses the Bengali–Assamese numeral system, whose
digits differ from the Western Arabic system: ০, ১, ২, ৩, ৪, ৫, ৬, ৭,
৮, ৯.
In some locales like Saudi Arabia, for example, it’s common to
represent numbers in the local numeral system, Eastern Arabic, and not
the Western Arabic system.
Keep in mind that we are just talking about digits here. When it comes to fractions (/), decimals (.), percentages (%), large number separators (,), number symbols (#), etc. most regions have specific rules and that's a whole other topic. They are not universal.

Resources