How to format a number to string (currency) with thousands and decimal separators using Erlang? - localization

I'm working on an application in Erlang that has to deal with number formatting for several different currencies. There are many complexities to take into consideration, like currency symbols, where they go (right or left), etc. But one aspect of this kind of formatting, which tends to be straightforward in other programming languages, is proving hard in Erlang: the separators to use for thousands and decimal places.
For most of the English speaking countries (independent of the currency) the expected number format is:
9,999,999.99
For several other countries, like Germany and Brazil, the number format is different:
9.999.999,99
There are other variations, like the French variation, with spaces:
9 999 999,99
The problem is that I can't find a nice way to produce these formats starting from a floating-point number. Sure, io_lib:format/2 can convert it to a string, but it seems to offer no control over the symbol used as the decimal separator and doesn't output any separator for the thousands (which makes a "search-and-replace" workaround impossible).
This is an example of what we have so far:
%% currency_l10n.erl
-module(currency_l10n).
%% Export the function that is actually defined and called below.
-export([number_format/2]).
number_format(us, Value) ->
    io_lib:format("~.2f", [Value]);
number_format(de, Value) ->
    io_lib:format("~B,~2..0B", [trunc(Value), trunc(Value*100) rem 100]).
%% Examples on the Erlang REPL:
1> currency_l10n:number_format(us, 9999.99).
"9999.99"
2> currency_l10n:number_format(de, 9999.99).
"9999,99"
As you can see, already the workaround for the decimal separator is not exactly pretty and dealing with the delimiters for thousands won't be nicer. Is there anything we are missing here?

The problem you have is not solved (AFAIK) by any standard library in Erlang. Several steps are needed to produce the expected result: convert the float to a string, split the string into packets, insert the two kinds of separator, and insert the currency sign at the beginning or the end. You need different functions for these tasks. The following code is an example of what you could do:
-module(pcur).
-export([printCurrency/2]).
% X = value to print, must be a float
% Country = an atom representing the country
printCurrency(X,Country) ->
    % convert to string, split and get the different packets
    [Dec|Num] = splitInReversePackets(toReverseString(X)),
    % get convention for the country
    {Before,Dot,Sep,After} = currencyConvention(Country),
    % build the result - Beware of unicode!
    Before ++ printValue(Num,Sep,Dot ++ Dec) ++ After.
% io_lib:format/2 may return a nested list, so flatten it before reversing
toReverseString(X) -> lists:reverse(lists:flatten(io_lib:format("~.2f",[X]))).
splitInThousands([A,B,C|R],Acc) -> splitInThousands(R,[[C,B,A]|Acc]);
splitInThousands([A,B|R],Acc) -> splitInThousands(R,[[B,A]|Acc]);
splitInThousands([A|R],Acc) -> splitInThousands(R,[[A]|Acc]);
splitInThousands([],Acc) -> Acc.
splitInReversePackets([A,B,$.|R]) -> lists:reverse(splitInThousands(R,[[B,A]])).
% return a tuple made of {string to print at the beginning,
%                         string to use to separate the integer part and the decimal,
%                         string to use for thousand separator,
%                         string to print at the end}
currencyConvention(us) -> {"",".",",","$"};
currencyConvention(de) -> {"",",","."," Euro"}; % not found how to print the symbol €
currencyConvention(fr) -> {"Euro ",","," ",""};
currencyConvention(_) -> {"",".",",",""}. % default for unknown country symbol
printValue([A|R=[_|_]],Sep,Acc) -> printValue(R,Sep, Sep ++ A ++ Acc);
printValue([A],_,Acc) -> A ++ Acc.
Test in the shell:
1> c(pcur).
{ok,pcur}
2> pcur:printCurrency(123456.256,fr).
"Euro 123 456,26"
3> pcur:printCurrency(123456.256,de).
"123.456,26 Euro"
4> pcur:printCurrency(123456.256,us).
"123,456.26$"
Edit: Having read the other proposal and your comments, this solution is clearly not the direction you are heading in. Nevertheless, in my opinion it is a valid way to solve your problem. It should be fast and, more important to me, straightforward and easy to read and update (add a currency, split into groups of a different size, etc.).

I suppose you can use re:replace/4 http://erlang.org/doc/man/re.html#replace-4
Eg for DE:
1> re:replace("9999.99", "\\.", ",",[global, {return,list}]).
"9999,99"
But to put a comma before the last two digits, as in the French format, I suppose you could try something like
1> N = re:replace("9.999.999.99", "\\.", " ",[global, {return,list}]).
"9 999 999 99"
2> Nr = lists:reverse(N).
"99 999 999 9"
3> Nrr = re:replace("99 999 999 9", "\\ ", ",",[{return,list}]).
"99,999 999 9"
4> lists:reverse(Nrr).
"9 999 999,99"
Or you could try to write a better regex.
P.S. To convert an integer/float to a list, you can use io_lib:format/2, e.g.
1> [Num] = io_lib:format("~p", [9999.99]).
["9999.99"]
2> Num.
"9999.99"
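One possible take on the "better regex" idea, as a minimal sketch rather than a tested library: a single re:replace/4 call with a lookahead inserts the thousands separators, and a character map then swaps in the symbols for the locale. The module name sep_sketch, the function names and the locale atoms are assumptions made for illustration; float_to_list/2 with {decimals, 2} is used instead of io_lib:format/2 only because it returns a flat string.

```erlang
-module(sep_sketch).
-export([format/2]).

%% format(Value, Locale): Value is a float, Locale is us | de | fr
%% (hypothetical locale atoms, chosen for this sketch).
format(Value, Locale) ->
    %% Flat string with two decimals, e.g. "9999999.99".
    Plain = float_to_list(Value, [{decimals, 2}]),
    %% Put "," after every digit that is followed by one or more
    %% complete groups of three digits and then the decimal point.
    Grouped = re:replace(Plain, "(\\d)(?=(\\d{3})+\\.)", "\\1,",
                         [global, {return, list}]),
    {Thousands, Decimal} = separators(Locale),
    %% Swap the placeholder "," and "." for the locale's symbols.
    [swap(C, Thousands, Decimal) || C <- Grouped].

%% {ThousandsSeparator, DecimalSeparator} as characters.
separators(us) -> {$,, $.};
separators(de) -> {$., $,};
separators(fr) -> {$\s, $,}.

swap($,, Thousands, _Decimal) -> Thousands;
swap($., _Thousands, Decimal) -> Decimal;
swap(C, _Thousands, _Decimal) -> C.
```

With this, sep_sketch:format(9999999.99, de) gives "9.999.999,99" and the fr clause gives "9 999 999,99". Doing the swap in one pass over the characters avoids the problem of replacing "." with "," and then being unable to tell the two kinds of comma apart.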

Related

Using Alex in Haskell to make a lexer that parses Dice Rolls

I'm making a parser for a DSL in Haskell using Alex + Happy.
My DSL uses dice rolls as part of the possible expressions.
Sometimes I have an expression that I want to parse that looks like:
[some code...] 3D6 [... rest of the code]
Which should translate roughly to:
TokenInt {... value = 3}, TokenD, TokenInt {... value = 6}
My DSL also uses variables (basically, Strings), so I have a special token that handle variable names.
So, with these tokens:
"D" { \pos str -> TokenD pos }
$alpha [$alpha $digit \_ \']* { \pos str -> TokenName pos str}
$digit+ { \pos str -> TokenInt pos (read str) }
The result I'm getting when using my parser now is:
TokenInt {... value = 3}, TokenName { ... , name = "D6"}
Which means that my lexer "reads" an Integer and a Variable named "D6".
I have tried many things. For example, I changed the token D to:
$digit "D" $digit { \pos str -> TokenD pos }
But that just consumes the digits :(
Can I parse the dice roll with the numbers?
Or at least parse TokenInt-TokenD-TokenInt?
PS: I'm using PosN as a wrapper, not sure if relevant.
The way I'd go about it would be to extend the TokenD type to TokenD Int Int, so, using the basic wrapper for convenience, I would do
$digit+ D $digit+ { dice }
...
dice :: String -> Token
dice s = TokenD (read $ head ls) (read $ last ls)
  where ls = split 'D' s
split can be found here.
This is an extra step that would usually be done during syntactic analysis, but it doesn't hurt much here.
Also I can't make Alex parse $alpha for TokenD instead of TokenName. If we had Di instead of D that'd be no problem. From Alex's docs:
When the input stream matches more than one rule, the rule which matches the longest prefix of the input stream wins. If there are still several rules which match an equal number of characters, then the rule which appears earliest in the file wins.
But then your code should work. I don't know if this is an issue with Alex.
I decided that I could survive with variables starting with lowercase letters (like Haskell variables), so I changed my lexer to parse variables only if they start with a lowercase letter.
That also solved some possible problems with some other reserved words.
I'm still curious to know if there were other solutions, but the problem in itself was solved.
Thank you all!

How to judge the characters of non-language characters?

My title may be unclear, so let me describe the requirements:
Can be Chinese/Japanese or any other country's language, such as 你好 or こんにちは
Can be English letters, A-Z or a-z
Can't be a symbol, such as ! or !,, or ,
Can't be special characters such as Emoji or other symbols
Can it be judged by the binary byte number in Elixir, or by Unicode?
If I understood your question correctly, you want to check whether a given string contains Chinese/Japanese characters or alphabetical characters, but not punctuation or emoji?
For the Asian characters you can use the CJK ranges from Unicode, which might be close enough. You can always check more ranges for languages you want to (dis)allow.
So the first step would be to check if a given code point is in the CJK range(s):
def is_in_range?(cp) do
  # Code points above U+FFFF need the \u{...} form;
  # "\u20000" would be read as "\u2000" followed by "0".
  ranges = [
    {"\u4E00", "\u9FEF"},
    {"\u3400", "\u4DBF"},
    {"\u{20000}", "\u{2A6DF}"},
    {"\u{2A700}", "\u{2B73F}"},
    {"\u{2B740}", "\u{2B81F}"},
    {"\u{2B820}", "\u{2CEAF}"},
    {"\u{2CEB0}", "\u{2EBEF}"},
    {"\u3007", "\u3007"}
  ]
  # Check if the codepoint is in any of the ranges above.
  ranges
  |> Enum.map(fn {s, e} ->
    cp >= s and cp <= e
  end)
  |> Enum.any?()
end
If we have that function, we can check for any given string if it contains any of these characters:
def contains_cjk(str) do
  str |> String.codepoints() |> Enum.map(&is_in_range?/1) |> Enum.any?()
end
If you also want to accept alphabetic characters you can use a regular expression, or just add the ranges for A-Z and a-z (\u0041 to \u005A, and \u0061 to \u007A). As for your second string (こんにちは), its first code point is in the "Hiragana" block, so you could add the range \u3040 to \u309F to allow these characters as well. A listing of blocks can be found here.
A note on performance is in order here. For n characters the code performs n times the number of ranges comparisons, and because Enum.map/2 builds the whole list before Enum.any?/1 runs, it never stops early at the first match.
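If maintaining explicit ranges becomes tedious, another option is the Unicode "letter" property, which the regex engine underlying both Erlang and Elixir understands. The sketch below is written in Erlang (it should be callable from Elixir as :letters_sketch.letters_only/1); the module and function names are hypothetical, and \p{L} accepts letters of every script, which may be broader than what you want.

```erlang
-module(letters_sketch).
-export([letters_only/1]).

%% True when Text (a UTF-8 binary or a list of code points) is non-empty
%% and consists solely of characters in the Unicode general category L
%% (letters). Punctuation, symbols and emoji are outside that category.
letters_only(Text) ->
    %% \A and \z anchor the match to the whole input.
    re:run(Text, "\\A\\p{L}+\\z", [unicode, {capture, none}]) =:= match.
```

Strings such as 你好, こんにちは or Hello pass this check, while anything containing "!", a space or an emoji does not.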

Can I match against a string that contains non-ASCII characters?

I am writing a program in which I am dealing with strings of the form, e.g., "\001SOURCE\001". That is, the strings contain alphanumeric text with an ASCII character of value 1 at each end. I am trying to write a function to match strings like these. I have tried a match like this:
handle(<<1,"SOURCE",1>>) -> ok.
But the match does not succeed. I have tried a few variations on this theme, but all have failed.
Is there a way to match a string that contains mostly alphanumeric text, with the exception of a non-alpha character at each end?
You can also do the following
[1] ++ "SOURCE" ++ [1] == "\001SOURCE\001".
Or convert to binary using list_to_binary and pattern match as
<<1,"SOURCE",1>> == <<"\001SOURCE\001">>.
Strings are syntactic sugar for lists. Lists are a type and binaries are a different type, so your match isn't working out because you're trying to match a list against a binary (same problem if you tried to match {1, "STRING", 1} to it, tuples aren't lists).
Remembering that strings are lists, we have a few options:
handle([1,83,84,82,73,78,71,1]) -> ok.
This will work just fine. Another, more readable (but uglier, sort of) way is to use character literals:
handle([1, $S,$T,$R,$I,$N,$G, 1]) -> ok.
Yet another way would be to strip the non-character values, and then pass that on to a handler:
handle(String) -> dispatch(string:strip(String, both, 1)).
dispatch("STRING") -> do_stuff();
dispatch("OTHER") -> do_other_stuff().
And, if at all possible, the best case is to stop using strings for text values entirely (if that's feasible) and process binaries directly instead. The syntax of binaries is much friendlier, they take up far fewer resources, and quite a few binary operations are significantly more efficient than their string/list counterparts. But that doesn't fit every case! (But it's awesome when dealing with sockets...)
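To make the list-versus-binary point concrete, here is a small sketch with one clause per representation. The module name match_sketch and the returned atoms are made up for illustration.

```erlang
-module(match_sketch).
-export([handle/1]).

%% A string literal in a pattern is a list pattern, so this clause
%% matches the list [1,$S,$O,$U,$R,$C,$E,1].
handle("\001SOURCE\001") -> list_source;
%% The same bytes held in a binary need a binary pattern.
handle(<<1, "SOURCE", 1>>) -> binary_source;
%% Anything else falls through.
handle(_) -> unknown.
```

Calling match_sketch:handle/1 with the string from the question hits the first clause; the binary clause is only reached after something like list_to_binary/1 has been applied to the input.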

Can I set a particular byte in a string?

I have a very long string returned from os:cmd. My exe-file output contains some symbols with code 4, so I replaced them with another symbol and put meta information at the beginning of the output. Now I want to replace the symbols back. How can I do it in the quickest way?
I'm an Erlang noob, so this answer is most likely not the best answer. There's probably a function that does this in a chapter I haven't reached yet in the Erlang Programming book. However, I think this does what you want:
-module(replace).
-export([replace/3]).
replace([], _, _) -> [];
replace([OldChar | T], OldChar, NewChar) -> [NewChar | replace(T, OldChar, NewChar)];
replace([H | T], OldChar, NewChar) -> [H | replace(T, OldChar, NewChar)].
It just goes through the list (your string) and replaces the old character with the new one. It doesn't handle I18N. There are probably faster ways to do this. It will let you do this:
24> replace:replace([48,49,50,51,52,53,54,55,56,57], 53, 45).
"01234-6789"
or this:
28> replace:replace("39582049867", 57, 45).
"3-58204-867"
In terms of the quickest way - I'm going to guess that would be a provided function. If not, you'll have to code it up different ways and run the numbers.
Erlang strings are lists. Erlang lists are immutable. So you can't change particular bytes within a string, you can only generate another string with these bytes replaced.
Either replace the characters again (using map), or pass the original string around.
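The lists:map/2 route mentioned above could look like the following sketch; replace_sketch is a hypothetical module name. Like any list traversal, it builds a new string and leaves the original untouched.

```erlang
-module(replace_sketch).
-export([replace/3]).

%% Return a copy of String in which every occurrence of the character
%% Old is replaced by New; all other characters are copied unchanged.
replace(String, Old, New) ->
    lists:map(fun(C) when C =:= Old -> New;
                 (C) -> C
              end, String).
```

For example, replace_sketch:replace("39582049867", $9, $-) returns "3-58204-867".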

String splitting problems in Erlang

I've been playing around with the splitting of atoms and have a problem with strings. The input data will always be an atom that consists of some letters and then some numbers, for instance ms444, r64 or min1. Since the function lists:splitwith/2 takes a list the atom is first converted into a list:
24> lists:splitwith(fun (C) -> is_atom(C) end, [m,s,4,4,4]).
{[m,s],[4,4,4]}
25> lists:splitwith(fun (C) -> is_atom(C) end, atom_to_list(ms444)).
{[],"ms444"}
26> atom_to_list(ms444).
"ms444"
I want to separate the letters from the numbers and I've succeeded in doing that when using a list, but since I start out with an atom I get a "string" as result to put into my splitwith function...
Is it interpreting each item in the list as a string or what is going on?
You might want to have a look at the string module documentation:
http://www.erlang.org/doc/man/string.html
The following function might interest you:
tokens(String, SeparatorList) -> Tokens
Since strings in Erlang are just a list() of integer(), the fun tests whether each item is an atom() when it is in fact an integer(). If the test is changed to look for lowercase letters it works:
29> lists:splitwith(fun (C) -> (C >= $a) and (C =< $z) end, atom_to_list(ms444)).
{"ms","444"}
An atom in Erlang is a named constant and not a variable (or at least not like a variable in an imperative language).
You should really not create atoms dynamically (that is, don't convert things to atoms at runtime).
They are used more in pattern matching and in send/receive code:
Pid ! {matchthis, X}
receive
    {foobar,Y} -> doY(Y);
    {matchthis,X} -> doX(X);
    Other -> doother(Other)
end
A variable, like X, could be set to an atom. For example X=if 1==1 -> ok; true -> fail end. I may suffer from poor imagination, but I can't think of a reason why you would want to parse an atom. You should be in charge of the atoms you write and not use list_to_atom(CharIntegerList).
Can you perhaps give more of an overview of what you would like to accomplish?
A "string" in Erlang is not a primitive type: it is just a list() of integers(). So if you want to "separate" the letters from the digits, you'll have to do comparison with the integer representation of the characters.
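Putting the answers together, a sketch of the split for atoms such as ms444 might look like this. The module name atom_split and the choice to return the number as an integer are assumptions; it expects lowercase ASCII letters followed by at least one digit, as in the question.

```erlang
-module(atom_split).
-export([split/1]).

%% Split an atom like ms444 into its leading letters and trailing number.
split(Atom) when is_atom(Atom) ->
    %% Characters are integers, so compare against character literals.
    IsLower = fun(C) -> C >= $a andalso C =< $z end,
    {Letters, Digits} = lists:splitwith(IsLower, atom_to_list(Atom)),
    %% list_to_integer/1 raises badarg if Digits is empty or not numeric.
    {Letters, list_to_integer(Digits)}.
```

In the shell, atom_split:split(ms444) returns {"ms",444}.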

Resources