How to define money amounts in an API - currency

I'm going to create an API that contains money amounts. I was wondering what the best practices are, and whether anyone has had good or bad experiences with particular formats.
Should we transmit base units or minor units? (amount vs amount_cents)
Should we represent the numbers as integers/decimals or as strings?
I've seen the following two possibilities:
send amounts as a string like so: "5.85" (a string with base units)
send amounts in their minor unit: 585 (an integer which expresses the amount in the minor unit)
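To make the two options concrete, here is a minimal Python sketch that converts between them with the standard `decimal` module. Python is an assumption (the question names no language), and `MINOR_UNIT_EXPONENT` is a hypothetical, hard-coded subset of the ISO 4217 minor-unit table; a real service would load the full list. Note that the minor unit is not always a hundredth: JPY has no minor unit and BHD uses three decimals.

```python
from decimal import Decimal

# Hypothetical subset of ISO 4217 minor-unit exponents (digits after the decimal point).
MINOR_UNIT_EXPONENT = {"USD": 2, "EUR": 2, "JPY": 0, "BHD": 3}


def to_minor_units(amount: str, currency: str) -> int:
    """Convert a base-unit string such as "5.85" into an integer of minor units (585)."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    scaled = Decimal(amount).scaleb(exponent)  # shift the decimal point exactly, no floats
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount!r} has more decimals than {currency} allows")
    return int(scaled)


def to_base_units(minor: int, currency: str) -> str:
    """Convert minor units back into a base-unit string, e.g. 585 -> "5.85"."""
    exponent = MINOR_UNIT_EXPONENT[currency]
    return str(Decimal(minor).scaleb(-exponent))
```

For example, `to_minor_units("5.85", "USD")` gives `585` and `to_base_units(5, "USD")` gives `"0.05"`. Whichever format goes over the wire, the conversion is lossless as long as neither side passes through a binary float.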
I'm going back and forth between those two. So I went out to check what other APIs use and came up with the following list:
Stripe: integer with minor units
Braintree: string with base units
Google Wallet: string with base units
Paypal: string with base units
Amazon Payments: string with base units
The Currency Cloud: string with base units
2checkout: string with base units
Adyen: integer with minor units
Dwolla: decimal with base units
GotoBilling: weird heuristics! "Amount may be formatted with or without a decimal. If no decimal is given two (2) decimal places are assumed (1.00 = 100)"
GoCardless: string with base units
Intuit: decimal with base units in requests, string with base units in responses
Klarna: integer with minor units
MasterCard: integer with minor units
Paynova: string with base units
Rogers Catalyst: string with base units
WePay: string with base units
Venmo: decimal with base units
So, out of 18 sampled APIs, 4 are using minor units, 13 are using base units and 1 is using a hard-to-comprehend mixture. And within the 13 who use base units, 10 are transmitting them as quoted strings, 3 as unquoted decimals (actually 2 and a half if you look at Intuit).
I personally feel uncomfortable having to parse a string like "8.20", because if you make the mistake of parsing it into a float, it becomes 8.19999999... So I'm leaning towards sending integers only. But I don't think this is a great argument, and I see that APIs generally tend to go with base units as strings.
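A short Python sketch of that pitfall: parsing "8.20" into a binary float, scaling and truncating yields 819 cents, while parsing the same string as an exact decimal yields 820. Both function names are hypothetical.

```python
from decimal import Decimal


def cents_via_float(amount: str) -> int:
    # The mistake: 8.20 has no exact binary representation, and int() truncates.
    return int(float(amount) * 100)


def cents_via_decimal(amount: str) -> int:
    # Exact decimal parsing; reject anything that is not a whole number of cents.
    scaled = Decimal(amount) * 100
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount!r} is not a whole number of cents")
    return int(scaled)
```

So the string format itself is not the danger; the danger is a client that reaches for a float when reading it.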
Do you have any good arguments for/against each format?

Integers drop the decimal point, so that's one less byte :D But integers have a max_int; do you have anyone rich enough to overflow it?
People who would parse a currency string as a float will convert the int to a float anyway.
If you send binary data, an integer will be much smaller than a string and is the way to go. If you send XML anyway, you might as well define it as a string (the file is probably compressed before sending, right?), but try to give it a "currency" type rather than declaring it as a plain string.
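To put the max_int concern into numbers, this Python sketch computes the largest whole amount an integer field of minor units can hold, assuming two decimal places. The 2**53 - 1 limit is JavaScript's `Number.MAX_SAFE_INTEGER`, which matters because `JSON.parse` turns every JSON number into a double. The helper name is hypothetical.

```python
INT32_MAX = 2**31 - 1
INT64_MAX = 2**63 - 1
JS_MAX_SAFE_INTEGER = 2**53 - 1  # largest integer a JavaScript Number represents exactly


def max_whole_units(limit: int, exponent: int = 2) -> int:
    """Largest whole base-unit amount storable as minor units without exceeding `limit`."""
    return limit // 10**exponent
```

A signed 32-bit field tops out at 21,474,836 whole dollars, which real businesses do exceed; 64-bit integers (about 9.2 * 10**16 dollars) and even the JavaScript safe-integer limit (about 9 * 10**13 dollars) are comfortable for two-decimal currencies.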

Which data type is best depends on your usage. For calculations, integers or doubles are going to be faster, since they skip the parsing step.
If sending the data through networks is your goal, you're better off with strings.
That said, any functionality should be realizable using either method.

Related

Dart: best type to store currency

In Java currency is usually stored in BigDecimal. But what to use to store currency values in Dart? BigInt doesn't seem to be a solution because it is for integer values only.
Definitely do not use doubles for an accounting application. Binary floating point numbers inherently cannot represent exact decimal values, which can lead to inaccurate calculations for seemingly trivial operations. Even though the errors will be small, they can eventually accumulate into larger errors. Setting decimal precision on binary numbers doesn't really make sense.
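A quick illustration of the accumulation problem, written in Python to keep all added examples in one language: adding ten 10-cent amounts as doubles does not give exactly 1.0, whereas integer cents or a decimal type do.

```python
from decimal import Decimal

float_total = sum([0.10] * 10)               # binary floating point
cents_total = sum([10] * 10)                 # fixed point: integer cents
decimal_total = sum([Decimal("0.10")] * 10)  # decimal floating point
```

`float_total` comes out as 0.9999999999999999, while `cents_total` is exactly 100 and `decimal_total` is exactly 1.00.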
For currency, you should instead either use something intended to store decimal values (e.g. package:decimal) or use fixed-point arithmetic to store cents (or whatever the smallest amount of currency is that you want to track). For example, instead of using doubles to store values such as $1.23, use ints to store amounts in the smallest unit of currency (e.g. 123 cents). Then you can use helper classes to format the amounts wherever they're displayed. For example:
class Money {
  final int cents;

  const Money({required this.cents});

  // Formatting only; all arithmetic stays in integer cents.
  @override
  String toString() => (cents / 100).toStringAsFixed(2);

  Money operator +(Money other) => Money(cents: cents + other.cents);

  // Add other operations as desired.
}
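For comparison, the same integer-cents idea as a minimal Python sketch. Unlike the Dart `toString` above, which divides by 100 as a double, this version formats with integer arithmetic only. Two decimal places are hard-coded as an assumption, so currencies with a different minor unit would need another scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Money:
    """An amount held as an integer number of cents (two decimal places assumed)."""
    cents: int

    def __add__(self, other: "Money") -> "Money":
        return Money(self.cents + other.cents)

    def __str__(self) -> str:
        # divmod on the absolute value avoids float rounding and keeps
        # small negative amounts such as -0.05 correct.
        sign = "-" if self.cents < 0 else ""
        whole, frac = divmod(abs(self.cents), 100)
        return f"{sign}{whole}.{frac:02d}"
```

For example, `str(Money(10) + Money(20))` gives `"0.30"`.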

Lua bit library

Right now I have written my own functions for bitwise AND and NOT, but then I saw the bit library and tried to use it. It doesn't work the way I imagined: it returns a large decimal number instead of binary bits, so my question is actually several questions.
First: how do I do a bitwise AND on binary numbers using the bit32 library?
10110111
11000100 = 10000100
Second: how do I calculate the IPv4 broadcast address by adding the network address and the wildcard mask in binary form, using the bit32 library?
192.168.1.0 + 31 = 192.168.1.31
11000000.10101000.00000001.00000000
00000000.00000000.00000000.00011111 = 11000000.10101000.00000001.00011111
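The broadcast calculation in the second question can be sketched in Python with the standard `ipaddress` module; this does not use bit32 and only illustrates the bit logic. It uses a bitwise OR rather than addition: the host bits of a network address are all zero, so the two agree here, but OR never carries, so it also works when given any host address inside the subnet. Both helper names are hypothetical.

```python
import ipaddress


def broadcast_address(address: str, wildcard: str) -> str:
    """Set every host bit (the bits in the wildcard mask) to 1."""
    net = int(ipaddress.IPv4Address(address))
    mask = int(ipaddress.IPv4Address(wildcard))
    return str(ipaddress.IPv4Address(net | mask))


def dotted_binary(address: str) -> str:
    """Show each octet as 8 binary digits, as in the question."""
    return ".".join(format(int(octet), "08b") for octet in address.split("."))
```

For example, `broadcast_address("192.168.1.0", "0.0.0.31")` gives `"192.168.1.31"`, matching `ipaddress.ip_network("192.168.1.0/27").broadcast_address`.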
I am assuming that your bitwise AND/NOT functions take string arguments.
Numbers can be represented in multiple ways.
The number 110101, which is in base two, has the same value as 53, which is in base 10.
When you say
x=123
Lua converts 123 into its binary representation, 1111011, which it then stores in memory as bits.
When you say
print(x)
Lua goes into memory, grabs x, which is 1111011, and then converts it into its more human-readable base 10 representation, and you see
123
The bitwise functions you wrote perform bit operations on strings that display the binary representation of a number, like "1111011". The bit32 library performs bit operations on numbers, which are displayed in their decimal representation, like 123.
In Lua, "1001001" is a string, but if arithmetic operations are performed on it, Lua treats it as if it were a number written in base 10. So when you do
bit32.band("101","110")
the bit32.band function interprets its arguments as one-hundred-one and one-hundred-ten.
You must first convert your binary strings into numbers:
bit32.band(tonumber("101",2), tonumber("110",2))
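The same parse-then-operate pattern, sketched in Python for comparison: `int(s, 2)` plays the role of Lua's `tonumber(s, 2)`, and `format(n, "b")` converts the result back to a binary string for display. The function name is hypothetical.

```python
def band_binary_strings(a: str, b: str) -> str:
    """Bitwise AND of two binary-digit strings, returned as a binary-digit string."""
    width = max(len(a), len(b))
    result = int(a, 2) & int(b, 2)        # parse as base 2, like tonumber(s, 2)
    return format(result, f"0{width}b")   # render in base 2, zero-padded to the input width


# The mistake described above: the strings are read as base-10 numbers.
misread = int("101") & int("110")
```

`band_binary_strings("10110111", "11000100")` gives `"10000100"`, matching the question. The misread version happens to give 100 here, which looks like the right binary answer but is the decimal number one hundred; the correct result is 4, i.e. binary 100.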

Why is it best to store a telephone number as a string vs. integer?

As the question states, why is it considered best practice to store telephone numbers as strings rather than integers in the telephone_number column?
Not sure I understand the rationale for this. Please help clear this up!
Thanks!
Telephone numbers are strings of digit characters; they are not integers.
Consider for example:
Expressing a telephone number in a different base would render it meaningless
Adding or multiplying two telephone numbers together, or any math operation on a phone number, is meaningless. The result is not another telephone number (except by coincidence)
Telephone numbers are intended to be entered "as-is" into a connected device.
Telephone numbers may have leading zeroes.
Manipulations of telephone numbers, such as adding an area code, are string operations.
Storing the string version of the telephone number makes this clear and unambiguous.
History: On old pulse-encoded dial systems, the code for each digit in a telephone number was sent as the same number of pulses as the digit (or 10 pulses for "0"). That may be why we still use digits to represent the parts of a phone number. See http://en.wikipedia.org/wiki/Pulse_dialing
What Neil Slater said is correct. I would add that there are lots of edge cases where you can't express a telephone number as a number value consistently.
For example, consider these numbers:
011-123-555-1212
+11-123-555-1212
+1 (112) 355-5121 x2
These are all potentially valid phone numbers, but they mean very different things. Yet, in integer form, they are all 111235551212.
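The collapse is easy to reproduce. This Python sketch implements a hypothetical "strip the punctuation and store an integer" approach: all three numbers above map to the same value, and a local number loses its leading zero.

```python
import re


def naive_phone_int(phone: str) -> int:
    """Hypothetical integer storage: keep the digits, drop everything else."""
    return int(re.sub(r"[^0-9]", "", phone))


examples = ["011-123-555-1212", "+11-123-555-1212", "+1 (112) 355-5121 x2"]
```

Once stored this way, nothing can tell these numbers apart again.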
If you are going to store the number for display from input, then you must use a string.
However, while it is true that no meaningful mathematical operations can be performed on a phone number, using a number in hash sets and for indexing is quicker than using a string. So provided you can guarantee or homogenise your set of numbers so that they are all consistent, you may see better performance operating on a number.
For example, in the Telco world, rating calls for a given customer involves a lot of searching on their CLI (calling line identity), and in this situation it is faster and cheaper to search by integer. Generally, though, strings will be fine performance-wise; integers only pay off where performance matters and you have many searches to perform over a huge range of numbers, e.g. rating 250 million calls across 2 million lines and 2000 tariffs. In-memory rating also gets expensive, so being able to use a 64-bit int or uint is cheaper when dealing with these volumes.
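One way to "homogenise" numbers before using integer keys is to normalise them to E.164 format: a "+" followed by at most 15 digits, with a country code that never begins with 0. Assuming that normalisation has already happened upstream, this Python sketch packs such a number into an integer losslessly. The regex and function names are illustrative only, not a full E.164 validator.

```python
import re

# Assumed E.164 shape: "+", a non-zero first digit, at most 15 digits in total.
E164 = re.compile(r"\+[1-9][0-9]{1,14}")


def e164_key(phone: str) -> int:
    """Pack an already-normalised E.164 number into an integer lookup key."""
    if not E164.fullmatch(phone):
        raise ValueError(f"not a normalised E.164 number: {phone!r}")
    # No leading zero and at most 15 digits: the integer round-trips
    # losslessly and always fits in a signed 64-bit column.
    return int(phone[1:])


def e164_from_key(key: int) -> str:
    return f"+{key}"
```

The string form stays the source of truth for display; the integer is only a derived key for fast lookups.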
Consider these phone numbers for example
099-1234-56789 or +91-8907-687665.
In this case, if the phone_number attribute is of type integer, it can't accept these values. It should be a string to hold values like these, so a string is always preferred over an integer.
There are several reasons for this:
Phone numbers often start with a "0": an integer will remove all leading "0"s
Phone numbers can contain special characters: +, (, -, etc. (for example: +33 (0)6 12 23 34)
You cannot perform operations on phone numbers: adding phone numbers, for instance, would be meaningless
Phone numbers may be internationalised, i.e. formatted differently for different people, which is not possible with integers
There might be other reasons, but I guess that's already a fair amount of those :)

In Delphi how do I determine when to use Real, Real48, Double or Single data types?

Most of my applications revolve around financial calculations involving payments and interest rate calculations. I'm looking to find out how to determine what Delphi data type is best to use.
If I'm using a database to store these values and I've defined the fields in that database to be a decimal value with two decimal places, which Delphi datatype is most compatible with that scenario?
Should I use a rounding formula in Delphi to format the results to two decimal places before storing the values in the database? If so what is a best practice for doing so?
For such calculations, don't use floating point types like Real, Single or Double. They are not good with decimal values like 0.01 or 1234.995, as they must approximate them.
You can use Currency, a fixed-point type, but that is still limited to 4 decimal places.
Try my Decimal type, which has 28-29 places and has a decimal exponent so it is ideal for such calculations. The only disadvantage is that it is not FPU supported (but written in assembler, nevertheless) so it is not as fast as the built-in types. It is the same as the Decimal type used in .NET (but a little faster) and quite similar to the one used on the Mac.
If you want to do financial calculations, don't use any of the floating-point/real types. Delphi has a Currency type, which is a fixed-point value with 4 decimal places, that should be just what you need.
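On the rounding question, here is a Python sketch using the standard `decimal` module rather than Delphi; the rounding policy is the point, not the language. Whether to round half up or half to even ("banker's rounding") is a business decision the code cannot make for you. The function name and the interest figures are made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

TWO_PLACES = Decimal("0.01")


def to_db_amount(value: Decimal, rounding: str = ROUND_HALF_UP) -> Decimal:
    """Round a computed amount to the two decimal places of the database column."""
    return value.quantize(TWO_PLACES, rounding=rounding)


# Hypothetical example: one month of interest at 3.75% a year on 1000.00.
monthly_interest = Decimal("1000.00") * Decimal("0.0375") / 12  # exactly 3.125
```

Here `to_db_amount(monthly_interest)` stores 3.13, while `ROUND_HALF_EVEN` stores 3.12. Rounding once, just before storing, keeps intermediate results at full precision.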

Is The Effectiveness Of Huffman Coding Limited?

My problem is that I have 100,000+ different elements, and as I understand it Huffman coding works by assigning the most common element the code 0, the next 10, the next 110, then 1110, 11110 and so on. My question is: if the code for the nth element is n bits long, then surely once I have passed the 32nd term it is more space-efficient to just send 32-bit data types as they are, such as ints? Have I missed something in the methodology?
Many thanks for any help you can offer. My current implementation works by doing
code = (code << 1) + 2;
to generate each new code (which seems to be correct!), but the only way I could encode over 100,000 elements would be to have an int[] in a makeshift new data type, where to access the value we would read from the int array as one continuous long symbol... and that's not as space-efficient as just transporting a 32-bit int, is it? Or is Huffman's real use in its prefix codes, i.e. being able to determine each unique value in a continuous bit stream unambiguously?
Thanks
Your understanding is a bit off - take a look at http://en.wikipedia.org/wiki/Huffman_coding. And you have to pack the encoded bits into machine words in order to get compression - Huffman encoded data can best be thought of as a bit-stream.
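A minimal Python sketch of what packing the encoded bits means, using bytes rather than machine words: concatenated codewords are packed eight bits per byte, and the bit count is kept so the decoder can ignore the padding in the last byte. Both function names are hypothetical.

```python
from typing import Tuple


def pack_bits(bits: str) -> Tuple[bytes, int]:
    """Pack a string of '0'/'1' characters into bytes, most significant bit first."""
    padded = bits + "0" * (-len(bits) % 8)  # pad the final byte with zero bits
    data = bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))
    return data, len(bits)  # the true bit count lets the decoder drop the padding


def unpack_bits(data: bytes, bit_count: int) -> str:
    return "".join(format(byte, "08b") for byte in data)[:bit_count]
```

Three codewords such as "1", "010" and "011" occupy 7 bits and therefore fit in a single byte, instead of three separate ints.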
You seem to understand the principle of prefix codes.
Could you tell us a little more about these 100,000+ different elements you mention?
The fastest prefix codes -- universal codes -- do, in fact, involve a series of bit sequences that can be pre-generated without regard to the actual symbol frequencies. Compression programs that use these codes, as you mentioned, associate the most-frequent input symbol with the shortest bit sequence, the next-most-frequent input symbol with the next-shortest bit sequence, and so on.
What you describe is one particular kind of prefix code: unary coding.
Another popular variant of the unary coding system assigns elements in order of frequency to the fixed codes
"1", "01", "001", "0001", "00001", "000001", etc.
Some compression programs use another popular prefix code: Elias gamma coding.
The Elias gamma coding assigns elements in order of frequency to the fixed set of codewords
1
010
011
00100
00101
00110
00111
0001000
0001001
0001010
0001011
0001100
0001101
0001110
0001111
000010000
000010001
000010010
...
The 32nd Elias gamma codeword is 11 bits long, about a third as long as the 32nd unary codeword (32 bits).
The 100,000th Elias gamma codeword is 33 bits long.
If you look carefully, you can see that each Elias gamma codeword can be split into 2 parts -- the first part is more or less the unary code you are familiar with. That unary code tells the decoder how many more bits follow afterward in the rest of that particular Elias gamma codeword.
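The two-part structure can be sketched in Python: the encoder writes one fewer zero than the number of binary digits of n, then n itself in binary; the decoder counts the zeros to learn how many bits to read next. This sketch assumes well-formed input and codes only positive integers.

```python
from typing import List


def elias_gamma_encode(n: int) -> str:
    if n < 1:
        raise ValueError("Elias gamma codes only positive integers")
    binary = format(n, "b")                  # always starts with '1'
    return "0" * (len(binary) - 1) + binary  # unary length prefix, then the value


def elias_gamma_decode(bits: str) -> List[int]:
    """Decode a concatenated, well-formed stream of Elias gamma codewords."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":                # unary part: count leading zeros
            zeros += 1
            i += 1
        values.append(int(bits[i:i + zeros + 1], 2))
        i += zeros + 1
    return values
```

The first eight codewords reproduce the table above, and because the code is prefix-free, concatenated codewords decode without separators.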
There are many other kinds of prefix codes.
Many people (confusingly) refer to all prefix codes as "Huffman codes".
When compressing some particular data file, some prefix codes do better at compression than others.
How do you decide which one to use?
Which prefix code is the best for some particular data file?
The Huffman algorithm -- if you neglect the overhead of the Huffman frequency table -- chooses exactly the best prefix code for each data file.
There is no singular "the" Huffman code that can be pre-generated without regard to the actual symbol frequencies.
The prefix code chosen by the Huffman algorithm is usually different for different files.
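A compact Python sketch of how the Huffman algorithm derives a file-specific code from symbol frequencies, using a min-heap to repeatedly merge the two least frequent subtrees. It builds codeword strings directly instead of an explicit tree and ignores the cost of transmitting the frequency table; the frequencies in the usage line are made up.

```python
import heapq
from itertools import count
from typing import Dict


def huffman_code(freqs: Dict[str, int]) -> Dict[str, str]:
    """Return a prefix-free codeword per symbol, shorter for more frequent symbols."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), {sym: ""}) for sym, f in freqs.items()]
    heapq.heapify(heap)
    if not heap:
        return {}
    if len(heap) == 1:
        # A single distinct symbol still needs a non-empty codeword.
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing 0 on one side and 1 on the other.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tiebreak), merged))
    return heap[0][2]


codes = huffman_code({"a": 5, "b": 2, "c": 1, "d": 1})
```

With these frequencies, "a" gets a 1-bit codeword, "b" 2 bits, and "c" and "d" 3 bits each; different frequencies would produce a different code.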
The Huffman algorithm doesn't compress very well when we really do have 100,000+ unique elements --
the overhead of the Huffman frequency table becomes so large that we often can find some other "suboptimal" prefix code that actually gives better net compression.
Or perhaps some entirely different data compression algorithm might work even better in your application.
The "Huffword" implementation seems to work with around 32,000 or so unique elements,
but the overwhelming majority of Huffman code implementations I've seen work with around 257 unique elements (the 256 possible byte values, and the end-of-text indicator).
You might consider somehow storing your data on a disk in some raw "uncompressed" format.
(With 100,000+ unique elements, you will inevitably end up storing many of those elements in 3 or more bytes).
Those 257-value implementations of Huffman compression will be able to compress that file;
they re-interpret the bytes of that file as 256 different symbols.
"My question is, if the code for the nth element is n-bits long then surely once I have passed the 32nd term it is more space efficient to just send 32-bit data types as they are, such as ints for example? Have I missed something in the methodology?"
One of the more counter-intuitive features of prefix codes is that some symbols (the rare symbols) are "compressed" into much longer bit sequences. If you actually have 2^8 unique symbols (all possible 8 bit numbers), it is not possible to gain any compression if you force the compressor to use prefix codes limited to 8 bits or less. By allowing the compressor to expand rare values -- to use more than 8 bits to store a rare symbol that we know can be stored in 8 bits -- that frees up the compressor to use less than 8 bits to store the more-frequent symbols.
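This trade-off can be checked with the Kraft inequality: a prefix code with codeword lengths l_i exists only if the sum of 2^(-l_i) is at most 1. This Python sketch computes that sum exactly with `fractions.Fraction`. With 256 symbols all limited to 8 bits, there is no room for a shorter codeword, but lengthening two rare symbols to 9 bits frees exactly enough room to give one frequent symbol 7 bits.

```python
from fractions import Fraction


def kraft_sum(lengths) -> Fraction:
    """Sum of 2**-length over codeword lengths; a prefix code needs this to be <= 1."""
    return sum((Fraction(1, 2**length) for length in lengths), Fraction(0))


all_eight = [8] * 256                 # fixed-length 8-bit code
one_shorter = [7] + [8] * 255         # shorten one symbol only: impossible
traded = [7] + [8] * 253 + [9, 9]     # shorten one, lengthen two rare ones
```

`kraft_sum(one_shorter)` exceeds 1, so no such prefix code exists, while `kraft_sum(traded)` is exactly 1.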
related:
Maximum number of different numbers, Huffman Compression

Resources