I am trying to build a Sieve of Eratosthenes in Lua and i tried several things but i see myself confronted with the following problem:
The tables of Lua are to small for this scenario. If I just want to create a table with all numbers (see example below), the table is too "small" even with only 1/8 (...) of the number (the number is pretty big I admit)...
max = 600851475143
numbers = {}
for i=1, max do
table.insert(numbers, i)
end
If I execute this script on my Windows machine there is an error message saying: C:\Program Files (x86)\Lua\5.1\lua.exe: not enough memory. With Lua 5.3 running on my Linux machine I tried that too, error was just killed. So it is pretty obvious that lua can´t handle the amount of entries.
I don´t really know whether it is just impossible to store that amount of entries in a lua table or there is a simple solution for this (tried it by using a long string aswell...)? And what exactly is the largest amount of entries in a Lua table?
Update: And would it be possible to manually allocate somehow more memory for the table?
Update 2 (Solution for second question): The second question is an easy one, I just tested it by running every number until the program breaks: 33.554.432 (2^25) entries fit in one one-dimensional table on my 12 GB RAM system. Why 2^25? Because 64 Bit per number * 2^25 = 2147483648 Bits which are exactly 2 GB. This seems to be the standard memory allocation size for the Lua for Windows 32 Bit compiler.
P.S. You may have noticed that this number is from the Euler Project Problem 3. Yes I am trying to accomplish that. Please don´t give specific hints (..). Thank you :)
The Sieve of Eratosthenes only requires one bit per number, representing whether the number has been marked non-prime or not.
One way to reduce memory usage would be to use bitwise math to represent multiple bits in each table entry. Current Lua implementations have intrinsic support for bitwise-or, -and etc. Depending on the underlying implementation, you should be able to represent 32 or 64 bits (number flags) per table entry.
Another option would be to use one or more very long strings instead of a table. You only need a linear array, which is really what a string is. Just have a long string with "t" or "f", or "0" or "1", at every position.
Caveat: String manipulation in Lua always involves duplication, which rapidly turns into n² or worse complexity in terms of performance. You wouldn't want one continuous string for the whole massive sequence, but you could probably break it up into blocks of a thousand, or of some power of 2. That would reduce your memory usage to 1 byte per number while minimizing the overhead.
Edit: After noticing a point made elsewhere, I realized your maximum number is so large that, even with a bit per number, your memory requirements would optimally be about 73 gigabytes, which is extremely impractical. I would recommend following the advice Piglet gave in their answer, to look at Jon Sorenson's version of the sieve, which works on segments of the space instead of the whole thing.
I'll leave my suggestion, as it still might be useful for Sorenson's sieve, but yeah, you have a bigger problem than you realize.
Lua uses double precision floats to represent numbers. That's 64bits per number.
600851475143 numbers result in almost 4.5 Terabytes of memory.
So it's not Lua's or its tables' fault. The error message even says
not enough memory
You just don't have enough RAM to allocate that much.
If you would have read the linked Wikipedia article carefully you would have found the following section:
As Sorenson notes, the problem with the sieve of Eratosthenes is not
the number of operations it performs but rather its memory
requirements.[8] For large n, the range of primes may not fit in
memory; worse, even for moderate n, its cache use is highly
suboptimal. The algorithm walks through the entire array A, exhibiting
almost no locality of reference.
A solution to these problems is offered by segmented sieves, where
only portions of the range are sieved at a time.[9] These have been
known since the 1970s, and work as follows
...
I have 37 similar SMT2 problems, each in two equisatisfiable versions that I call compact and unrolled. The problems are using incremental SMT solving and every (check-sat) returns unsat. The compact versions are using the QF_AUFBV logic, the unrolled versions use QF_ABV. I did run them in z3, yices, and boolector (but boolector only supports the unrolled version). The results of this performance evaluation can be found here:
http://scratch.clifford.at/compact_smt2_enc_r1102.html
The SMT2 files for this examples can be downloaded from here (~10 MB):
http://scratch.clifford.at/compact_smt2_enc_r1102.zip
I run each solver 5 times with different values for the :random-seed option. (Except boolector which does not support :random-seed. So I simply run boolector 5 times on the same input.) The variation I get when running the solvers with different :random-seed is relatively small (see +/- values in the table for the max outlier).
There is a wide spread between solvers. Boolector and Yices are consistently faster than z3, in some cases up to two orders of magnitude.
However, my question is about "z3 vs z3" performance. Consider for example the following data points:
| Test Case | Z3 Median Runtime | Max Outlier |
|-------------------|-------------------|-------------|
| insn_add unrolled | 873.35 seconds | +/- 0% |
| insn_add compact | 1837.59 seconds | +/- 1% |
| insn_sub unrolled | 4395.67 seconds | +/- 16% |
| insn_sub compact | 2199.21 seconds | +/- 5% |
The problems insn_add and insn_sub are almost identical. Both are generated from Verilog using Yosys and the only difference is that insn_add is using this Verilog module and insn_sub is using this one in its place. Here is the diff between those two source files:
--- insn_add.v 2017-01-31 15:20:47.395354732 +0100
+++ insn_sub.v 2017-01-31 15:20:47.395354732 +0100
## -1,6 +1,6 ##
// DO NOT EDIT -- auto-generated from generate.py
-module rvfi_insn_add (
+module rvfi_insn_sub (
input rvfi_valid,
input [ 32 - 1 : 0] rvfi_insn,
input [`RISCV_FORMAL_XLEN - 1 : 0] rvfi_pc_rdata,
## -29,9 +29,9 ##
wire [4:0] insn_rd = rvfi_insn[11: 7];
wire [6:0] insn_opcode = rvfi_insn[ 6: 0];
- // ADD instruction
- wire [`RISCV_FORMAL_XLEN-1:0] result = rvfi_rs1_rdata + rvfi_rs2_rdata;
- assign spec_valid = rvfi_valid && insn_funct7 == 7'b 0000000 && insn_funct3 == 3'b 000 && insn_opcode == 7'b 0110011;
+ // SUB instruction
+ wire [`RISCV_FORMAL_XLEN-1:0] result = rvfi_rs1_rdata - rvfi_rs2_rdata;
+ assign spec_valid = rvfi_valid && insn_funct7 == 7'b 0100000 && insn_funct3 == 3'b 000 && insn_opcode == 7'b 0110011;
assign spec_rs1_addr = insn_rs1;
assign spec_rs2_addr = insn_rs2;
assign spec_rd_addr = insn_rd;
But their behavior in this benchmark is very different: Overall the performance for insn_sub is much worse than the performance for insn_add. Furthermore, in the case of insn_add the unrolled version runs about twice as fast as the compact version, but in the case of insn_sub the compact version runs about twice as fast as the unrolled version.
Here are the times before creating the median. The :random-seed setting obviously does not seem to make much of a difference:
insn_add unrolled: 868.15 873.34 873.35 873.36 874.88
insn_add compact: 1828.70 1829.32 1837.59 1843.74 1867.13
insn_sub unrolled: 3204.06 4195.10 4395.67 4539.30 4596.05
insn_sub compact: 2003.26 2187.52 2199.21 2206.04 2209.87
Since the value of :random-seed does not seem to have much of an effect, I would assume there is something intrinsic to those .smt2 files that makes them fast or slow on z3. How would I investigate this? How would I find out what makes the fast cases fast and the slow cases slow, so I can avoid whatever makes the slow cases slow? (Yes, I know that this is a very broad question. Sorry. :)
<edit>
Here are some more concrete questions along the lines of my primary question. This questions are directly inspired by the obvious differences I can see between the (compact) insn_add and insn_sub benchmarks.
Can the order of (declare-..) and (define-..) statements in my SMT input influence performance?
Can changing the names of declared or defined function influence performance?
If I split a BV into smaller BVs, and then concatenate them back again, can this influence performance?
If I either compare two BVs for equality, or split the BV into single bit variables and compare each of the bits individually, can this influence performance?
Also: What operations in z3 do actually change when I chose a different value for :random-seed?
</edit>
Making small changes to the .smt2 files without changing the semantics can be very difficult for large test cases generated by complex tools. I'm hoping there are other things I can try first, or maybe there is some existing expert knowledge about the kind of changes that might be worth investigating. Or alternatively: What kind of changes would effectively be equivalent to changing :random-seed and thus are not worth investigating.
(Tests performed with git rev c67cf16, i.e. current git head of z3 on an AWS EC2 c4.8xlarge instance with make -j40. The runtimes are CPU seconds, not wall-clock seconds.)
Edit 2:
I have now three test cases (test1.smt2, test2.smt2, and test3.smt2) that are identical except that I've renamed some of the functions I declare/define. The test cases can be found at http://svn.clifford.at/handicraft/2017/z3perf/.
This is a variation of the original problem that takes ~2 minutes to solve instead of ~1 hour. As before, changing the value of :random-seed only has a marginal effect. But renaming some of the functions without changing anything else changes the runtime by more than 2x:
I've now opened an issue on github, arguing that :random-seed should by tied into whatever thing I change randomly inside z3 when I rename the functions in my SMT2 code.
As you say, there can be many things that may be creating that perf difference in add vs sub.
A good start is to check if the formulas after preprocessing are equal modulo add/sub (btw, Z3 converts 'a - b' into 'a + (-1) * b'). If not, then trace down which preprocessing step is at fault. Then trace down the problem and send us a patch :)
Alternatively, the problem could be down the line, e.g., in the bitblaster. You can also dump the bit-blasted formulas of both of your files and check if there is a significant difference in terms of number of variables and/or clauses.
Anyway, you'll need to be prepared to invest a day or two (maybe more) to track down these issues. If you find something, let us know and/or send us a patch! :)
I wonder why there is different in the result for those two cases.
Working Storage:
WS-SUM-LEN PIC S9(4) COMP.
WS-LEN-9000 PIC 9(5) VALUE 9000.
WS-TMP-LEN PIC 9(5).
WS-FIELD-A PIC X(2000).
Case 1) COMPUTE WS-SUM-LEN = WS-LEN-9000 + LENGTH OF WS-FIELD-A
Result: WS-SUM-LEN = 1000
Case 2)
MOVE LENGTH OF WS-FIELD-A TO WS-TMP-LEN
COMPUTE WS-SUM-LEN = WS-LEN-9000 + WS-TMP-LEN
Result: WS-SUM-LEN = 11000
The compiler option is TRUNC(OPT). Why there is no trunc occur for case 2?
Binary fields in IBM's Enterprise COBOL
Warnings
Compiler option TRUNC determines how code for binary fields is generated.
Do not just up and change from your site's default setting of option TRUNC. The different settings for TRUNC can give different results.
Changing from TRUNC(BIN) to TRUNC(STD) will give different results for any values beyond the decimal-values represented by the PICture defining the field. For a signed field, the same applies for the negative value.
01 a-good-name BINARY PIC 99.
ADD 1 TO a-good-name
With TRUNC(STD) the result will be truncated once 99 is reached. With TRUNC(BIN) the result will be truncated once 65535 is reached (if the field were signed, the truncation would be at 99 as before for TRUNC(STD) and 32767 for TRUNC(BIN)).
Changing from TRUNC(BIN) to TRUNC(OPT), without program changes, is only possible if all, entirely all, usage of binary fields is limited to decimal-values represented by the PICture. Particular pieces of code may appear to "work", but it would be a massive coincidence if all use of binary fields gave the same result between the two compiler options, on your system.
It is similar for changing from TRUNC(STD) to TRUNC(OPT). Although the amount of coincidence for working would be smaller, this would increase a false sense of security leaving the potential for subtle differences to be missed for some time.
Changing from genuine use of TRUNC(OPT) to either TRUNC(STD) or TRUNC(BIN) is possible without effort. However, why would you want to?
However, if your use is not genuine (using TRUNC(OPT) with data that does not conform to PICture), then your original results are unreliable, and you will get differences if changing to TRUNC(STD) and likely get difference changing to TRUNC(BIN).
In short, changing the site-default for compiler option TRUNC is something to be considered very carefully, and must include provision for verification of results.
Sites do at times make such a change, the only ones I know of are TRUNC(BIN) (mostly) and TRUNC(STD) to TRUNC(OPT), for performance reasons. These have been done as projects, not just by changing the option and blundering on from there.
Do not override the site-default for TRUNC within systems. If you have programs which are using the same binary data (from files, databases, inter-program communication, messages, or any other way) and they don't all treat the data in the same way, it is asking for trouble.
Some myths
Further explanation will be given later in the text.
There is a difference between TRUNC(BIN) and making all your binary
fields COMP-5 (or COMPUTATIONAL-5).
There is no difference whatsoever. When TRUNC(BIN) is specified, the compiler simply treats all binary fields as COMP-5.
Native-binary is faster than COBOL binary (a binary field with decimal limits defined by the PICture clause).
Although the term itself makes many experienced people think it will be faster ("it'll be like when I code it myself in Assembler") it is in fact slower, on the whole. The slowing-down increases as the field-size increases.
Native-binary (COMP-5/COMPUTATIONAL-5) does not truncate.
It does. It truncates to field-size. Because it truncates to field-size, the intermediate fields must always be larger than the source fields, which means more instructions must be used, and different instructions.
Also, and important to know, the ON SIZE ERROR clause (which can be used with all arithmetic verbs) always only uses the PICture clause to determine that a size-error has occurred. That is, if you have a COMP-5 PIC S9(4), which can contain a maximum positive value of 32,767 and do this:
MULTIPLY that-field BY 10 GIVING that-field
ON SIZE ERROR
DISPLAY "Busted"
END-MULTIPLY
Any value above 9999 will cause the DISPLAY to be processed.
Which really means "don't use ON SIZE ERROR with COMP-5 or TRUNC(BIN)".
TRUNC(OPT) generates optimal code.
In isolation, it does. However this does not preclude further optimisations from compiler option OPTIMIZE/OPT across a wider context.
When using binary fields, always use the maximum PICture for the size of the field
A binary field with 1-4 digits will occupy a half-word, two bytes of storage. With 5-9 digits, a word, or a fullword, of four bytes. With 10-18 digits, a double-word of eight bytes.
The aged recommendation is to always specify four digits, nine digits and 18 digits (well, no-one really goes above nine, do they...?).
This is advice I've received in the past, and given out myself. However, in Enterprise COBOL it is not good advice.
The best advice here is to define the number of digits needed. This will at times improve performance, will never degrade performance, and will make the program easier to understand by best describing the data.
When using binary fields, always make them signed.
More advice I've received and given in the past. Untrue with Enterprise COBOL. If a field can contain a negative value, make it signed. Otherwise make it unsigned.
At times, with interfaces, it is not explicit whether a field should be signed. However, it will be explicit from the maximum value expected. As will the field definition (the USAGE).
For instance, an SQL VARCHAR as a host-variable can have a maximum size of 32767 bytes. Since the actual length is held in a two-byte binary field, the field should be signed. Any value "above" 32767 will be misinterpreted by DB2/SQL.
Since nine decimal digits can fully fit within a word/fullword, there is no problem using nine decimal digits for a COMP/COMP-4/BINARY definition (without TRUNC(BIN)).
Since the compiler has to take care of decimal truncation, and since anything which could lead to truncation would require the "next size up", a binary field of nine digits can require a double-word intermediate field. So requires code to convert to a double-word, and convert the result back from a double-word to a word. If nine digits are required, it will generally be better to define 10 digits and save on the conversions.
Note
The above is all known to hold true for Enterprise COBOL up to V4.2.
IBM has entirely rewritten the code-generation and optimisation (now at two possible levels of optimisation) For Enterprise COBOL V5. There is considerable improvement in the treatment of binary fields, including, for instance, only doing the truncation of values once it is known that truncation is necessary. I am not aware that the use of V5 changes anything here other than the scale of performance differences. All general usage of binary fields should be faster with V5 than with earlier versions of Enterprise COBOL.
Binary fields
COBOL, for binary fields, uses decimal maxima determined by the PICture size.
Such a field with PIC 9 can contain a maximum value of 9 before truncation. If signed, the range of values is -9 to +9. Values outside that range will be truncated.
For PIC 99, 99, and if signed -99 to 99.
For PIC 999, 999,and if signed -999 to 999.
You get the pictu... idea.
It is down to the compiler-implementation as to how those values are stored.
Indeed, according to the Standard, COBOL only recently (1985) had support for binary fields (USAGE BINARY). Which and how actual "non-display" fields were supported was down to USAGE COMPUTATIONAL, whose specifics were compiler-dependent.
Generally across compilers COMP, COMP-1 and COMP-2 (binary, with decimal maxima, short floating-point and long floating-point) are standard, though not part of the Standard. Beyond COMP-2, what the field definitions mean can vary amongst compilers.
So, first recommendation, suggest that your local site standards use BINARY instead of COMP for new code (and PACKED-DECIMAL instead of COMP-3, for packed-decimal fields). BINARY (and COMP-4) within Enterprise COBOL is simply an alias of COMP, so there is absolutely no problem in doing this.
There is another type of binary field, which is the native-binary field. In Enterprise COBOL this is USAGE COMP-5.
COMP-5 has its field-size determined by the PICture definition, but its maxima are that of the full bit-pattern possible for the field size. A PIC S9(4) COMP-5 can contain -32768 to 32767.
Note at this point that a native-binary field, and this may seem counter-intuitive, generally needs more generated machine-code to support its use. This is because it truncates to field-size, rather than PICture.
Note also that there is one place where this does not happen, which is ON SIZE ERROR, which will be true if the value exceeds the PICture size. Which means, to my mind, don't use ON SIZE ERROR with COMP-5 (or TRUNC(BIN), see soon) fields.
Compiler option TRUNC
The compiler option TRUNC defines how machine-code is generated for binary fields. There are three options:
TRUNC(BIN)
Truncation to field-size.
This treats all the non-native-binary fields in the program (COMP/COMP-4/BINARY) as native-binary (as though they had been defined as COMP-5).
This allows the full range of bit patterns to be used, but has impacts on performance.
TRUNC(STD)
Truncation to PICture size.
Generates machine-code for the COBOL Standard truncation to PICture size. PIC 9(n) can contain no more than n significant digits, they will be truncated when the field is a "target" (field value changes).
TRUNC(OPT)
Truncation of any type only used if it happens to be convenient.
I describe this as being a contract between the coder and the compiler. The coder contracts to never (as in never) allow a value to exceed the PICture size. The compiler contracts to always get it right in such a case.
If the coder breaks the contract the coder is entirely to blame for the ensuing rubbish.
When to use each setting of TRUNC (further recommendation)
BIN Never. Use COMP-5 for individual fields where they require access to all bits (pay attention to SQL and CICS "system" fields, and external data from non-Mainframe sources, and inter-language communication between COBOL and JAVA/C/C++ and anywhere else where the data-maxima for a field are beyond the PICture and it is not possible to make the field bigger (as the actual logical definition of the field size is outside your program).
STD use this unless all, as in all, your data always, as in always, conforms to PICture.
OPT use this only, as in only, if all, as in all, your data always, as in always, conforms to PICture.
If you have COMP PIC 99, for instance, you must not, when using OPT, allow that to have a value of 99, and then add one to it. Or anything similar
The Answer
You used TRUNC(OPT), entering into the contract. You immediately broke the contract. It is your fault.
Warning
If your site is using TRUNC(OPT) and not everyone is fully aware of the implications, you will, as in will, have problems.
Substantiation of the Myths from above
There is a difference between TRUNC(BIN) and making all your binary
fields COMP-5 (or COMPUTATIONAL-5).
Define two fields in small program. They should be defined as COMP/COMP-4/BINARY (or COMPUTATIONAL/COMPUTATIONAL-4 if that is your bent).
In the program, add a literal to each of the fields (do this with two separate statements, to make it easier to follow, unless you are experience with the generated code in a listing).
Compile the program with compiler options LIST,NOOFFSET (this will produce, in the compiler listing, output showing the generated machine-code in a so-called "pseudo-assembler" format) and TRUNC(BIN).
Copy the program. In the copy, change the USAGE of the two fields to COMP-5 (or COMPUTATIONAL-5).
Compile this program, again with LIST,NOOFFSET but this time the value for TRUNC is irrelevant as it does not affect COMP-5 fields.
Compare the output listings. If there is one byte difference, eat someone's hat.
Native-binary is faster than COBOL binary (a binary field with decimal limits defined by the PICture clause).
From this discussion at IBM's COBOL Cafe: https://www.ibm.com/developerworks/community/forums/html/topic?id=ae9ef6bc-6e4e-43f8-a814-e66bea25fb8c&ps=25
Here's a multiply of a PIC 9(3) by a PIC 9(5).
With TRUNC(STD)
000023 MULTIPLY
000258 5820 8008 L 2,8(0,8) PIC9-5
00025C 4C20 8000 MH 2,0(0,8) PIC9-3
000260 5020 8010 ST 2,16(0,8)
With TRUNC(BIN)
000019 MULTIPLY
00023C 4820 8030 LH 2,48(0,8) PICS9-4
000240 5840 8038 L 4,56(0,8) PICS9-8
000244 8E40 0020 SRDA 4,32(0)
000248 5D40 C000 D 4,0(0,12) SYSLIT AT +0
00024C 4E50 D120 CVD 5,288(0,13) TS2=16
000250 F154 D110 D123 MVO 272(6,13),291(5,13) TS2=0
000256 4E40 D120 CVD 4,288(0,13) TS2=16
00025A 9110 D115 TM 277(13),X'10' TS2=5
00025E D204 D115 D123 MVC 277(5,13),291(13) TS2=5
000264 4780 B05C BC 8,92(0,11) GN=10(00026C)
000268 9601 D119 OI 281(13),X'01' TS2=9
00026C GN=10 EQU *
00026C 4E20 D120 CVD 2,288(0,13) TS2=16
000270 FC82 D111 D125 MP 273(9,13),293(3,13) TS2=1
000276 D202 D128 C008 MVC 296(3,13),8(12) TS2=24
00027C D204 D12B D115 MVC 299(5,13),277(13) TS2=27
000282 4F20 D128 CVB 2,296(0,13) TS2=24
000286 F144 D12B D110 MVO 299(5,13),272(5,13) TS2=27
00028C 4F50 D128 CVB 5,296(0,13) TS2=24
000290 5C40 C000 M 4,0(0,12) SYSLIT AT +0
000294 1E52 ALR 5,2
000296 47C0 B08E BC 12,142(0,11) GN=11(00029E)
00029A 5A40 C004 A 4,4(0,12) SYSLIT AT +4
00029E GN=11 EQU *
00029E 1222 LTR 2,2
0002A0 47B0 B098 BC 11,152(0,11) GN=12(0002A8)
0002A4 5B40 C004 S 4,4(0,12) SYSLIT AT +4
0002A8 GN=12 EQU *
0002A8 5050 8040 ST 5,64(0,8)
It doesn't take any knowledge of IBM Assembler to work out which of those two pieces of code is going to run more quickly.
The difference in the line-numbers (19 Vs 23) is just down to the fact that TRUNC(BIN) makes the PICture size irrelevant, so where I had three calculations doing the same thing with different size fields, for TRUNC(BIN) the code for each was the same, because the size of each field is the same, a word/fullword of four bytes.
Native-binary (COMP-5/COMPUTATIONAL-5) does not truncate.
See the code immediately above. It is so massive due to the need to provide truncation. The need to provide decimal truncation is down to the COBOL Standard, it's what must happen in the language.
TRUNC(OPT) generates optimal code.
The code generated will always be the most efficient for that code-sequence. The same code-sequence will always generate the same code, before optimisation.
However, the optimizer is capable of spotting that a particular undisturbed state is available for a source-field earlier in the program, and replace part or all of the TRUNC(OPT) code with code relying on the previously-available value.
When using binary fields, always use the maximum PICture for the size of the field
From the same IBM COBOL Cafe discussion referenced above, with these definitions:
01 PIC9-3 BINARY PIC 999.
01 PIC9-5 BINARY PIC 9(5).
01 THE-RESULT8 BINARY PIC 9(8).
01 PIC9-4 BINARY PIC 9(4).
01 PIC9-8 BINARY PIC 9(8).
01 THE-RESULT BINARY PIC 9(8).
And these calculations:
MULTIPLY PIC9-4 BY PIC9-8
GIVING THE-RESULT
MULTIPLY PIC9-3 BY PIC9-5
GIVING THE-RESULT8
Here's the generated code for TRUNC(STD):
000021 MULTIPLY
000248 4830 8018 LH 3,24(0,8) PIC9-4
00024C 5C20 8020 M 2,32(0,8) PIC9-8
000250 5D20 C000 D 2,0(0,12) SYSLIT AT +0
000254 5020 8028 ST 2,40(0,8) THE-RESULT
000023 MULTIPLY
000258 5820 8008 L 2,8(0,8) PIC9-5
00025C 4C20 8000 MH 2,0(0,8) PIC9-3
000260 5020 8010 ST 2,16(0,8)
The first block of pseudo-assembler is with the number of digits in the PICture being the maximum that give the same field-size. A BINARY PIC 9(3) occupies a half-word, and 9(4) is the largest that can appear in a half-word. A PIC 9(5) occupies a word/fullword, and, given Myth 7, eight digits is used for that (to be fair to this particular Myth).
The second block is with the number of digits which represent the data accurately, and which don't happen to require truncation when a multiplication is carried out.
Using the "full-size" PICtures guarantees that unnecessary truncation will always occur.
The difference in the number of instructions is small, and LH is faster than L, so plus to the full-size on that. But M is much slower than L, and MH is slower than L but faster than M. So plus to the optimal size on that. And the D (a divide, which is slow, slow) is not required at all in the second block (because no truncation is required). So bad-boy to the full-size fields on that.
The code for TRUNC(OPT) is also faster for the optimal-size fields, although the difference between the two is not as great (because TRUNC(OPT) in this code-sequence decides it does not need the truncation to base-10 and would not in a million years consider the truncation to field-size).
When using binary fields, always make them signed.
Again from the same IBM COBOL Cafe discussion, here's same-length signed fields Vs unsigned fields, TRUNC(STD):
000019 MULTIPLY
000238 4830 8030 LH 3,48(0,8) PICS9-4
00023C 5C20 8038 M 2,56(0,8) PICS9-8
000240 5D20 C000 D 2,0(0,12) SYSLIT AT +0
000244 5020 8040 ST 2,64(0,8) THE-RESULTS
000021 MULTIPLY
000248 4830 8018 LH 3,24(0,8) PIC9-4
00024C 5C20 8020 M 2,32(0,8) PIC9-8
000250 5D20 C000 D 2,0(0,12) SYSLIT AT +0
000254 5020 8028 ST 2,40(0,8)
The code is different from the above when compiled with both TRUNC(OPT) and TRUNC(BIN), but each of the code-sequences in both cases is identical in those options.
The presence or absence of a sign makes no difference to the code generated.
Except in one case. Where Myth 7 comes into play. With a nine-digit binary, using a signed vs unsigned field does generate less code, but that code generated is more code than if using eight digits.
Since nine decimal digits can fully fit within a word/fullword, there is no problem using nine decimal digits for a COMP/COMP-4/BINARY definition (without TRUNC(BIN)).
From the IBM Enterprise COBOL Version 4 Release 2 Performance Tuning paper, pp32-33:
The following shows the general performance considerations (from most
efficient to least efficient) for the number of digits of precision
for signed binary data items (using PICTURE S9(n) COMP) using
TRUNC(OPT):
n is from 1 to 8
for n from 1 to 4, arithmetic is done in halfword instructions where
possible for n from 5 to 8, arithmetic is done in fullword
instructions where possible
n is from 10 to 17 arithmetic is done in doubleword format
n is 9
fullword values are converted to doubleword format and then doubleword
arithmetic is used (this is SLOWER than any of the above)
n is 18 doubleword values are converted to a higher precision format
and then arithmetic is done using this higher precision (this is the
SLOWEST of all for binary data items)
There is a similar issue with TRUNC(STD). TRUNC(BIN) already has the built-in slowness for the number of digits 1-9, so is not further affected.
From the publicly available documentation :
TRUNC(OPT) is a performance option. When TRUNC(OPT) is in effect, the compiler assumes that data conforms to PICTURE specifications in USAGE BINARY receiving fields in MOVE statements and arithmetic expressions. The results are manipulated in the most optimal way, either truncating to the number of digits in the PICTURE clause, or to the size of the binary field in storage (halfword, fullword, or doubleword).
Tip: Use the TRUNC(OPT) option only if you are sure that the data being moved into the binary areas will not have a value with larger precision than that defined by the PICTURE clause for the binary item. Otherwise, unpredictable results could occur. This truncation is performed in the most efficient manner possible; therefore, the results are dependent on the particular code sequence generated. It is not possible to predict the truncation without seeing the code sequence generated for a particular statement.
Read the "Tip" very carefully and see what it means for your situation. (Hint : it means it does not make sense to ask the question you did because it literally says that "whatever happens, it was unpredictable" or iow "there is no explanation for what happens").
To make the compiler behaviour predictable, switch to either TRUNC(BIN) or TRUNC(STD). STD is good for standards compliance but bad for CPU usage, BIN is good for CPU usage but requires you to be a bit careful (because decimal truncation simply will not happen).
Why is the smallest value that can be stored a Byte(8bit) & not a Bit(1bit) in memory?
Even booleans are stored as Bytes. Will we ever bump the smallest number to 32 or 64bits like register's on the CPU?
EDIT: To clarify as many answers seemed confused about the nature of questing. This question is about why isn't a byte 7-bit, 1-bit, 32-bit, etc (not why lower bit primitives must fit within the hardware's byte at min). Is the 8-bit byte simply historical as some hardware has 10-bit bytes for example. Or is there a mathematical reason 8-bit is ideal vs say 10-bit for general processing?
The hardware is built to read data in blocks (bytes, later words and dwords). This provides greater efficiency, than accessing individual bits, and also offers more addressing range. So most data is aligned to at least byte boundary. There exist encodings that operate with bit sequences, rather than bytes, but they are quite rare.
Nowadays the data is most often aligned to dword (32-bits) boundary anyway. Moreover, some hardware (ARM, for example), can't access misaligned multibyte variables, i.e. 16-bit word can't "cross" dword boundary - exception will be thrown.
Because computers address memory at the byte level, so anything smaller than a byte is not addressable.
The underlying methods of processor access are limited to the size of the smallest usable register. On most architectures, that size is 8 bits. You can use smaller portions of these; for instance, C has the bitfield feature in structs that will allow combining fields that only need to be certain bit lengths. Access will still require that the whole byte be read.
Some older exotic architectures actually did have different a "word size." In these machines, 10 bits might be the common size.
Lastly, processors are almost always backwards compatible. Intel, for instance, has maintained complete instruction compatibility from the 386 on up. If you take a program compiled for the 386, it will still run on an i7 processor. Changing the word size would break compatibility. So while it is possible, no manufacturer will ever do it.
Assume that we have native language that consist of 2 character such as a , b
to distinguish two characters we need at least 1 bit for example 0 to represent char a and 1 to represent char b
so that if we count number of characters and special characters and symbols, there are 128 character and to distinguish one character from another, you need log2(128) = 7 bit and 8th bit for transmission
AFAIK, Currency type in Delphi Win32 depends on the processor floating point precision. Because of this I'm having rounding problems when comparing two Currency values, returning different results depending on the machine.
For now I'm using the SameValue function passing a Epsilon parameter = 0.009, because I only need 2 decimal digits precision.
Is there any better way to avoid this problem?
The Currency type in Delphi is a 64-bit integer scaled by 1/10,000; in other words, its smallest increment is equivalent to 0.0001. It is not susceptible to precision issues in the same way that floating point code is.
However, if you are multiplying your Currency numbers by floating-point types, or dividing your Currency values, the rounding does need to be worked out one way or the other. The FPU controls this mechanism (it's called the "control word"). The Math unit contains some procedures which control this mechanism: SetRoundMode in particular. You can see the effects in this program:
{$APPTYPE CONSOLE}
uses Math;
var
x: Currency;
y: Currency;
begin
SetRoundMode(rmTruncate);
x := 1;
x := x / 6;
SetRoundMode(rmNearest);
y := 1;
y := y / 6;
Writeln(x = y); // false
Writeln(x - y); // 0.0001; i.e. 0.1666 vs 0.1667
end.
It is possible that a third-party library you are using is setting the control word to a different value. You may want to set the control word (i.e. rounding mode) explicitly at the starting point of your important calculations.
Also, if your calculations ever transfer into plain floating point and then back into Currency, all bets are off - too hard to audit. Make sure all your calculations are in Currency.
No, Currency is not a floating point type. It is a fixed-precision decimal, implemented with integer storage. It can be compared exactly, and does not have the rounding issues of, say, Double. Therefore, if you are seeing inexact values in your Currency variables, the problem is not the Currency type itself, but what you are putting into it. Most likely, you have a floating-point calculation somewhere else in your code. Since you do not show that code, it's hard to be of more help on this question. But the solution, generally speaking, will be to round your floating point numbers to the correct precision before storing in the Currency variable, rather than doing an inexact comparison on the Currency variables.
Faster and safer way of comparing two currency values is certainly to map the variables to their internal Int64 representation:
function CompCurrency(var A,B: currency): Int64;
var A64: Int64 absolute A;
B64: Int64 absolute B;
begin
result := A64-B64;
end;
This will avoid any rounding error during comparison (working with *10000 integer values), and will be faster than the default FPU-based implementation (especially under 64 bit XE2 compiler).
See this article for additional information.
If your situation is like mine, you might find this approach helpful. I work mostly in payroll. If a business has say 3 departments and wants to charge the cost of an employee evenly among those three departments, there are a lot of times when there will be rounding issues.
What I have been doing is loop through the departments charging each one a third of the total cost and adding the cost charged to a subtotal (currency) variable. But when the loop variable equals the limit, rather than multiplying by the fraction, I subtract the subtotal variable from the total cost and put that in the last department. Since the journal entries that result from this process always have to balance, I believe that it has always worked.
See thread:
D7 / DUnit: all CheckEquals(Currency, Currency) tests suddenly fail ...
https://forums.codegear.com/thread.jspa?threadID=16288
It looks like a change on our development workstations caused Currency comparision to fail. We have not found the root cause, but on two computers running Windows 2000 SP4, and independent of the version of gds32.dll (InterBase 7.5.1 or 2007) and Delphi (7 and 2009), this line
TIBDataBase.Create(nil);
changes the value of to 8087 control word from $1372 to $1272 now.
And all Currency comparisions in unit tests will fail with funny messages like
Expected: <12.34> - Found: <12.34>
The gds32.dll has not been modified, so I guess that there is a dependency in this library to a third party dll which modifies the control word.
To avoid possible issues with currency rounding in Delphi use 4 decimal places.
This will ensure that you never having rounding issues when doing calcualtions with very small amounts.
"Been there. Done That. Written the unit tests."