Delphi Warning W1073 Combining signed type and unsigned 64-bit type - treated as an unsigned type - delphi

I get the subject warning on the following line of code;
SelectedFilesSize := SelectedFilesSize +
UInt64(IdList.GetPropertyValue(TShellColumns.Size)) *
ifthen(Selected, 1, -1);
Specifically, the IDE highlights the third line.
SelectedFilesSize is declared as UInt64.
The code appears to work when I run it; if I select an item, its file size is added to the total, if I deselect a file its size is subtracted.
I know I can suppress this warning with {$WARN COMBINING_SIGNED_UNSIGNED64 OFF}.
Can someone explain? Will there be an unforeseen impact if SelectedFilesSize gets huge? Or an impact on a specific target platform?
Delphi 10.3, Win32 and Win64 targets

This will work here, but the warning is right.
If you multiply a UInt64 with -1, you are actually multiplying it with $FFFFFFFFFFFFFFFF. The final result will be a 128 bit value, but the lower 64 bits will be the same as for a signed multiplication (that is also why the code generator often produces an imul opcode, even for unsiged multiplication: the lower bits will be correct, just the — unused — higher bits won't be). The upper 64 bits won't be used anyway, so they don't matter.
If you add that (actually negative) value to another UInt64 (e.g. SelectedFilesSize), the 64 bit result will be correct again. The CPU does not discriminate between positive or negative values when adding. The resulting CPU flags (carry, overflow) will indicate overflow, but if you ignore that by not using range or overflow checks, your code will be fine.
Your code will likely produce a runtime error if range or overflow checks are on, though.
In other words, this works because any excess upper bit — the 64th bit and above — can be ignored. Otherwise, the values would be wrong. See example.
Example
Say your IdList.GetPropertyValue(TShellColumns.Size) is 420. Then you are performing:
$00000000000001A4 * $FFFFFFFFFFFFFFFF = $00000000000001A3FFFFFFFFFFFFFF5C
This is a huge but positive number, but fortunately the lower 64 bits ($FFFFFFFFFFFFFF5C) can be interpreted as -420 (a really negative value in 128 bit would be $FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF5C or -420).
Now say your SelectedFileSize is 100000 (or hex $00000000000186A0). Then you get:
$00000000000186A0 + $FFFFFFFFFFFFFF5C = $00000000000184FC
(or actually $100000000000184FC, but the top bit -- the carry -- is ignored).
$00000000000184FC is 99580 in decimal, so exactly the value you wanted.

Related

Is it possible to clear the FPU?

I'm using Delphi XE6 to perform a complicated floating point calculation. I realize the limitations of floating point numbers so understand the inaccuracies inherent in FP numbers. However this particular case, I always get 1 of 2 different values at the end of the calculation.
The first value and after a while (I haven't figured out why and when), it flips to the second value, and then I can't get the first value again unless I restart my application. I can't really be more specific as the calculation is very complicated. I could almost understand if the value was somewhat random, but just 2 different states is a little confusing. This only happens in the 32-bit compiler, the 64 bit compiler gives one single answer no matter how many times I try it. This number is different from the 2 from the 32-bit calculation, but I understand why that is happening and I'm fine with it. I need consistency, not total accuracy.
My one suspicion is that perhaps the FPU is being left in a state after some calculation that affects subsequent calculations, hence my question about clearing all registers and FPU stack to level out the playing field. I'd call this CLEARFPU before I start of the calculation.
After some more investigation I realized I was looking in the wrong place. What you see is not what you get with floating point numbers. I was looking at the string representation of the numbers and thinking here are 4 numbers going into a calculation ALL EQUAL and the result is different. Turns out the numbers only seemed to be the same. I started logging the hex equivalent of the numbers, worked my way back and found an external dll used for matrix multiplication the cause of the error. I replaced the matrix multiplication with a routine written in Delphi and all is well.
Floating point calculations are deterministic. The inputs are the input data and the floating point control word. With the same input, the same calculation will yield repeatable output.
If you have unpredictable results, then there will be a reason for it. Either the input data or the floating point control word is varying. You have to diagnose what that reason for that is. Until you understand the problem fully, you should not be looking for a problem. Do not attempt to apply a sticking plaster without understanding the disease.
So the next step is to isolate and reproduce the problem in a simple piece of code. Once you can reproduce the issue you can solve the problem.
Possible explanations include using uninitialized data, or external code modifying the floating point control word. But there could be other reasons.
Uninitialized data is plausible. Perhaps more likely is that some external code is modifying the floating point control word. Instrument your code to log the floating point control word at various stages of execution, to see if it ever changes unexpectedly.
You've probably been bitten by combination of optimization and excess x87 FPU precision resulting in the same bit of floating-point code in your source code being duplicated with different assembly code implementations with different rounding behaviour.
The problem with x87 FPU math
The basic problem is that while x87 FPU the supports 32-bit, 64-bit and 80-bit floating-point value, it only has 80-bit registers and the precision of operations is determined by the state of the bits in the floating point control word, not the instruction used. Changing the rounding bits is expensive, so most compilers don't, and so all floating point operations end being be performed at the same precision regardless of the data types involved.
So if the compiler sets the FPU to use 80-bit rounding and you add three 64-bit floating point variables, the code generated will often add the first two variables keeping the unrounded result in a 80-bit FPU register. It would then add the third 64-bit variable to 80-bit value in the register resulting in another unrounded 80-bit value in a FPU register. This can result in a different value being calculated than if the result was rounded to 64-bit precision after each step.
If that resulting value is then stored in a 64-bit floating-point variable then the compiler might write it to memory, rounding it to 64 bits at this point. But if the value is used in later floating point calculations then the compiler might keep it in a register instead. This means what rounding occurs at this point depends on the optimizations the compiler performs. The more its able to keep values in a 80-bit FPU register for speed, the more the result will differ from what you'd get if all floating point operation were rounded according to the size of actual floating point types used in the code.
Why SSE floating-point math is better
With 64-bit code the x87 FPU isn't normally used, instead equivalent scalar SSE instructions are used. With these instructions the precision of the operation used is determined by the instruction used. So with the adding three numbers example, the compiler would emit instructions that added the numbers using 64-bit precision. It doesn't matter if the result gets stored in memory or stays in register, the value remains the same, so optimization doesn't affect the result.
How optimization can turn deterministic FP code into non-deterministic FP code
So far this would explain why you'd get a different result with 32-bit code and 64-bit code, but it doesn't explain why you can get a different result with the same 32-bit code. The problem here is that optimizations can change the your code in surprising ways. One thing the compiler can do is duplicate code for various reasons, and this can cause the same floating point code being executed in different code paths with different optimizations applied.
Since optimization can affect floating point results this can mean the different code paths can give different results even though there's only one code path in the source code. If the code path chosen at run time is non-deterministic then this can cause non-deterministic results even when the in the source code the result isn't dependent on any non-deterministic factor.
An example
So for example, consider this loop. It performs a long running calculation, so every few seconds it prints a message letting the user know how many iterations have been completed so far. At the end of the loop there's simple summation performed using floating-point arithmetic. While there's non-deterministic factor in the loop, the floating-point operation isn't dependent on it. It's always performed regardless of whether progress updated is printed or not.
while ... do
begin
...
if TimerProgress() then
begin
PrintProgress(count);
count := 0
end
else
count := count + 1;
sum := sum + value
end
As optimization the compiler might move the last summing statement into the end of both blocks of the if statement. This lets both blocks finish by jumping back to the start of the loop, saving a jump instruction. Otherwise one of the blocks has to end with a jump to the summing statement.
This transforms the code into this:
while ... do
begin
...
if TimerProgress() then
begin
PrintProgress(count);
count := 0;
sum := sum + value
end
else
begin
count := count + 1;
sum := sum + value
end
end
This can result in the two summations being optimized differently. It may be in one code path the variable sum can be kept in a register, but in the other path its forced out in to memory. If x87 floating point instructions are used here this can cause sum to be rounded differently depending on a non-deterministic factor: whether or not its time to print the progress update.
Possible solutions
Whatever the source of your problem, clearing the FPU state isn't going to solve it. The fact that the 64-bit version works, provides an possible solution, using SSE math instead x87 math. I don't know if Delphi supports this, but it's common feature of C compilers. It's very hard and expensive to make x87 based floating-point math conforming to the C standard, so many C compilers support using SSE math instead.
Unfortunately, a quick search of the Internet suggests the Delphi compiler doesn't have option for using SSE floating-point math in 32-bit code. In that case your options would be more limited. You can try disabling optimization, that should prevent the compiler from creating differently optimized versions of the same code. You could also try to changing the rounding precision in the x87 floating-point control word. By default it uses 80-bit precision, but all your floating point variables are 64-bit then changing the FPU to use 64-bit precision should significantly reduce the effect optimization has on rounding.
To do the later you can probably use the Set8087CW procedure MBo mentioned, or maybe System.Math.SetPrecisionMode.

32 bit multiplication on 24 bit ALU

I want to port a 32 by 32 bit unsigned multiplication on a 24-bit dsp (it's a Linear Congruential Generator, so I'm not allowed to truncate, also I don't want to replace yet the current LCG with a 24 bit one). The available data types are 24 and 48 bit ints.
Only the last 32 LSB are needed. Do you know any hacks to implement this in fewer multiplies, masks and shifts than the usual way?
The line looks like this:
//val is an int(32 bit)
val = (1664525 * val) + 1013904223;
An outline would be (in my current compiler style):
static uint48_t val = SEED;
...
val = 0xFFFFFFFFUL & ((1664525UL * val) + 1013904223UL);
and hopefully the compiler will recognise:
it can use a multiply and accumulate command
it only needs a reduced multiply algorithim due to the "high word" of the constant being zero
the AND could be effected by resetting the upper bits or multiplying a constant and restoring
...other stuff depends on your {mystery dsp} target
Note
if you scale up the coefficients by 2^16, you can get truncation for free, but due to lack of info
you will have to explore/decide if it is better overall.
(This is more an elaboration why two multiplications 24×24→n, 31<n are enough for 32×32→min(n, 40).)
The question discloses amazingly little about the capabilities to build a method
32×21→32 in fewer [24×24] multiplies, masks and shifts than the usual way on:
24 and 48 bit ints & DSP (I read high throughput, non-high latency 24×24→48).
As far as there indeed is a 24×24→48 multiply (or even 24×24+56→56 MAC) and one factor is less than 24 bits, the question is pointless, a second multiply being the compelling solution.
The usual composition of a 24<n<48×24<m<48→24<p multiply from 24×24→48 uses three of the latter; a compiler should know as well as a coder that "the fourth multiply" would yield bits with a significance/position exceeding the combined lengths of the lower parts of the factors.
So, is it possible to generate "the long product" using just a second 24×24→48?
Let the (bytes of the) factors be w_xyz and W_XYZ, respectively; the underscores suggesting "the Ws" being the lower significance bits in the higher significance words/ints if interpreted as 24bit ints. The first 24×24→48 gives the sum of
  zX
 yXzY
xXyYzZ
 xYyZ
  xZ, what is needed (fat) is
 wZ +
 zW.
This can be computed using one combined multiplication of
((w<<16)|(z & 0xff)) × ((W<<16)|(Z & 0xff)). (Never mind the 17th bit of wZ+zW "running" into wW.)
(In the first revision of this answer, I foolishly produced wZ and zW separately - their sum is wanted in the end, anyway.)
(Annoyingly, this is about all you can do for 24×24→24 as a base operation too - beyond this "combining multiplication", you need four instead of one.)
Another angle to explore is choosing a different PRNG.
It may have to be >24 bits (tell!).
On a 24 bit machine, XorShift* (or even XorShift+) 48/32 seems worth a look.

How will 64 bit variable be referenced in a 32 bit process?

I have a 64 bit kernel and i run 32 bit processes in userland.In the user process code ,if i declare a 64 bit variable ,how will it be referred.Will it incur 2 memory reads.?
basically the scenario is:
I need to use a 64 bit mask in my user process.
Approach 1 :
-> Use a u64bits variable.
Approach
-> Use a array of 2 32 bit variables.
First off: the kernel has no bearing on the answer to this question.
Second, I assume this is x86 you're talking about. Where possible, the compiler will place 64-bit values across 2 32-bit registers. For example, if you return a uint64_t from a function, the low 32 bits will be stored in the eax register, and the high bits will be in edx.
The compiler will generally do the right thing for performance and correctness: using an array will likely just confuse it and lead to worse results.
By the way, x86-64 CPUs will normally perform reads of 2 adjacent 32-bit words at the same speed as a single 64-bit read. The advantages of 64-bit mode are that arithmetic can be done directly on 64-bit values (1 64x64 multiplication instruction vs 3-4 32x32 instructions), there is much more space available in registers (16 registers instead of 8, registers are twice as wide), and of course the larger possible virtual address space.

Why is the smallest value that can be stored is a Byte(8bit) & not a Bit(1bit)?

Why is the smallest value that can be stored a Byte(8bit) & not a Bit(1bit) in memory?
Even booleans are stored as Bytes. Will we ever bump the smallest number to 32 or 64bits like register's on the CPU?
EDIT: To clarify as many answers seemed confused about the nature of questing. This question is about why isn't a byte 7-bit, 1-bit, 32-bit, etc (not why lower bit primitives must fit within the hardware's byte at min). Is the 8-bit byte simply historical as some hardware has 10-bit bytes for example. Or is there a mathematical reason 8-bit is ideal vs say 10-bit for general processing?
The hardware is built to read data in blocks (bytes, later words and dwords). This provides greater efficiency, than accessing individual bits, and also offers more addressing range. So most data is aligned to at least byte boundary. There exist encodings that operate with bit sequences, rather than bytes, but they are quite rare.
Nowadays the data is most often aligned to dword (32-bits) boundary anyway. Moreover, some hardware (ARM, for example), can't access misaligned multibyte variables, i.e. 16-bit word can't "cross" dword boundary - exception will be thrown.
Because computers address memory at the byte level, so anything smaller than a byte is not addressable.
The underlying methods of processor access are limited to the size of the smallest usable register. On most architectures, that size is 8 bits. You can use smaller portions of these; for instance, C has the bitfield feature in structs that will allow combining fields that only need to be certain bit lengths. Access will still require that the whole byte be read.
Some older exotic architectures actually did have different a "word size." In these machines, 10 bits might be the common size.
Lastly, processors are almost always backwards compatible. Intel, for instance, has maintained complete instruction compatibility from the 386 on up. If you take a program compiled for the 386, it will still run on an i7 processor. Changing the word size would break compatibility. So while it is possible, no manufacturer will ever do it.
Assume that we have native language that consist of 2 character such as a , b
to distinguish two characters we need at least 1 bit for example 0 to represent char a and 1 to represent char b
so that if we count number of characters and special characters and symbols, there are 128 character and to distinguish one character from another, you need log2(128) = 7 bit and 8th bit for transmission

Inverse Mappings

Saying right now: Yes, this is homework. I'm not asking for an answer, but I would love any help into a general direction to look at this problem at. I've been working on it now for hours and have not made any real progress.
Can a function, with a well defined inverse, be implemented to map 32 bit integers to 64 bit integers. Do all functions from 32bit to 64bit integers have well defined inverses?
Of course not.
Take the identity function for example. All 32-bit values have an identity in the 64-bit value space (just use 0 in the top 32 bits, using only the bottom 32 bits for the value). However, any 64-bit value where the top 32 bits is not 0, will not have a corresponding value in the 32-bit value space.
The above is a layman's explanation, and is probably not rigorous enough as a homework solution (as intended). You'd do well to read up on the pigeonhole principle.

Resources