Comparison Efficiency

What is generally faster:
if (num >= 10)
or:
if (!(num < 10))

The compiler will most likely optimize that sort of thing. Don't worry about it, just code for clarity in this case.
Assembly languages often have operations for >= and <= that take the same number of steps as < and >. For instance, with a Motorola 68k, if you want to compare the data registers %d0 and %d1 and branch if %d1 is greater than or equal to %d0, you would say something like:
cmp %d0, %d1 // compare %d0 and %d1, storing the result
// in the condition code registers.
bge labelname // Branch to the given label name if the comparison
// yielded "greater than or equal to" (hence bge)
It's a common mistake to think that a >= b means the computer will perform two operations instead of one because of that "or" in "greater than or equal to".

Any decent compiler will optimize those two statements to exactly the same underlying code. In fact, it will most likely generate exactly the same code for:
if (!(!(!(!(!(!(!(num < 10))))))))
I would opt for the first of yours just because its intent seems much clearer (mildly clearer than your second choice, massively clearer than that monstrosity I posted above). I tend to think in terms of how I would read it. Think of the two sentences:
if number is greater than or equal to ten.
if it's not the case that number is less than ten.
I believe the first one to be clearer.
In fact, just testing with "gcc -S" to get the assembler output, both statements generate the following code:
cmpl $9,-8(%ebp) ; compare value with 9
jle .L3 ; branch if 9 or less.
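For reference, a minimal test program along these lines (the variable name and the surrounding I/O are just placeholders) can be compiled with gcc -S so you can compare the two branches in the generated assembler yourself:
#include <stdio.h>
int main(void)
{
    int num;
    if (scanf("%d", &num) != 1)
        return 1;
    if (num >= 10)          /* first form */
        puts("at least ten");
    if (!(num < 10))        /* second form: should produce the same compare-and-branch */
        puts("at least ten");
    return 0;
}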
I believe you're wasting your time looking at micro-optimisations like this - you'd be far more efficient looking at things like algorithm selection. There's likely to be a much greater return on investment there.

In general any speed difference won't matter a great deal, but they don't necessarily mean exactly the same thing.
In many languages, comparing the floating point value NaN returns false for all comparisons, so if num = NaN, the first is false and the second true.
#include <iostream>
#include <limits>
int main ( ) {
    using namespace std;
    double num = numeric_limits<double>::quiet_NaN();
    cout << boolalpha;
    cout << "( num >= 10 ) " << ( num >= 10 ) << endl;
    cout << "( ! ( num < 10 ) ) " << ( ! ( num < 10 ) ) << endl;
    cout << endl;
}
outputs
( num >= 10 ) false
( ! ( num < 10 ) ) true
So in the first case the compiler can use a single instruction to compare num with the value 10, but in the second it may issue an extra instruction to invert the result of the comparison (or it may just use a branch-if-zero rather than a branch-if-non-zero; you can't say in general).
Other languages and compilers will vary, and for types where they really have the same semantics the code emitted might well be identical.

Related

Is there a way to assert that the first (most significant) digit of a number is a particular digit?

I would like to assert that the most significant digit of a number is a particular value, but I don't actually know the length of the number. If it was the least significant digit, I know I could use the python mod (%) to check for it. But with an unknown number of digits, I'm unsure of how I could check this in z3.
For example, I may know that the left most digit is a 9, such as 9x, or 9xx, or 9xxx etc.
Thanks so much in advance
The generic way to do this would be to convert to a string and check that the first character matches:
from z3 import *
s = Solver()
n = Int('n')
s.add(SubString(IntToStr(n), 0, 1) == "9")
r = s.check()
if r == sat:
    m = s.model()
    print("n =", m[n])
else:
    print("Solver said:", r)
This prints:
n = 9
Note that IntToStr expects its argument to be non-negative, so if you need to support negative numbers, you'll have to write extra code to accommodate that. See https://smtlib.cs.uiowa.edu/theories-UnicodeStrings.shtml for details.
Aside While this will accomplish what you want in its generality, it may not be the most efficient way to encode this constraint. Since it goes through strings, the constraints generated might cause performance issues. If you have an upper limit on your number, it might be more efficient to code it explicitly. For instance, if you know your number is less than a 1000, I'd code it as (pseudocode):
n == 9 || n >= 90 && n <= 99 || n >= 900 && n <= 999
etc. until you have the range you needed covered. This would lead to much simpler constraints and perform a lot better in general. Note that this'll work even if you don't know the exact length, but have an upper bound on it. But of course, it all depends on what you are trying to achieve and what else you know about the number itself.

Loading/Storing to XMFLOAT4 faster than using XMVECTOR?

I'm going through the DirectX Math/XNA Math library, and I got curious when I read about the alignment requirements for XMVECTOR (Now DirectX::XMVECTOR), and how it is expected of you to use XMFLOAT* for members instead, using XMLoad* and XMStore* when performing mathematical operations. I was specifically curious about the tradeoffs, so I did an experiment, as I'm sure many others have, and tested to see exactly how much you lose having to load and store the vectors for each operation. This is the resulting code:
#include <Windows.h>
#include <chrono>
#include <cstdint>
#include <DirectXMath.h>
#include <iostream>
using std::chrono::high_resolution_clock;
#define TEST_COUNT 1000000000l
int main(void)
{
    DirectX::XMVECTOR v1 = DirectX::XMVectorSet(1, 2, 3, 4);
    DirectX::XMVECTOR v2 = DirectX::XMVectorSet(2, 3, 4, 5);
    DirectX::XMFLOAT4 x{ 1, 2, 3, 4 };
    DirectX::XMFLOAT4 y{ 2, 3, 4, 5 };
    high_resolution_clock::time_point start, end;
    std::chrono::milliseconds duration;
    // Test with just the XMVECTOR
    start = high_resolution_clock::now();
    for (uint64_t i = 0; i < TEST_COUNT; i++)
    {
        v1 = DirectX::XMVectorAdd(v1, v2);
    }
    end = high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    DirectX::XMFLOAT4 z;
    DirectX::XMStoreFloat4(&z, v1);
    std::cout << std::endl << "z = " << z.x << ", " << z.y << ", " << z.z << std::endl;
    std::cout << duration.count() << " milliseconds" << std::endl;
    // Now try with load/store
    start = high_resolution_clock::now();
    for (uint64_t i = 0; i < TEST_COUNT; i++)
    {
        v1 = DirectX::XMLoadFloat4(&x);
        v2 = DirectX::XMLoadFloat4(&y);
        v1 = DirectX::XMVectorAdd(v1, v2);
        DirectX::XMStoreFloat4(&x, v1);
    }
    end = high_resolution_clock::now();
    duration = std::chrono::duration_cast<std::chrono::milliseconds>(end - start);
    std::cout << std::endl << "x = " << x.x << ", " << x.y << ", " << x.z << std::endl;
    std::cout << duration.count() << " milliseconds" << std::endl;
}
Running a debug build yields the output:
z = 3.35544e+007, 6.71089e+007, 6.71089e+007
25817 milliseconds
x = 3.35544e+007, 6.71089e+007, 6.71089e+007
84344 milliseconds
Okay, so about thrice as slow, but does anyone really take perf tests on debug builds seriously? Here are the results when I do a release build:
z = 3.35544e+007, 6.71089e+007, 6.71089e+007
1980 milliseconds
x = 3.35544e+007, 6.71089e+007, 6.71089e+007
670 milliseconds
Like magic, XMFLOAT4 runs almost three times faster! Somehow the tables have turned. Looking at the code, this makes no sense to me; the second part runs a superset of the commands that the first part runs! There must be something going wrong, or something I am not taking into account. It is difficult to believe that the compiler was able to optimize the second part nine-fold over the much simpler, and theoretically more efficient first part. The only reasonable explanations I have involve either (1) cache behavior, (2) some crazy out of order execution that XMVECTOR can't take advantage of, (3) the compiler is making some insane optimizations, or (4) using XMVECTOR directly has some implicit inefficiency that was able to be optimized away when using XMFLOAT4. That is, the default way the compiler loads and stores XMVECTORs from memory is less efficient than XMLoad* and XMStore*. I attempted to inspect the disassembly, but I'm not all that familiar with X86 and/or SSE2 and Visual Studio does some crazy optimizations making it difficult to follow along with the source code. I also tried the Visual Studio performance analysis tool, but that didn't help as I can't figure out how to make it show the disassembly instead of the code. The only useful information I get out of that is that the first call to XMVectorAdd accounts for ~48.6% of all cycles while the second call to XMVectorAdd accounts for ~4.4% of all cycles.
EDIT:
After some more debugging, here is the assembly for the code that gets run inside of the loop. For the first part:
002912E0 movups xmm1,xmmword ptr [esp+18h] <-- HERE
002912E5 add ecx,1
002912E8 movaps xmm0,xmm2 <-- HERE
002912EB adc esi,0
002912EE addps xmm0,xmm1
002912F1 movups xmmword ptr [esp+18h],xmm0 <-- HERE
002912F6 jne main+60h (0291300h)
002912F8 cmp ecx,3B9ACA00h
002912FE jb main+40h (02912E0h)
And for the second part:
00291400 add ecx,1
00291403 addps xmm0,xmm1
00291406 adc esi,0
00291409 jne main+173h (0291413h)
0029140B cmp ecx,3B9ACA00h
00291411 jb main+160h (0291400h)
Note that these two loops are indeed nearly identical. The only difference is that the first for loop appears to be the one doing the loading and storing! It would appear as though Visual Studio made a ton of optimizations since x and y were on the stack. After changing them both to be on the heap (so the writes must happen), the machine code is now identical. Is this generally the case? Is there really no negative side effect to using the storage classes? Other than the fully optimized versions, I suppose.
If you define
DirectX::XMVECTOR v3 = DirectX::XMVectorSet(2, 3, 4, 5);
and use v3 instead of v1 as the result:
...
for (uint64_t i = 0; i < TEST_COUNT; i++)
{
    v3 = DirectX::XMVectorAdd(v1, v2);
}
you get code that is faster than the second part, which uses XMLoadFloat4 and XMStoreFloat4.
Firstly, don't use Visual Studio's "high-resolution clock" for perf timing. You should use QueryPerformanceCounter instead. See Connect.
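A minimal sketch of such a QueryPerformanceCounter timing loop (the loop body is just a stand-in for whatever is being measured):
#include <Windows.h>
#include <cstdint>
#include <iostream>
int main()
{
    LARGE_INTEGER freq, begin, finish;
    QueryPerformanceFrequency(&freq);        // ticks per second
    QueryPerformanceCounter(&begin);
    volatile uint64_t sink = 0;
    for (uint64_t i = 0; i < 100000000ull; ++i)
    {
        sink += i;                           // stand-in for the work under test
    }
    QueryPerformanceCounter(&finish);
    double ms = 1000.0 * static_cast<double>(finish.QuadPart - begin.QuadPart)
                       / static_cast<double>(freq.QuadPart);
    std::cout << ms << " milliseconds" << std::endl;
}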
SIMD performance is difficult to measure in these micro tests because the overhead of loading up vector data can often dominate with such trivial ALU usage. You really need to do something substantial with the data to see the benefits. Also keep in mind that depending on your compiler settings, the compiler itself may be using the 'scalar' SIMD functionality or even auto-vectorizing such trivial code loops.
You are also seeing some issues with the way you are generating your test data. You should create something larger than a single vector on the heap and use that as your source/dest.
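For instance, something along these lines (the function name and the use of std::vector are arbitrary choices for the sketch):
#include <DirectXMath.h>
#include <vector>
// Add two heap-allocated arrays of float4 data; with real array traffic the
// loads and stores can no longer be optimized away.
void AddArrays(const std::vector<DirectX::XMFLOAT4>& a,
               const std::vector<DirectX::XMFLOAT4>& b,
               std::vector<DirectX::XMFLOAT4>& out)
{
    for (size_t i = 0; i < out.size(); ++i)
    {
        DirectX::XMVECTOR va = DirectX::XMLoadFloat4(&a[i]);
        DirectX::XMVECTOR vb = DirectX::XMLoadFloat4(&b[i]);
        DirectX::XMStoreFloat4(&out[i], DirectX::XMVectorAdd(va, vb));
    }
}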
PS: The best way to create 'static' XMVECTOR data is to use the XMVECTORF32 type.
static const DirectX::XMVECTORF32 v1 = { 1, 2, 3, 4 };
Note that if you want to have the load/save conversions between XMVECTOR and XMFLOATx to be "automatic", take a look at SimpleMath in the DirectX Tool Kit. You just use types like SimpleMath::Vector4 in your data structures, and the implicit conversion operators take care of calling XMLoadFloat4 / XMStoreFloat4 for you.
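For example, a rough sketch (Particle and its members are made-up names, and it assumes the DirectX Tool Kit headers are on your include path):
#include "SimpleMath.h"
using DirectX::SimpleMath::Vector4;
struct Particle
{
    Vector4 position;   // stored as plain floats, same memory layout as XMFLOAT4
    Vector4 velocity;
};
void Integrate(Particle& p)
{
    // The overloaded operators and implicit conversions perform the
    // XMLoadFloat4 / XMStoreFloat4 calls behind the scenes.
    p.position += p.velocity;
}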

Lua: Rounding numbers and then truncate

Which is the most efficient way to round up a number and then truncate it (remove the decimal places after rounding up)?
For example, if the decimal part is above 0.5 (that is, 0.6, 0.7, and so on), I want to round up and then truncate (case 1). Otherwise, I would like to just truncate (case 2).
for example:
232.98266601563 => after rounding and truncate = 233 (case 1)
232.49445450000 => after rounding and truncate = 232 (case 2)
232.50000000000 => after rounding and truncate = 232 (case 2)
There is no built-in math.round() function in Lua, but you can do the following:
print(math.floor(a+0.5))
A trick that is useful for rounding at decimal digits other than whole integers is to pass the value through formatted ASCII text, and use the %f format string to specify the rounding desired. For example
mils = tonumber(string.format("%.3f", exact))
will round the arbitrary value in exact to a multiple of 0.001.
A similar result can be had with scaling before and after using one of math.floor() or math.ceil(), but getting the details right according to your expectations surrounding the treatment of edge cases can be tricky. Not that this isn't an issue with string.format(), but a lot of work has gone into making it produce "expected" results.
Rounding to a multiple of something other than a power of ten will still require scaling, and still has all the tricky edge cases. One approach that is simple to express and has stable behavior is to write
function round(exact, quantum)
    local quant,frac = math.modf(exact/quantum)
    return quantum * (quant + (frac > 0.5 and 1 or 0))
end
and tweak the exact condition on frac (and possibly the sign of exact) to get the edge cases you wanted.
To also support negative numbers, use this:
function round(x)
    return x>=0 and math.floor(x+0.5) or math.ceil(x-0.5)
end
If your Lua uses double precision IEC-559 (aka IEEE-754) floats, as most do, and your numbers are relatively small (the method is guaranteed to work for inputs between -2^51 and 2^51), the following efficient code will perform rounding using your FPU's current rounding mode, which is usually round to nearest, ties to even:
local function round(num)
    return num + (2^52 + 2^51) - (2^52 + 2^51)
end
(Note that the numbers in parentheses are calculated at compilation time; they don't affect runtime).
For example, when the FPU is set to round to nearest or even, this unit test prints "All tests passed":
local function testnum(num, expected)
    if round(num) ~= expected then
        error(("Failure rounding %.17g, expected %.17g, actual %.17g")
            :format(num+0, expected+0, round(num)+0))
    end
end
local function test(num, expected)
    testnum(num, expected)
    testnum(-num, -expected)
end
test(0, 0)
test(0.2, 0)
test(0.4, 0)
-- Most rounding algorithms you find on the net, including Ola M's answer,
-- fail this one:
test(0.49999999999999994, 0)
-- Ties are rounded to the nearest even number, rather than always up:
test(0.5, 0)
test(0.5000000000000001, 1)
test(1.4999999999999998, 1)
test(1.5, 2)
test(2.5, 2)
test(3.5, 4)
test(2^51-0.5, 2^51)
test(2^51-0.75, 2^51-1)
test(2^51-1.25, 2^51-1)
test(2^51-1.5, 2^51-2)
print("All tests passed")
Here's another (less efficient, of course) algorithm that performs the same FPU rounding but works for all numbers:
local function round(num)
    local ofs = 2^52
    if math.abs(num) > ofs then
        return num
    end
    return num < 0 and num - ofs + ofs or num + ofs - ofs
end
Here's one to round to an arbitrary number of digits (0 if not defined):
function round(x, n)
    n = math.pow(10, n or 0)
    x = x * n
    if x >= 0 then x = math.floor(x + 0.5) else x = math.ceil(x - 0.5) end
    return x / n
end
For bad rounding (cutting the end off):
function round(number)
    return number - (number % 1)
end
Well, if you want, you can expand this for good rounding.
function round(number)
    if (number - (number % 0.1)) - (number - (number % 1)) < 0.5 then
        number = number - (number % 1)
    else
        number = (number - (number % 1)) + 1
    end
    return number
end
print(round(3.1))
print(round(math.pi))
print(round(42))
print(round(4.5))
print(round(4.6))
Expected results:
3, 3, 42, 5, 5
I like the response above by RBerteig: mils = tonumber(string.format("%.3f", exact)).
Expanded it to a function call and added a precision value.
function round(number, precision)
    local fmtStr = string.format('%%0.%sf', precision)
    number = string.format(fmtStr, number)
    return tonumber(number)   -- convert back to a number, as in the original one-liner
end
Should be math.ceil(a-0.5) to correctly handle half-integer numbers
Here is a flexible function to round to different number of places. I tested it with negative numbers, big numbers, small numbers, and all manner of edge cases, and it is useful and reliable:
function Round(num, dp)
    --[[
    Round a number to so-many decimal places, which can be negative,
    e.g. -1 places rounds to 10's.
    Examples:
    173.2562 rounded to 0 dps is 173.0
    173.2562 rounded to 2 dps is 173.26
    173.2562 rounded to -1 dps is 170.0
    ]]--
    local mult = 10^(dp or 0)
    return math.floor(num * mult + 0.5)/mult
end
For rounding to a given amount of decimals (which can also be negative), I'd suggest the following solution that is combined from the findings already presented as answers, especially the inspiring one given by Pedro Gimeno. I tested a few corner cases I'm interested in but cannot claim that this makes this function 100% reliable:
function round(number, decimals)
    local scale = 10^decimals
    local c = 2^52 + 2^51
    return ((number * scale + c) - c) / scale
end
These cases illustrate the round-halfway-to-even property (which should be the default on most machines):
assert(round(0.5, 0) == 0)
assert(round(-0.5, 0) == 0)
assert(round(1.5, 0) == 2)
assert(round(-1.5, 0) == -2)
assert(round(0.05, 1) == 0)
assert(round(-0.05, 1) == 0)
assert(round(0.15, 1) == 0.2)
assert(round(-0.15, 1) == -0.2)
I'm aware that my answer doesn't handle the third case of the actual question, but in favor of being IEEE-754 compliant, my approach makes sense. So I'd expect that the results depend on the current rounding mode set in the FPU, with FE_TONEAREST being the default. And that's why it seems highly likely that after setting FE_TOWARDZERO (however you can do that in Lua) this solution would return exactly the results that were asked for in the question.
Try using math.ceil(number + 0.5). This is according to this Wikipedia page. If I'm correct, this only rounds positive numbers; you need to do math.floor(number - 0.5) for negatives.
If it's useful to anyone, I've hashed out a generic version of Lua's logic, but this time for truncate():
I pre-apologize for not knowing Lua syntax all that well, so this is a rough translation from AWK, but hopefully it is intuitive enough:
-- Because the Lua "magic" constant already sits in the 2^52-to-2^53 zone,
-- this has to use a more coarse-grained delta than the true IEEE 754 double
-- machine epsilon of 2^-52.
local _LUAMAGIC = 2^52 + 2^51   -- same magic constant as in the answers above

function trunc_lua(x)
    local s = (x < 0) and -1 or 1   -- sign of the input
    return (x * s                   -- force the value into the positive range
            - 2^-1 + 2^-50          -- offset by -(0.5 - delta); can also be
                                    -- written as 2^-50 - 5^0/2
            - _LUAMAGIC             -- symmetric round trip through the magic
            + _LUAMAGIC             -- constant does the actual truncation
           ) * s                    -- restore the original sign
end
It's essentially the same concept as rounding, but it force-processes all inputs in the positive range, with a -1*(0.5-delta) offset. The smallest delta I could attain is 2^-52 ~ 2.222e-16.
The magic values must come after all those pre-processing steps, or else precision loss may occur. And finally, the original sign of the input is restored.
The two "multiplies" are simply low-overhead sign flipping: the sign flips four times for originally negative values (two manual flips plus the round trip through the end of the mantissa), while any x >= 0, including -0.0, only flips twice. Extra function calls, float division, and integer modulus are all avoided, with only one conditional check for x < 0.
Usage notes:
(1) it doesn't perform checks on the input for an invalid or malicious payload,
(2) it doesn't use a quick check for zero,
(3) it doesn't check for extreme inputs that may render this logic moot, and
(4) it doesn't attempt to pretty-format the value.
If math.round does not already exist:
function math.round(x, n)
    return tonumber(string.format("%." .. (n or 0) .. "f", x))
end

Is there a language construct in F# for testing if a number is between two other numbers (in a range)?

I am looking for a more succinct F# equivalent of:
myNumber >= 2 && myNumber <= 4
I imagine something like
myNumber >=< (2, 4)
Is there some kind of operation like this?
There is no native operator, but you could define your own one.
let inline (>=<) a (b, c) = a >= b && a <= c
John's answer is exactly what you asked for, and the most practical solution. But this got me wondering if one could define operator(s) to enable a syntax closer to normal mathematical notation, i.e., a <= b <= c.
Here's one such solution:
let inline (<=.) left middle = (left <= middle, middle)
let inline (.<=) (leftResult, middle) right = leftResult && (middle <= right)
let inline (.<=.) middleLeft middleRight = (middleLeft .<= middleRight, middleRight)
1 <=. 3 .<=. 5 .<= 9 // true
1 <=. 10 .<= 5 // false
A few comments on this:
I used the . character to indicate the "middle" of the expression
. was a very deliberate choice, and is not easily changeable to some other character you like better (e.g. if you perhaps like the look of 1 <=# 3 #<= 5 better). The F# compiler changes the associativity and/or precedence of an operator based on the operator symbol's first character. We want standard left-to-right evaluation/short-circuiting, and . enables this.
A 3-number comparison is optimized away completely, but a 4+ number comparison results in CIL that allocates tuples and does various other business that isn't strictly necessary.
Is there some kind of operation like this?
Great question! The answer is "no", there isn't, but I wish there was.
Latkin's answer is nice, but it doesn't short-circuit evaluate. So if the first test fails the remaining subexpressions still get evaluated, even though their results are irrelevant.
FWIW, in Mathematica you can do 1<x<2 just like mathematics.

Lua base converter

I need a base converter function for Lua. I need to convert from base 10 to base 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, ... 36. How can I do this?
In the string to number direction, the function tonumber() takes an optional second argument that specifies the base to use, which may range from 2 to 36 with the obvious meaning for digits in bases greater than 10.
In the number to string direction, this can be done slightly more efficiently than Nikolaus's answer by something like this:
local floor,insert = math.floor, table.insert
function basen(n,b)
    n = floor(n)
    if not b or b == 10 then return tostring(n) end
    local digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    local t = {}
    local sign = ""
    if n < 0 then
        sign = "-"
        n = -n
    end
    repeat
        local d = (n % b) + 1
        n = floor(n / b)
        insert(t, 1, digits:sub(d,d))
    until n == 0
    return sign .. table.concat(t,"")
end
This creates fewer garbage strings to collect by using table.concat() instead of repeated calls to the string concatenation operator (..). Although it makes little practical difference for strings this small, this idiom should be learned because otherwise building a buffer in a loop with the concatenation operator will actually tend to O(n^2) performance, while table.concat() has been designed to do substantially better.
There is an unanswered question as to whether it is more efficient to push the digits on a stack in the table t with calls to table.insert(t,1,digit), or to append them to the end with t[#t+1]=digit, followed by a call to string.reverse() to put the digits in the right order. I'll leave the benchmarking to the student. Note that although the code I pasted here does run and appears to get correct answers, there may be other opportunities to tune it further.
For example, the common case of base 10 is culled off and handled with the built in tostring() function. But similar culls can be done for bases 8 and 16 which have conversion specifiers for string.format() ("%o" and "%x", respectively).
Also, neither Nikolaus's solution nor mine handle non-integers particularly well. I emphasize that here by forcing the value n to an integer with math.floor() at the beginning.
Correctly converting a general floating point value to any base (even base 10) is fraught with subtleties, which I leave as an exercise to the reader.
You can use a loop to convert an integer into a string containing the required base. For bases below 10 use the following code; if you need a base larger than that, you need to add a line that maps the result of x % base to a character (using an array, for example):
x = 1234
r = ""
base = 8
while x > 0 do
    r = "" .. (x % base) .. r
    x = math.floor(x / base)
end
print(r)
