Comparing Runtimes - Theoretical to Actual

Sorry in advance for the long post, but I need to be detailed in my explanation here. Here's what I have: code that measures the runtime of mergesort and radix sort for four different input sizes.
Mergesort runtimes:
N = 10; runtime = 3499 nanoseconds
N = 100; runtime = 39600 nanoseconds
N = 1000; runtime = 470199 nanoseconds
N = 10000; runtime = 6227399 nanoseconds
Radixsort runtimes:
N = 10; runtime = 19200 nanoseconds
N = 100; runtime = 135099 nanoseconds
N = 1000; runtime = 1317799 nanoseconds
N = 10000; runtime = 14208600 nanoseconds
I have also measured the runtime of a single operation to be roughly 1000 nanoseconds on this machine. The professor recommended this as a way to help convert the theoretical runtimes into something we can compare against the actual runtimes. For mergesort I have O(n log(n)) as the runtime, and for radix sort I have O(nk), although I'm not entirely sure what the k represents. He suggested we do the following conversion, so I've done it for each of the mergesort runs. I don't know how to do this for radix sort, since I don't know how to factor in the 'k'. My understanding is that 'k' basically refers to the number of digits, but that you can essentially stick with whichever of N and k is larger; since my N is always larger than k in the cases I'm working with, I'm just going to treat radix sort as O(N). k is limited to six digits at most, while N starts at 10 as its lowest value.
1000 ns * theoretical runtime
For example, 1000 ns * 10 * log2(10)
Mergesort:
N = 10; 33219.3 nanoseconds
N = 100; 664385.6 nanoseconds
N = 1000; 9.96578428 * 10^6 nanoseconds
N = 10000; 1.3287712379549449 * 10^8 nanoseconds
Radixsort: (1000ns per operation * N)
N = 10; 10000 nanoseconds
N = 100; 100000 nanoseconds
N = 1000; 1000000 nanoseconds
N = 10000; 10000000 nanoseconds
So here's where my issue comes in. One, I don't know how to do this calculation for the radixsort theoretical runtime. Two, I don't know exactly how to compare these values using a graph (the requirement).
In class, he discussed using logs to "normalize" the data. The Y-axis would be N and the X-axis would be time, but he talked about using logs to change the N values from 10, 100, 1000, and 10000 so that they show up as 1, 2, 3, 4 (i.e., plotting log10(N)). I have no idea how to do this, and I don't really know what I'd be plotting on the graph. If there's a better place I could be asking this, please point me in that direction. Time is running short.
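For reference, here is a minimal sketch of how the conversion and the log "normalization" could look (Python and matplotlib are assumptions here, not part of the assignment): plotting N on a log-scaled axis makes 10, 100, 1000, and 10000 land at evenly spaced positions, which is the 1, 2, 3, 4 effect described above.
import math
import matplotlib.pyplot as plt   # assuming matplotlib is available for the graph

NS_PER_OP = 1000                   # measured cost of one operation, from above

sizes = [10, 100, 1000, 10000]
measured_merge = [3499, 39600, 470199, 6227399]       # nanoseconds
measured_radix = [19200, 135099, 1317799, 14208600]   # nanoseconds

# Theoretical estimates converted to nanoseconds.
theory_merge = [NS_PER_OP * n * math.log2(n) for n in sizes]
theory_radix = [NS_PER_OP * n for n in sizes]  # treating radix sort as O(N); with k digits it would be NS_PER_OP * n * k

plt.plot(sizes, measured_merge, "o-", label="mergesort (measured)")
plt.plot(sizes, theory_merge, "o--", label="mergesort (theoretical)")
plt.plot(sizes, measured_radix, "s-", label="radix sort (measured)")
plt.plot(sizes, theory_radix, "s--", label="radix sort (theoretical)")
plt.xscale("log")   # on a log axis 10, 100, 1000, 10000 fall at evenly spaced positions (1, 2, 3, 4)
plt.yscale("log")
plt.xlabel("N")
plt.ylabel("runtime (nanoseconds)")
plt.legend()
plt.show()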


Uniswap Liquidity Calculation issue on Arbitrum Chain

According to the "Liquidity Math in Uniswap v3", the liquidity of a position should be:
L = amt0 * (sqrt(upper) * sqrt(cprice)) / (sqrt(upper) - sqrt(cprice))
or
L = amt1 / (sqrt(cprice) - sqrt(lower))
I tried to calculate the liquidity of the below position on Arbitrum:
The NFT token ID of the position is 69171, so I can get the liquidity by calling the contract (0xC36442b4a4522E871399CD717aBDD847Ab11FE88) on https://arbiscan.io
You can see it shows the liquidity is 50242219347523, and we can do some more unit conversion:
Now I try to calculate this number with the Uniswap v3 math:
This is the output:
As we can see, the code output is very similar to the contract output, but if we look carefully, we will find that the units seem to be different. I know the unit of the contract output should be 'wei', but I don't know what the unit of the code result is. Can anybody help? Thanks.
I checked that position and pool. It's best to query the current price from the pool's contract; for a quick look, the UI can be found at https://info.uniswap.org/#/arbitrum/pools/0x2f5e87c9312fa29aed5c179e456625d79015299c
The current price is shown as 11.9011 ETH per BTC, and there are 0.3122 BTC and 1.466 ETH in the pool. This gives:
price = 11.9011 * (1e18 / 1e8)
x = 0.3122 * 1e8
y = 1.466 * 1e18
The tick range of the position No. 69171 is 253300 to 259900. Use these values to calculate sp = sqrt(price) and the square roots of the price range boundaries, and from them, the liquidity:
sp = price ** 0.5
sa = 1.0001 ** (253300 // 2)
sb = 1.0001 ** (259900 // 2)
Lx = x * sp * sb / (sb - sp)
Ly = y / (sp - sa)
L = min(Lx, Ly)
The result Lx is 49905251975363.266 and Ly is 51071435112054.96. The Arbiscan info shows liquidity L = 50242219347523, which lies between these two values; they differ by a few percent. A few percent is an acceptable error given the imprecise input values used in this calculation; the UI shows the price and amount values in rounded form.
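As a quick sanity check (a sketch, not part of the original calculation), you can also go the other way: take the liquidity the contract reports and invert the two formulas above to recover the token amounts, which should land within a few percent of the amounts shown in the UI.
price = 11.9011 * (1e18 / 1e8)   # ETH per BTC, adjusted for the 8/18 token decimals
sp = price ** 0.5                # sqrt of the current price
sa = 1.0001 ** (253300 // 2)     # sqrt price at the lower tick
sb = 1.0001 ** (259900 // 2)     # sqrt price at the upper tick

L = 50242219347523               # liquidity reported by the position manager contract

amt0 = L * (sb - sp) / (sp * sb) # token0 (BTC) amount in its smallest unit
amt1 = L * (sp - sa)             # token1 (ETH) amount in wei

print(amt0 / 1e8, amt1 / 1e18)   # roughly 0.31 BTC and 1.44 ETH, close to the 0.3122 / 1.466 shown in the UI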

Octave script falling into error when processing data with sampling frequency above 50kS/s

I'm working with an Octave script to process data files with high sample rates (up to 200 kS/s, collected over 3 minutes). The code runs into issues when processing any file with a sample rate above 50 kS/s, regardless of size or number of samples, but works correctly otherwise.
The error I receive when running the code on files above 50 kS/s is raised from the hist function:
error: x(0): subscripts must be either integers 1 to (2^63)-1 or logicals
error:
I have narrowed the cause to the following section of code (note that FS is the detected sampling frequency):
FILTER_ORDER = 1;
FILTER_CUTOFF = 1 / (2 * pi * 300e-3);
[b_lp, a_lp] = butter(FILTER_ORDER, FILTER_CUTOFF / (FS / 2), 'low');
%
s = SCALING_FACTOR * filter(b_lp, a_lp, u_q) ;
P = s ;
%
tau = 20;
transient = tau * FS; % index after the transient
Pmax = max(P(transient:end));
%
s = s(transient:end);
%
NUMOF_CLASSES = 10000; % number of bins used for the histogram
[bin_cnt, cpf.magnitude] = hist(s, NUMOF_CLASSES); % sorts data into the number of bins specified
I can provide more information if required; I'm not very familiar with Octave.

Smart algorithm to create unique exponential sequence steps between two values

I'm looking for an algorithm that returns 5 steps between two given dynamic values, including both endpoints, with exponential growth. The returned values should be nicely rounded and unique.
Example:
range 100 - 10000 should return something like this:
100, 500, 2500, 5000, 10000
This is what I came up with so far (credit goes mostly to an SO thread I once found but can't recover):
min = 100
max = 10000
a = Array.new
loops = 5
factor = 2.5

for i in 0..loops-1
  x = (max - min) * ((i.to_f / (loops.to_f - 1.0)) ** factor) + min
  case x
  when min
    a[i] = x.to_i
  when max
    a[i] = x.to_i
  when (min + 1).to_f..500
    a[i] = (x.to_f / 250).round(0) * 250
  when 500..2000
    a[i] = (x.to_f / 500).round(0) * 500
  else
    a[i] = (x.to_f / 2500).round(0) * 2500
  end
end
The result is adjustable with the factor; I found 2.5 to work best. This already works quite well in most cases. Before rounding I get these values:
[100.0, 409.37, 1850.08, 4922.67, 10000.0]
But it does not check for duplicates that can occur in the rounding process, which happens mostly if the range is smaller:
100 - 1000
Raw: [100.0, 128.12, 259.09, 538.42, 1000.0]
Rounded: [100, 250, 250, 500, 1000]
5000 - 10000
Raw: [5000.0, 5156.25, 5883.88, 7435.69, 10000.0]
Rounded: [5000, 5000, 5000, 7500, 10000]
Now I'm a little torn between discarding the whole code and coming up with a smarter calculation method that already includes the rounding, or just checking for duplicates in a second pass - but I haven't gotten a satisfying result from either option.
Does anyone have an idea how to integrate a duplicate check into the rounding, or how to make the rounding more dynamic?
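One possible direction, sketched in Python rather than Ruby (the helper names and the significant-figure idea are mine, not from the thread): round to significant figures instead of fixed tiers, and add one more significant figure whenever a rounded value collides with its predecessor. This assumes the range is wide enough to hold five distinct integers.
import math

def round_sig(x, sig):
    # round x to `sig` significant figures
    exp = math.floor(math.log10(abs(x)))
    step = 10 ** (exp - sig + 1)
    return int(round(x / step) * step)

def exp_steps(lo, hi, loops=5, factor=2.5):
    raw = [(hi - lo) * (i / (loops - 1)) ** factor + lo for i in range(loops)]
    out = []
    for i, x in enumerate(raw):
        if i == 0 or i == loops - 1:
            val = int(x)                     # keep both endpoints exact
        else:
            sig = 1
            val = round_sig(x, sig)
            # duplicate check: add precision until the value is strictly increasing
            while val <= out[-1] and sig < 10:
                sig += 1
                val = round_sig(x, sig)
        out.append(val)
    return out

print(exp_steps(100, 10000))   # [100, 400, 2000, 5000, 10000]
print(exp_steps(100, 1000))    # [100, 130, 300, 500, 1000]
print(exp_steps(5000, 10000))  # [5000, 5200, 6000, 7000, 10000]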

Understanding code wrt Logistic Regression using gradient descent

I was following Siraj Raval's videos on logistic regression using gradient descent:
1) Link to longer video :
https://www.youtube.com/watch?v=XdM6ER7zTLk&t=2686s
2) Link to shorter video :
https://www.youtube.com/watch?v=xRJCOz3AfYY&list=PL2-dafEMk2A7mu0bSksCGMJEmeddU_H4D
In the videos he talks about using gradient descent to reduce the error over a set number of iterations so that the function converges (the slope becomes zero).
He also illustrates the process via code. The following are the two main functions from the code:
from numpy import array  # needed for array(points) below

def step_gradient(b_current, m_current, points, learningRate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, array(points), learning_rate)
    return [b, m]

# The above functions are called below:
learning_rate = 0.0001
initial_b = 0  # initial y-intercept guess
initial_m = 0  # initial slope guess
num_iterations = 1000
[b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
# code taken from Siraj Raval's github page
Why do the values of b and m continue to update for all the iterations? After a certain number of iterations the function will converge, i.e. we find the values of b and m that give a slope of 0.
So why do we keep iterating past that point and keep updating b and m?
Aren't we losing the 'correct' b and m values this way? How does the learning rate help the convergence process if we continue to update values after converging? In short, why is there no check for convergence, and how does this actually work?
In practice you will most likely not reach a slope of exactly 0. Think of your loss function as a bowl. If your learning rate is too high, it is possible to overshoot the lowest point of the bowl; on the contrary, if the learning rate is too low, learning will be so slow that it won't reach the lowest point of the bowl before all iterations are done.
That's why in machine learning the learning rate is an important hyperparameter to tune.
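A tiny illustration of that bowl picture (a sketch, not from the video), using f(x) = x^2, whose gradient is 2x:
def descend(lr, steps=20):
    x = 1.0                    # start on the side of the bowl
    for _ in range(steps):
        x = x - lr * 2 * x     # gradient step on f(x) = x^2
    return x

print(descend(0.1))    # heads toward 0, the bottom of the bowl
print(descend(1.1))    # overshoots further on every step and diverges
print(descend(0.001))  # so slow it is still far from the bottom after 20 steps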
Actually, once we reach a slope of 0, b_gradient and m_gradient become 0; thus, in
new_b = b_current - (learningRate * b_gradient)
new_m = m_current - (learningRate * m_gradient)
new_b and new_m will keep the old, correct values, since nothing is subtracted from them.
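For completeness, here is a minimal sketch of what an explicit convergence check could look like (my own addition, not from the video), reusing the step_gradient function above: stop early once the parameters stop moving by more than a tolerance, instead of always running the full num_iterations.
def gradient_descent_with_stop(points, b, m, learning_rate, num_iterations, tol=1e-8):
    for i in range(num_iterations):
        new_b, new_m = step_gradient(b, m, array(points), learning_rate)
        # if the parameters barely moved, the gradients are ~0 and we can stop early
        if abs(new_b - b) < tol and abs(new_m - m) < tol:
            return [new_b, new_m]
        b, m = new_b, new_m
    return [b, m]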

F# Floating point ranges are experimental and may be deprecated

I'm trying to make a little function to interpolate between two values with a given increment.
[ 1.0 .. 0.5 .. 20.0 ]
The compiler tells me that this is deprecated, and suggests using ints then casting to float. But this seems a bit long-winded if I have a fractional increment - do I have to divide my start and end values by my increment, then multiply again afterwards? (yeuch!).
I saw something somewhere once about using sequence comprehensions to do this, but I can't remember how.
Help, please.
TL;DR: F# PowerPack's BigRational type is the way to go.
What's Wrong with Floating-point Loops
As many have pointed out, float values are not suitable for looping (a quick illustration follows this list):
They have round-off error: just as with 1/3 in decimal, we inevitably lose all digits starting at a certain exponent;
They experience catastrophic cancellation: when subtracting two almost equal numbers, the leading digits cancel and the relative error explodes;
They always have a non-zero machine epsilon, so the error grows with every math operation (unless we are adding different numbers many times so that errors mutually cancel out -- but this is not the case for loops);
They have different absolute accuracy across the range: the number of unique values in the range [0.0000001 .. 0.0000002] is about the same as the number of unique values in [1000000 .. 2000000].
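To make the round-off point concrete, here is a quick illustration (shown in Python purely for brevity; the effect is the same for IEEE 754 doubles in any language):
x = 0.0
for _ in range(3):
    x += 0.1          # 0.1 has no exact binary representation
print(x)              # 0.30000000000000004
print(x <= 0.3)       # False -- which is why a float range like [0.0 .. 0.1 .. 0.3] stops one step early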
Solution
What can instantly solve the above problems, is switching back to integer logic.
With F# PowerPack, you may use BigRational type:
open Microsoft.FSharp.Math
// [1 .. 1/3 .. 20]
[1N .. 1N/3N .. 20N]
|> List.map float
|> List.iter (printf "%f; ")
Note, I took the liberty of setting the step to 1/3 because 0.5 from your question actually has an exact binary representation 0.1b and is represented as +1.00000000000000000000000 * 2^-1; hence it does not produce any cumulative summation error.
Outputs:
1.000000; 1.333333; 1.666667; 2.000000; 2.333333; 2.666667; 3.000000; (skipped) 18.000000; 18.333333; 18.666667; 19.000000; 19.333333; 19.666667; 20.000000;
// [0.2 .. 0.1 .. 3]
[1N/5N .. 1N/10N .. 3N]
|> List.map float
|> List.iter (printf "%f; ")
Outputs:
0.200000; 0.300000; 0.400000; 0.500000; (skipped) 2.800000; 2.900000; 3.000000;
Conclusion
BigRational uses integer computations, which are not slower than floating-point ones;
The round-off occurs only once for each value (upon conversion to a float, but not within the loop);
BigRational acts as if the machine epsilon were zero;
There is an obvious limitation: you can't use irrational numbers like pi or sqrt(2) as they have no exact representation as a fraction. It does not seem to be a very big problem because usually, we are not looping over both rational and irrational numbers, e.g. [1 .. pi/2 .. 42]. If we do (like for geometry computations), there's usually a way to reduce the irrational part, e.g. switching from radians to degrees.
Further reading:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Numeric types in PowerPack
Interestingly, float ranges don't appear to be deprecated anymore. And I remember seeing a question recently (sorry, couldn't track it down) talking about the inherent issues which manifest with float ranges, e.g.
> let xl = [0.2 .. 0.1 .. 3.0];;
val xl : float list =
[0.2; 0.3; 0.4; 0.5; 0.6; 0.7; 0.8; 0.9; 1.0; 1.1; 1.2; 1.3; 1.4; 1.5; 1.6;
1.7; 1.8; 1.9; 2.0; 2.1; 2.2; 2.3; 2.4; 2.5; 2.6; 2.7; 2.8; 2.9]
I just wanted to point out that you can use ranges on decimal types with a lot less of these kind of rounding issues, e.g.
> [0.2m .. 0.1m .. 3.0m];;
val it : decimal list =
[0.2M; 0.3M; 0.4M; 0.5M; 0.6M; 0.7M; 0.8M; 0.9M; 1.0M; 1.1M; 1.2M; 1.3M;
1.4M; 1.5M; 1.6M; 1.7M; 1.8M; 1.9M; 2.0M; 2.1M; 2.2M; 2.3M; 2.4M; 2.5M;
2.6M; 2.7M; 2.8M; 2.9M; 3.0M]
And if you really do need floats in the end, then you can do something like
> {0.2m .. 0.1m .. 3.0m} |> Seq.map float |> Seq.toList;;
val it : float list =
[0.2; 0.3; 0.4; 0.5; 0.6; 0.7; 0.8; 0.9; 1.0; 1.1; 1.2; 1.3; 1.4; 1.5; 1.6;
1.7; 1.8; 1.9; 2.0; 2.1; 2.2; 2.3; 2.4; 2.5; 2.6; 2.7; 2.8; 2.9; 3.0]
As Jon and others pointed out, floating point range expressions are not numerically robust. For example [0.0 .. 0.1 .. 0.3] equals [0.0 .. 0.1 .. 0.2]. Using Decimal or Int Types in the range expression is probably better.
For floats I use this function; it first widens the total range by three of the smallest possible float steps. I am not sure how robust this algorithm is, but it is good enough for me to ensure that the stop value is included in the Seq:
open System

let floatrange start step stop =
    if step = 0.0 then failwith "stepsize cannot be zero"
    let range =
        stop - start
        |> BitConverter.DoubleToInt64Bits
        |> (+) 3L
        |> BitConverter.Int64BitsToDouble
    let steps = range / step
    if steps < 0.0 then failwith "stop value cannot be reached"
    let rec frange (start, i, steps) =
        seq { if i <= steps then
                yield start + i * step
                yield! frange (start, (i + 1.0), steps) }
    frange (start, 0.0, steps)
Try the following sequence expression
seq { 2 .. 40 } |> Seq.map (fun x -> (float x) / 2.0)
You can also write a relatively simple function to generate the range:
let rec frange(from:float, by:float, tof:float) =
    seq { if from < tof then
            yield from
            yield! frange(from + by, by, tof) }
Using this you can just write:
frange(1.0, 0.5, 20.0)
Updated version of Tomas Petricek's answer, which compiles, and works for decreasing ranges (and works with units of measure):
(but it doesn't look as pretty)
let rec frange(from:float<'a>, by:float<'a>, tof:float<'a>) =
    seq {
        yield from
        if (float by > 0.) then
            if (from + by <= tof) then yield! frange(from + by, by, tof)
        else
            if (from + by >= tof) then yield! frange(from + by, by, tof)
    }
#r "FSharp.Powerpack"
open Math.SI
frange(1.0<m>, -0.5<m>, -2.1<m>)
UPDATE: I don't know if this is new, or if it was always possible, but I just discovered (here) that this simpler syntax is also possible:
let dl = 9.5 / 11.
let min = 21.5 + dl
let max = 40.5 - dl
let a = [ for z in min .. dl .. max -> z ]
let b = a.Length
(Watch out, there's a gotcha in this particular example :)
