I'm looking for an algorithm that returns 5 steps between 2 given dynamic values, including both starting values, with exponential growth. The returned values should be nicely rounded and unique.
Example:
range 100 - 10000 should return something like this:
100, 500, 2500, 5000, 10000
This is what I came up with so far (credit goes mostly to an SO thread I once found but can't recover):
min = 100
max = 10000
a = Array.new
loops = 5
factor = 2.5
for i in 0..loops-1
  x = (max - min) * ( (i.to_f / (loops.to_f - 1.0)) ** factor ) + min
  case x
  when min
    a[i] = x.to_i
  when max
    a[i] = x.to_i
  when (min + 1).to_f..500
    a[i] = (x.to_f / 250).round(0) * 250
  when 500..2000
    a[i] = (x.to_f / 500).round(0) * 500
  else
    a[i] = (x.to_f / 2500).round(0) * 2500
  end
end
The result is adjustable with the factor, I found 2.5 to be working best. This works quite well already in most cases. Before rounding I get these values:
[100.0, 409.37, 1850.08, 4922.67, 10000.0]
But it does not check for duplicates that can occur in the rounding process, which happens mostly if the range is smaller:
100 - 1000
Raw: [100.0, 128.12, 259.09, 538.42, 1000.0]
Rounded: [100, 250, 250, 500, 1000]
5000 - 10000
Raw: [5000.0, 5156.25,5883.88, 7435.69, 10000.0]
Rounded: [5000, 5000, 5000, 7500, 10000]
Now I'm a little torn between discarding the whole code and trying to come up with a smarter calculation method that already includes rounding, or just checking for duplicates on a second run. So far I haven't got a satisfying result from either option.
Does someone have a clue on how to integrate a duplicate check in the rounding or make the rounding more dynamic?
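One way to approach the "check for duplicates on a second run" option is to retry the rounding with a progressively finer step until the five values are strictly increasing. The following is a minimal Python sketch of that idea, not an answer taken from the thread. It reuses the interpolation formula from the Ruby code above; the function names and the list of rounding steps (`GRANULARITIES`) are assumptions chosen for illustration.

```python
# Hypothetical helper names; only the interpolation formula comes from the question.
GRANULARITIES = (2500, 1000, 500, 250, 100, 50, 25, 10, 5, 1)  # assumed "nice" steps


def raw_steps(lo, hi, count=5, factor=2.5):
    """Same curve as the Ruby code: (hi - lo) * (i / (count - 1)) ** factor + lo."""
    return [(hi - lo) * (i / (count - 1)) ** factor + lo for i in range(count)]


def rounded_unique_steps(lo, hi, count=5, factor=2.5):
    """Round the inner values to the coarsest step that keeps the list strictly increasing."""
    raw = raw_steps(lo, hi, count, factor)
    for step in GRANULARITIES:
        inner = [round(x / step) * step for x in raw[1:-1]]
        candidate = [lo] + inner + [hi]
        if all(a < b for a, b in zip(candidate, candidate[1:])):
            return candidate
    # Fall back to plain integer rounding if none of the listed steps works.
    return [lo] + [round(x) for x in raw[1:-1]] + [hi]


print(rounded_unique_steps(100, 10000))   # [100, 500, 2000, 5000, 10000]
print(rounded_unique_steps(100, 1000))    # [100, 150, 250, 550, 1000]
print(rounded_unique_steps(5000, 10000))  # [5000, 5250, 6000, 7500, 10000]
```

The trade-off in this sketch is that one step size is used for all inner values, so small ranges end up with less "round" numbers (150, 550) in exchange for uniqueness.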
I am creating a Rails application which is like a game, so it has points and levels. For example, to reach level one the user has to collect at least 100 points, and to reach level two the user has to collect 200 points. The difference in points between consecutive levels changes after every 10 levels: the difference between levels one and two is 100, the difference between levels 11 and 12 is 150, and so on. There is no upper bound for levels.
Now my question is let's say a user's total points is 3150 and just got updated to 3155. What's the optimal solution to find the current level and update it if needed?
I can get a solution using while loops and again looping inside it which will give a result in O(n^2). I need something better.
I think this code works but I'm not sure if this is the best way to go about it
def get_level(points)
  diff = 100
  sum = 0
  level = -1
  current_level = 0
  while level.negative?
    10.times do |i|
      current_level += 1
      sum += diff
      if points > sum
        next
      elsif points <= sum
        level = current_level
        break
      end
    end
    diff += 50
  end
  puts level
end
I wrote a get_points function (it should not be difficult). Based on it, I wrote a get_level function, in which it was necessary to solve a quadratic equation to find the high value and then calculate low.
If you have any questions, let me know.
Check output here.
#!/usr/bin/env python3
import math

def get_points(level):
    high = (level + 1) // 10
    low = (level + 1) % 10
    high_point = 250 * high * high + 750 * high  # (3 + high) * high // 2 * 500
    low_point = (100 + 50 * high) * low
    return low_point + high_point

def get_level(points):
    # quadratic equation
    a = 250
    b = 750
    c = -points
    d = b * b - 4 * a * c
    x = (-b + math.sqrt(d)) / (2 * a)
    high = int(x)
    remainder = points - (250 * high * high + 750 * high)
    low = remainder // (100 + 50 * high)
    level = high * 10 + low
    return level

def main():
    for l in range(0, 40):
        print(f'{l:3d} {get_points(l - 1):5d}..{get_points(l) - 1}')
    for level, (l, r) in (
        (1, (100, 199)),
        (2, (200, 299)),
        (9, (900, 999)),
        (10, (1000, 1149)),
        (11, (1150, 1299)),
        (19, (2350, 2499)),
        (20, (2500, 2699)),
    ):
        for p in range(l, r + 1):  # for in [l, r]
            assert get_level(p) == level, f'{p} {l}'

if __name__ == '__main__':
    main()
Why did you set the value of a=250 and b = 750? Can you explain that to me please?
Let's write out every 10 level and the difference between points:
lvl - pnt (+delta)
10 - 1000 (+1000 = +100 * 10)
20 - 2500 (+1500 = +150 * 10)
30 - 4500 (+2000 = +200 * 10)
40 - 7000 (+2500 = +250 * 10)
Dividing the totals by 500 (10 levels * 50, the amount by which the difference changes) gives the running sums of an arithmetic progression that starts at 2:
10 - 2 (+2)
20 - 5 (+3)
30 - 9 (+4)
40 - 14 (+5)
Using the sum formula for an arithmetic progression, the points needed for level = k * 10 are:
sum(x for x in 2..k+1) * 500 =
(2 + k + 1) * k / 2 * 500 =
(3 + k) * k * 250 =
250 * k * k + 750 * k
Now we have points and want to find the largest integer high such that points >= 250 * high^2 + 750 * high, i.e. 250 * high^2 + 750 * high - points <= 0. Since a = 250 is positive, the parabola opens upward, so the inequality holds between the two roots. We therefore take the positive root of the quadratic equation 250 * high^2 + 750 * high - points = 0 and discard its fractional part (this is high = int(x) in the Python script).
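The derivation can be checked numerically. The sketch below compares the closed form 250 * k^2 + 750 * k with a brute-force sum over blocks of 10 levels and then recovers k from a point total using the positive root. The function names are made up for this illustration and are not part of the script above.

```python
import math


def points_for_block(k):
    """Closed form from the derivation: points needed to reach level 10 * k."""
    return 250 * k * k + 750 * k


def points_for_block_by_summing(k):
    """Brute force: block number `block` has 10 levels costing 100 + 50 * block each."""
    return sum(10 * (100 + 50 * block) for block in range(k))


def block_for_points(points):
    """Largest k with points_for_block(k) <= points, via the positive root."""
    root = (-750 + math.sqrt(750 * 750 + 4 * 250 * points)) / (2 * 250)
    return int(root)  # int() drops the fractional part


for k in range(1, 5):
    print(k * 10, points_for_block(k), points_for_block_by_summing(k))
print(block_for_points(3155))  # 2, i.e. the level lies somewhere in 20..29
```

For very large point totals the floating-point square root could in principle be off by a tiny amount near a block boundary, so an integer check of the neighbouring k values would be a reasonable safeguard.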
Gradient descent update rule :
Using these values for this rule :
x = [10
20
30
40
50
60
70
80
90
100]
y = [4
7
8
4
5
6
7
5
3
4]
After two iterations using a learning rate of 0.07 outputs a value theta of
-73.396
-5150.803
After three iterations theta is :
1.9763e+04
1.3833e+06
So it appears theta gets larger after the second iteration which suggests the learning rate is too large.
So I set :
iterations = 300;
alpha = 0.000007;
theta is now :
0.0038504
0.0713561
Should these theta values allow me to draw a straight line through the data, and if so, how? I've just begun trying to understand gradient descent, so please point out any errors in my logic.
source :
x = [10
20
30
40
50
60
70
80
90
100]
y = [4
7
8
4
5
6
7
5
3
4]
m = length(y)
x = [ones(m , 1) , x]
theta = zeros(2, 1);
iterations = 300;
alpha = 0.000007;
for iter = 1:iterations
  theta = theta - ((1/m) * ((x * theta) - y)' * x)' * alpha;
  theta
end
plot(x, y, 'o');
ylabel('Response Time')
xlabel('Time since 0')
Update :
So the product for each x value multiplied by theta plots a straight line :
plot(x(:,2), x*theta, '-')
Update 2 :
How does this relate to the linear regression model :
As the model also outputs a prediction value ?
Yes, you should be able to draw a straight line. In regression, gradient descent is an algorithm used to minimize the cost (error) function of your linear regression model. You use the gradient as a track to travel to the minimum of your cost function, and the learning rate determines how quickly you travel down the path. Go too fast and you might overshoot the global minimum. Once you have reached the desired minimum, plug those values of theta into your model to obtain your estimated model. In the one-dimensional case, this is a straight line.
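To make the "plug theta into your model" step concrete, here is a small pure-Python sketch of the same batch update used in the question's Octave loop, theta := theta - alpha * (1/m) * X' * (X * theta - y), followed by the prediction theta0 + theta1 * x for each data point. The function names are assumptions for illustration. With the question's settings (alpha = 0.000007, 300 iterations) the result should land close to the theta values quoted in the question.

```python
xs = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
ys = [4, 7, 8, 4, 5, 6, 7, 5, 3, 4]


def gradient_descent(xs, ys, alpha, iterations):
    """Batch gradient descent for the model y = theta0 + theta1 * x."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(iterations):
        residuals = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(residuals) / m
        grad1 = sum(r * x for r, x in zip(residuals, xs)) / m
        # Both parameters are updated from the same residuals.
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1


def predict(theta0, theta1, x):
    """The fitted straight line evaluated at x."""
    return theta0 + theta1 * x


theta0, theta1 = gradient_descent(xs, ys, alpha=0.000007, iterations=300)
line = [predict(theta0, theta1, x) for x in xs]
print(theta0, theta1)
print(line[0], line[-1])  # the two end points are enough to draw the line
```

Note that after only 300 small steps the intercept has barely moved from zero, so the line is still far from the least-squares fit; more iterations or feature scaling would be needed for that.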
Check out this article, which gives a nice introduction to gradient descent.
I have a simple program with a for loop where I calculate some value and print it to the screen, but only the first value is printed; the rest are just NaN values. Is there any way to fix this? I suppose the numbers might have a lot of decimals, hence the NaN issue.
Output from program:
0.18410
NaN
NaN
NaN
NaN
etc.
This is the code, maybe it helps:
for i=1:30
  t = (100*i)*1.1*0.5;
  b = factorial(round(100*i)) / (factorial(round((100*i)-t)) * factorial(round(t)));
  % binomial distribution
  d = b * 0.5^(t) * 0.5^(100*i-(t));
  % cumulative
  p = binocdf(1.1 * (100*i) * 0.5,100*i,0.5);
  % >= AT LEAST
  result = 1-p + d;
  disp(result);
end
You could do the calculation of the fraction yourself.
To do that, you need to calculate d directly. You can collect all the factors of the numerator and the denominator, multiply and divide by them one at a time, and make sure that the intermediate result does not get too big. The following code is poor in terms of speed and memory, but it may be a good start:
for i=1:30
  t = (55*i);
  b = factorial(100*i) / (factorial(100*i-t) * factorial(t));
  % binomial distribution
  d = b * 0.5^(t) * 0.5^(100*i-(t));
  numerators = 1:(100*i);
  denominators = [1:(100*i-t),1:55*i,ones(1,100*i)*2];
  value = 1;
  while length(numerators) > 0 || length(denominators) > 0
    if length(numerators) == 0
      value = value/denominators(1);
      denominators(1) = [];
    elseif length(denominators) == 0
      value = value* numerators(1);
      numerators(1) = [];
    elseif value > 10000
      value = value/denominators(1);
      denominators(1) = [];
    else
      value = value* numerators(1);
      numerators(1) = [];
    end
  end
  % cumulative
  p = binocdf(1.1 * (100*i) * 0.5,100*i,0.5);
  % >= AT LEAST
  result = 1-p + value;
  disp(result);
end
output:
0.1841
0.0895
0.0470
0.0255
0.0142
0.0080
0.0045
...
Take a look at the documentation of factorial:
Note that the factorial function grows large quite quickly, and
even with double precision values overflow will occur if N > 171.
For such cases consider 'gammaln'.
On your second iteration you are already computing factorial(200), which returns Inf, and then Inf/Inf returns NaN.
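The log-gamma suggestion from the documentation can be sketched as follows. This is a Python illustration rather than Octave, using `math.lgamma` in place of `gammaln`; the function name `binomial_pmf_half` is made up for the example. It computes the term d = C(n, k) * 0.5^n from the question entirely in log space, so no intermediate factorial ever overflows.

```python
import math


def binomial_pmf_half(n, k):
    """P(X = k) for X ~ Binomial(n, 0.5), computed in log space to avoid overflow."""
    # log(n! / (k! * (n - k)!)) using lgamma(m + 1) == log(m!)
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return math.exp(log_coeff + n * math.log(0.5))


for i in range(1, 4):
    n, k = 100 * i, 55 * i
    print(n, k, binomial_pmf_half(n, k))
```

Only the final, small probability is exponentiated, which is why this stays finite even for n = 3000.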
I am trying to implement logistic regression with gradient descent.
I compute my cost function j_theta at each iteration and, fortunately, j_theta decreases when plotted against the number of iterations.
The data set I use is given below:
x=
1 20 30
1 40 60
1 70 30
1 50 50
1 50 40
1 60 40
1 30 40
1 40 50
1 10 20
1 30 40
1 70 70
y= 0
1
1
1
0
1
0
0
0
0
1
The code that I managed to write for logistic regression using Gradient descent is:
%1. The below code would load the data present in your desktop to the octave memory
x=load('stud_marks.dat');
%y=load('ex4y.dat');
y=x(:,3);
x=x(:,1:2);
%2. Now we want to add a column x0 with all the rows as value 1 into the matrix.
%First take the length
[m,n]=size(x);
x=[ones(m,1),x];
X=x;
% Now we limit the x1 and x2 we need to leave or skip the first column x0 because they should stay as 1.
mn = mean(x);
sd = std(x);
x(:,2) = (x(:,2) - mn(2))./ sd(2);
x(:,3) = (x(:,3) - mn(3))./ sd(3);
% We will not use vectorized technique, Because its hard to debug, We shall try using many for loops rather
max_iter=50;
theta = zeros(size(x(1,:)))';
j_theta=zeros(max_iter,1);
for num_iter=1:max_iter
  % We calculate the cost Function
  j_cost_each=0;
  alpha=1;
  theta
  for i=1:m
    z=0;
    for j=1:n+1
      % theta(j)
      z=z+(theta(j)*x(i,j));
      z
    end
    h= 1.0 ./(1.0 + exp(-z));
    j_cost_each=j_cost_each + ( (-y(i) * log(h)) - ((1-y(i)) * log(1-h)) );
    % j_cost_each
  end
  j_theta(num_iter)=(1/m) * j_cost_each;
  for j=1:n+1
    grad(j) = 0;
    for i=1:m
      z=(x(i,:)*theta);
      z
      h=1.0 ./ (1.0 + exp(-z));
      h
      grad(j) += (h-y(i)) * x(i,j);
    end
    grad(j)=grad(j)/m;
    grad(j)
    theta(j)=theta(j)- alpha * grad(j);
  end
end
figure
plot(0:1999, j_theta(1:2000), 'b', 'LineWidth', 2)
hold off
figure
%3. In this step we will plot the graph for the given input data set just to see how is the distribution of the two class.
pos = find(y == 1); % This will take the postion or array number from y for all the class that has value 1
neg = find(y == 0); % Similarly this will take the position or array number from y for all class that has value 0
% Now we plot the graph column x1 Vs x2 for y=1 and y=0
plot(x(pos, 2), x(pos,3), '+');
hold on
plot(x(neg, 2), x(neg, 3), 'o');
xlabel('x1 marks in subject 1')
ylabel('y1 marks in subject 2')
legend('pass', 'Failed')
plot_x = [min(x(:,2))-2, max(x(:,2))+2]; % This min and max decides the length of the decision graph.
% Calculate the decision boundary line
plot_y = (-1./theta(3)).*(theta(2).*plot_x +theta(1));
plot(plot_x, plot_y)
hold off
%%%%%%% The only difference is In the last plot I used X where as now I use x whose attributes or features are featured scaled %%%%%%%%%%%
If you view the graph of x1 vs x2 the graph would look like,
After I run my code I create a decision boundary. The shape of the decision line seems to be okay but it is a bit displaced. The graph of the x1 vs x2 with decision boundary is given below:
![enter image description here][2]
Please suggest where I am going wrong.
Thanks :)
The new graph:
![enter image description here][1]
In the new graph the coordinates of the x axis have changed. That's because I use x (feature scaled) instead of X.
The problem lies in your cost function calculation and/or gradient calculation; your plotting function is fine. I ran your dataset through the logistic regression algorithm I implemented, but using the vectorized technique, because in my opinion it is easier to debug.
The final values I got for theta were
theta =
[-76.4242,
0.8214,
0.7948]
I also used alpha = 0.3
I plotted the decision boundary and it looks fine. I would recommend using the vectorized form, as it is easier to implement and to debug in my opinion.
I also think your implementation of gradient descent is not quite correct. 50 iterations is just not enough and the cost at the last iteration is not good enough. Maybe you should try to run it for more iterations with a stopping condition.
Also check this lecture for optimization techniques.
https://class.coursera.org/ml-006/lecture/37
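As a rough illustration of the advice above, here is a pure-Python sketch of batch gradient descent for logistic regression on the dataset from the question, with alpha = 0.3 as in the answer. It is not the answerer's implementation. All function names are assumptions, and the features are standardized as in the question, so the resulting theta is not directly comparable to the unscaled values quoted above. One property it shares with the vectorized form is that all predictions are computed from the same theta before any component is updated, whereas the loop in the question updates theta(j) while later gradients are still being computed; whether that explains the displaced boundary is not confirmed here.

```python
import math
import statistics

rows = [(20, 30), (40, 60), (70, 30), (50, 50), (50, 40), (60, 40),
        (30, 40), (40, 50), (10, 20), (30, 40), (70, 70)]
labels = [0, 1, 1, 1, 0, 1, 0, 0, 0, 0, 1]


def standardize(column):
    """Subtract the mean and divide by the sample standard deviation."""
    mean, sd = statistics.mean(column), statistics.stdev(column)
    return [(v - mean) / sd for v in column]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cost(theta, X, y):
    """Average logistic (cross-entropy) cost."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / len(y)


def fit(X, y, alpha=0.3, iterations=200):
    theta = [0.0] * len(X[0])
    costs = [cost(theta, X, y)]
    for _ in range(iterations):
        # Every prediction uses the same theta; theta changes only afterwards.
        errors = [sigmoid(sum(t * v for t, v in zip(theta, xi))) - yi
                  for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / len(y)
                for j in range(len(theta))]
        theta = [t - alpha * g for t, g in zip(theta, grad)]
        costs.append(cost(theta, X, y))
    return theta, costs


x1 = standardize([r[0] for r in rows])
x2 = standardize([r[1] for r in rows])
X = [(1.0, a, b) for a, b in zip(x1, x2)]
theta, costs = fit(X, labels)
print(theta)
print(costs[0], costs[-1])
```

Recording the cost after every iteration, as `costs` does here, also gives a natural place to add the stopping condition mentioned above, for example stopping once the decrease between two iterations falls below a chosen threshold.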
In application.helper I was trying to do this.
However, it always returns 0. Why?
I want something like 37%, 49%, 98%.
Always an integer, no float.
def evaluate(number_of_people)
  percentage = ((number_of_people / 10000) * 100 ).truncate
  "<div class='percentage'>Percentage is #{percentage}%</div>".html_safe
end
You're dividing an integer by an integer, so the result of 3000 / 10000 will be 0.
Divide by 10000.0 instead to force floating-point arithmetic.
So change this:
percentage = ((number_of_people / 10000) * 100 ).truncate
To this:
percentage = ((number_of_people / 10000.0) * 100 ).to_i
If your denominator (the 10000 value in this case) is a variable you can use to_f to cast it as a float before dividing.
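The effect of the order of operations can be shown with a short Python sketch, where `//` plays the role of Ruby's integer `/` (both floor the result). The three function names are made up for this comparison. The last variant is an alternative not mentioned in the answer: multiplying before dividing keeps everything in integers, which also sidesteps the occasional off-by-one that truncating a binary float can produce.

```python
def percentage_floor_first(number_of_people, total=10000):
    """Mirrors the original order: integer-divide first, then scale."""
    return (number_of_people // total) * 100


def percentage_float(number_of_people, total=10000):
    """Mirrors the fix from the answer: divide as floats, then truncate."""
    return int(number_of_people / total * 100)


def percentage_integer_only(number_of_people, total=10000):
    """Alternative: multiply before dividing so no float is involved."""
    return number_of_people * 100 // total


for n in (3000, 3700, 4900, 9800):
    print(n, percentage_floor_first(n), percentage_float(n), percentage_integer_only(n))
```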