Exit condition 41 on SNOPT - drake

I'm trying to run an optimization with SNOPT.
Right now as I run it I consistently get exit condition 41.
I've added the following parameters to the solver:
prog.SetSolverOption(solver.solver_id(), "Function Precision", "1e-6")
prog.SetSolverOption(solver.solver_id(), "Major Optimality Tolerance", "1e-3")
prog.SetSolverOption(solver.solver_id(), "Superbasics limit", "600")
result = solver.Solve(prog)
print(result.is_success())
But I still consistently get the same exit condition.
The error seems to be from the interpolation function I'm using. (When I remove it I no longer get the error).
t, c, k = interpolate.splrep(sref, kapparef, s=0, k=3)
kappafnc = interpolate.BSpline(t, c, k, extrapolate=False)
Here is the function that I think is causing the issue:
def car_continous_dynamics(self, state, force, steering):
    beta = steering
    s = state[0]
    n = state[1]
    alpha = state[2]
    v = state[3]
    # State = (s, n, alpha, v):
    # s = distance along the track, n = perpendicular distance to the track
    # alpha = velocity angle relative to the track
    # v = magnitude of the velocity of the car
    s_val = 0
    if s.value() > 0:
        s_val = s.value()
    Cm1 = 0.28
    Cm2 = 0.05
    Cr0 = 0.011
    Cr2 = 0.006
    m = 0.043
    phi_dot = v * beta * 15.5
    Fxd = (Cm1 - Cm2 * v) * force - Cr2 * v * v - Cr0 * tanh(5 * v)
    s_dot = v * cos(alpha + .5 * beta) / (1)  # -n*self.kappafunc(s_val))
    n_dot = v * sin(alpha + .5 * beta)
    alpha_dot = phi_dot  # -s_dot*self.kappafunc(s_val)
    v_dot = Fxd / m * cos(beta * .5)
    # concatenate the state derivatives
    state_dot = np.array((s_dot, n_dot, alpha_dot, v_dot))
    return state_dot

Debugging INFO 41 in SNOPT is generally challenging. It is often caused by a bad gradient (but it could also be due to the problem being genuinely hard to optimize).
You should check whether your gradient is well-behaved. I see you have
s_val = 0
if s.value() > 0:
    s_val = s.value()
There are two potential problems:
1. Your s carries the gradient (you can check s.derivatives(); it has a non-zero gradient), but when you compute s_val it becomes a plain float with no gradient, so you lose the gradient information.
2. s_val is non-differentiable at 0.
I would use s instead of s_val in your function (without the if s.value() > 0 branch), and also add the constraint s >= 0 via
prog.AddBoundingBoxConstraint(0, np.inf, s)
With this bounding box constraint, SNOPT guarantees that the value of s is non-negative at every iteration.
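A minimal sketch of that change, reusing the names from the question (not a complete program; it also assumes the commented-out curvature terms are meant to be re-enabled):

# Sketch only: use the autodiff/symbolic variable s directly so its derivatives
# propagate, and replace the if-branch with a constraint that keeps s >= 0.

# inside car_continous_dynamics:
s_dot = v * cos(alpha + .5 * beta) / (1 - n * self.kappafunc(s))   # s, not s_val
alpha_dot = phi_dot - s_dot * self.kappafunc(s)

# when building the program:
prog.AddBoundingBoxConstraint(0, np.inf, s)   # s stays non-negative at every iterate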

Related

Modifying the loss in ppo in stable-baselines3

I'm trying to implement an addition to the loss function of the PPO algorithm in stable-baselines3. For this I collected additional observations for the states s(t-10) and s(t+1), which I can access in the train function of the PPO class in ppo.py as part of the rollout_buffer.
I'm using a 3-layer MLP as my network architecture and need the outputs of the second layer for the triplet (s(t-α), s(t), s(t+1)) in order to calculate L = max(d(s(t+1), s(t)) − d(s(t+1), s(t−α)) + γ, 0), where d is the L2 distance.
Finally I want to add this term to the old loss, so loss = loss + 0.3 * L
This is my implementation starting with the original loss in line 242:
loss = policy_loss + self.ent_coef * entropy_loss + self.vf_coef * value_loss
###############################
net1 = nn.Sequential(*list(self.policy.mlp_extractor.policy_net.children())[:-1])
L_losses = []
a = 0
obs = rollout_data.observations
obs_alpha = rollout_data.observations_alpha
obs_plusone = rollout_data.observations_plusone
inds = rollout_data.inds
for i in inds:
    if i > alpha:  # only use observations for which L can be calculated
        fs_t = net1(obs[a])
        fs_talpha = net1(obs_alpha[a])
        fs_tone = net1(obs_plusone[a])
        L = max(
            th.norm(th.subtract(fs_tone, fs_t)) - th.norm(th.subtract(fs_tone, fs_talpha)) + 1.0, 0.0)
        L_losses.append(L)
    else:
        L_losses.append(0)
    a += 1
L_loss = th.mean(th.FloatTensor(L_losses))
loss += 0.3 * L_loss
So with net1 I tried to get a clone of the original network with the outputs from the second layer. I am unsure if this is the right way to do this.
I do have some questions about my approach as the resulting performance is slightly worse compared to without the added term although it should be slightly better:
Is my way of getting the outputs of the second layer of the mlp network working?
When loss.backward() is called can the gradient be calculated correctly (with the new term included)?
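As a sanity check for the first question, one can print the modules that the [:-1] slice keeps. Below is a minimal, self-contained sketch; the 3-layer MLP is only a hypothetical stand-in for self.policy.mlp_extractor.policy_net, whose exact composition depends on the net_arch and activation settings:

import torch.nn as nn

# Hypothetical stand-in for self.policy.mlp_extractor.policy_net.
policy_net = nn.Sequential(
    nn.Linear(8, 64), nn.Tanh(),
    nn.Linear(64, 64), nn.Tanh(),
    nn.Linear(64, 32), nn.Tanh(),
)

# Same construction as net1 in the question: drop only the last child module.
net1 = nn.Sequential(*list(policy_net.children())[:-1])
print(net1)  # shows which layers remain, i.e. where net1's output actually comes from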

Which sigmoid function to use for increasing contrast of an image?

Is there a standard sigmoidal function used to increase the contrast in a gray level bitmap?
Currently I am using the following, which is applied to gray levels represented as values between 0 and 1 inclusive.
static double ContrastCurve(double val, double k = 1)
{
    Func<double, double> logistic_func = (double x) => 1.0 / (1.0 + Math.Exp(-k * (x - 0.5)));
    var low = logistic_func(0);
    var high = logistic_func(1);
    var range = high - low;
    var value = logistic_func(val);
    return (value - low) / range;
}
This is the logistic function applied to a value between 0 and 1, with the output normalized so that it is also in [0...1]. This function works, but it is completely arbitrary, something I just made up, so the k parameter has no official name or meaning in the image processing literature and so forth.
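Written out, the curve the code computes is f(x) = (σ(x) − σ(0)) / (σ(1) − σ(0)), where σ(x) = 1 / (1 + exp(−k·(x − 0.5))); the division by σ(1) − σ(0) is what stretches the output back onto [0...1].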
If there is a standard function I would prefer that, but I haven't found anything that seems definitive. Code such as this link seems just as ad hoc to me.
As Mark Setchell's comment notes, ImageMagick uses the following function citing "Fundamentals of Image Processing", Hany Farid:
g(u) = 1 / [1 + exp(-α*u + β)]
scaled such that for domain [0..1] its range is [0..1].
This is essentially a two-parameter version of the function defined in the code in the question above; i.e., the code implements the same function with the substitution α = k and β = -k/2, which yields a one-parameter function f with f(0.5) = 0.5 when scaled such that f(0) = 0 and f(1) = 1.

shape of input to calculate information gain

I want to calculate the information gain on the 20_newsgroups data set.
I am using the code here (I have also put a copy of the code at the bottom of the question).
As you can see, the input to the algorithm is X, y.
My confusion is that X is a matrix with documents as rows and features as columns (for 20_newsgroups it is 11314 x 1000, in case I only consider 1000 features).
But according to the concept of information gain, it should calculate information gain for each feature.
(So I was expecting the code to loop through each feature, and thus expected the input to the function to be a matrix where rows are features and columns are classes.)
But X here is not features; X stands for documents, and I cannot see the part in the code that takes care of this! (I mean considering each document and then going through each feature of that document; like looping through the rows but at the same time looping through the columns, as the features are stored in the columns.)
I have read this and this and many similar questions, but they are not clear in terms of the input matrix shape.
this is the code for reading 20_newsgroup:
newsgroup_train = fetch_20newsgroups(subset='train')
X,y = newsgroup_train.data,newsgroup_train.target
cv = CountVectorizer(max_df=0.99,min_df=0.001, max_features=1000,stop_words='english',lowercase=True,analyzer='word')
X_vec = cv.fit_transform(X)
X_vec.shape is (11314, 1000), where the rows are documents rather than features of the 20_newsgroups data set. Am I calculating information gain in an incorrect way?
This is the code for Information gain:
def information_gain(X, y):
    def _calIg():
        entropy_x_set = 0
        entropy_x_not_set = 0
        for c in classCnt:
            probs = classCnt[c] / float(featureTot)
            entropy_x_set = entropy_x_set - probs * np.log(probs)
            probs = (classTotCnt[c] - classCnt[c]) / float(tot - featureTot)
            entropy_x_not_set = entropy_x_not_set - probs * np.log(probs)
        for c in classTotCnt:
            if c not in classCnt:
                probs = classTotCnt[c] / float(tot - featureTot)
                entropy_x_not_set = entropy_x_not_set - probs * np.log(probs)
        return entropy_before - ((featureTot / float(tot)) * entropy_x_set
                                 + ((tot - featureTot) / float(tot)) * entropy_x_not_set)

    tot = X.shape[0]
    classTotCnt = {}
    entropy_before = 0
    for i in y:
        if i not in classTotCnt:
            classTotCnt[i] = 1
        else:
            classTotCnt[i] = classTotCnt[i] + 1
    for c in classTotCnt:
        probs = classTotCnt[c] / float(tot)
        entropy_before = entropy_before - probs * np.log(probs)

    nz = X.T.nonzero()
    pre = 0
    classCnt = {}
    featureTot = 0
    information_gain = []
    for i in range(0, len(nz[0])):
        if (i != 0 and nz[0][i] != pre):
            for notappear in range(pre+1, nz[0][i]):
                information_gain.append(0)
            ig = _calIg()
            information_gain.append(ig)
            pre = nz[0][i]
            classCnt = {}
            featureTot = 0
        featureTot = featureTot + 1
        yclass = y[nz[1][i]]
        if yclass not in classCnt:
            classCnt[yclass] = 1
        else:
            classCnt[yclass] = classCnt[yclass] + 1
    ig = _calIg()
    information_gain.append(ig)
    return np.asarray(information_gain)
Well, after going through the code in detail, I learned more about X.T.nonzero().
It is indeed correct that information gain needs to loop through features,
and it is also correct that the matrix scikit-learn gives us here is documents-by-features.
But:
the code uses X.T.nonzero(), which returns the indices of all nonzero entries of the transposed (features-by-documents) matrix, and the next line then loops over the length of that array with range(0, len(X.T.nonzero()[0])).
Overall, X.T.nonzero()[0] gives us the feature index of every nonzero entry :)
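To make that concrete, here is a minimal sketch with a hypothetical 3-document, 4-feature count matrix (a dense array for simplicity):

import numpy as np

# documents x features, as produced by a vectorizer (dense here for simplicity)
X = np.array([[1, 0, 2, 0],
              [0, 3, 0, 0],
              [4, 0, 0, 5]])

features, docs = X.T.nonzero()
print(features)  # [0 0 1 2 3] -> the feature index of every nonzero entry, grouped by feature
print(docs)      # [0 2 1 0 2] -> the document that contains that feature occurrence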

Understanding code wrt Logistic Regression using gradient descent

I was following Siraj Raval's videos on logistic regression using gradient descent :
1) Link to longer video :
https://www.youtube.com/watch?v=XdM6ER7zTLk&t=2686s
2) Link to shorter video :
https://www.youtube.com/watch?v=xRJCOz3AfYY&list=PL2-dafEMk2A7mu0bSksCGMJEmeddU_H4D
In the videos he talks about using gradient descent to reduce the error for a set number of iterations, so that the function converges (the slope becomes zero).
He also illustrates the process via code. The following are the two main functions from the code:
from numpy import array

def step_gradient(b_current, m_current, points, learningRate):
    b_gradient = 0
    m_gradient = 0
    N = float(len(points))
    for i in range(0, len(points)):
        x = points[i, 0]
        y = points[i, 1]
        b_gradient += -(2/N) * (y - ((m_current * x) + b_current))
        m_gradient += -(2/N) * x * (y - ((m_current * x) + b_current))
    new_b = b_current - (learningRate * b_gradient)
    new_m = m_current - (learningRate * m_gradient)
    return [new_b, new_m]

def gradient_descent_runner(points, starting_b, starting_m, learning_rate, num_iterations):
    b = starting_b
    m = starting_m
    for i in range(num_iterations):
        b, m = step_gradient(b, m, array(points), learning_rate)
    return [b, m]

# The above functions are called below (points is the Nx2 data array loaded elsewhere):
learning_rate = 0.0001
initial_b = 0  # initial y-intercept guess
initial_m = 0  # initial slope guess
num_iterations = 1000
[b, m] = gradient_descent_runner(points, initial_b, initial_m, learning_rate, num_iterations)
# code taken from Siraj Raval's github page
Why do the values of b and m continue to update for all the iterations? After a certain number of iterations the function will have converged, i.e. we have found the values of b and m that give slope = 0.
So why do we continue iterating after that point and keep updating b and m?
Aren't we losing the 'correct' b and m values this way? How does the learning rate help the convergence process if we keep updating values after converging? And why is there no check for convergence; how is this actually working?
In practice, you will most likely never reach a slope of exactly 0. Think of your loss function as a bowl: if the learning rate is too high, you can overshoot the lowest point of the bowl; if it is too low, learning becomes so slow that you won't reach the lowest point before the iterations run out.
That is why, in machine learning, the learning rate is an important hyperparameter to tune.
Actually, once we reach a slope of 0, b_gradient and m_gradient become 0, so in
new_b = b_current - (learningRate * b_gradient)
new_m = m_current - (learningRate * m_gradient)
new_b and new_m simply keep the old, correct values, since nothing is subtracted from them.
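If you do want an explicit stopping criterion, a minimal sketch (not part of the original repository) is to break out of the loop once b and m stop changing, reusing the step_gradient defined above:

import numpy as np

def gradient_descent_runner(points, starting_b, starting_m, learning_rate,
                            num_iterations, tol=1e-9):
    b, m = starting_b, starting_m
    for i in range(num_iterations):
        new_b, new_m = step_gradient(b, m, np.array(points), learning_rate)
        # If b and m barely moved, the gradient is effectively zero: stop early.
        if abs(new_b - b) < tol and abs(new_m - m) < tol:
            return [new_b, new_m]
        b, m = new_b, new_m
    return [b, m]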

Attempt to get numbers instead of NaN values when printing

I have a simple program with a for loop where I calculate some values that I print to the screen, but only the first value is printed; the rest are just NaN. Is there any way to fix this? I suppose the numbers might have a lot of digits, hence the NaN issue.
Output from program:
0.18410
NaN
NaN
NaN
NaN
etc.
This is the code, maybe it helps:
for i = 1:30
    t = (100*i)*1.1*0.5;
    b = factorial(round(100*i)) / (factorial(round((100*i)-t)) * factorial(round(t)));
    % binomial distribution
    d = b * 0.5^(t) * 0.5^(100*i-(t));
    % cumulative
    p = binocdf(1.1 * (100*i) * 0.5, 100*i, 0.5);
    % >= AT LEAST
    result = 1 - p + d;
    disp(result);
end
You could do the calculation of the fraction yourself.
To do that you need to compute $d$ directly: collect all the values of the numerators and the denominators, and multiply and divide them by hand while making sure the intermediate result never gets too big. The following code is poor in terms of speed and memory, but it may be a good start:
for i = 1:30
    t = (55*i);
    b = factorial(100*i) / (factorial(100*i-t) * factorial(t));
    % binomial distribution
    d = b * 0.5^(t) * 0.5^(100*i-(t));
    numerators = 1:(100*i);
    denominators = [1:(100*i-t), 1:55*i, ones(1,100*i)*2];
    value = 1;
    while length(numerators) > 0 || length(denominators) > 0
        if length(numerators) == 0
            value = value / denominators(1);
            denominators(1) = [];
        elseif length(denominators) == 0
            value = value * numerators(1);
            numerators(1) = [];
        elseif value > 10000
            value = value / denominators(1);
            denominators(1) = [];
        else
            value = value * numerators(1);
            numerators(1) = [];
        end
    end
    % cumulative
    p = binocdf(1.1 * (100*i) * 0.5, 100*i, 0.5);
    % >= AT LEAST
    result = 1 - p + value;
    disp(result);
end
output:
0.1841
0.0895
0.0470
0.0255
0.0142
0.0080
0.0045
...
Take a look at the documentation of factorial:
Note that the factorial function grows large quite quickly, and
even with double precision values overflow will occur if N > 171.
For such cases consider 'gammaln'.
On your second iteration you are already computing factorial(200), which returns Inf, and then Inf/Inf returns NaN.
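In the log domain that suggestion looks like this (with n = 100·i, t = 55·i, and using that gammaln(x+1) = ln(x!)):
ln d = gammaln(n+1) − gammaln(t+1) − gammaln(n−t+1) + t·ln(0.5) + (n−t)·ln(0.5)
and then d = exp(ln d), so the overflowing factorials are never formed explicitly.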
