I am trying to implement regularized logistic regression, using Octave's fminunc() function to minimise the cost function. As generally advised, I would like to plot the cost as a function of the iterations of fminunc(). The function call looks as follows:
[theta, J, exit_flag] = ...
    fminunc(@(t)(costFunctionReg(t, X, y, lambda)), initial_theta, options);
with
options = optimset('GradObj', 'on', 'MaxIter', 400, 'OutputFcn', @showJ_history);
[showJ_history is the intended output function; I hope I have set the options parameter correctly.]
But I can't find good sources explaining how to write this output function: specifically, what parameters fminunc() passes to it, and what (if anything) fminunc() requires it to return.
Could someone please mention some helpful links or assist me in writing the output function?
I think you can refer to the source code. Consider also this example:
1;  # script-file marker, so the function definitions below work in a script
function f = __rosenb (x)
# http://en.wikipedia.org/wiki/Rosenbrock_function
n = length (x);
f = sumsq (1 - x(1:n-1)) + 100 * sumsq (x(2:n) - x(1:n-1).^2);
endfunction
function bstop = showJ_history (x, optv, state)
  # x is the current parameter vector; optv is a struct whose fields
  # include iter (iteration number) and fval (current objective value);
  # state is one of 'init', 'iter', or 'done'
  plot (optv.iter, optv.fval, 'x');
  # setting bstop to true stops optimization
  bstop = false;
endfunction
opt = optimset('OutputFcn', @showJ_history);
figure()
xlabel("iteration")
ylabel("cost function")
hold on
[x, fval, info, out] = fminunc (@__rosenb, [5, -5], opt);
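Adapted to the setup in the question, the same pattern would look roughly like this (a sketch; it assumes costFunctionReg returns the cost as its first output, and plots only during the 'iter' state):
function bstop = showJ_history (x, optv, state)
  if strcmp (state, "iter")
    plot (optv.iter, optv.fval, 'x');
  endif
  bstop = false;
endfunction

options = optimset ('GradObj', 'on', 'MaxIter', 400, 'OutputFcn', @showJ_history);
figure ();
xlabel ("iteration");
ylabel ("cost J");
hold on;
[theta, J, exit_flag] = fminunc (@(t) costFunctionReg (t, X, y, lambda), initial_theta, options);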
I am trying to use reinforcement learning in julia to teach a car that is constantly being accelerated backwards (but with a positive initial velocity) to apply brakes so that it gets as close to a target distance as possible before moving backwards.
To do this, I am making use of POMDPs.jl and crux.jl which has many solvers (I'm using DQN). I will list what I believe to be the relevant parts of the script first, and then more of it towards the end.
To define the MDP, I set the initial position, velocity, and force from the brakes as a uniform distribution over some values.
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
    x0 = Distributions.Uniform(0., 80.)  # distribution to sample initial position
    v0 = Distributions.Uniform(0., 25.)  # distribution to sample initial velocity
    d0 = Distributions.Uniform(0., 2.)   # distribution to sample brake force
    ...
end
My state holds the values of (position, velocity, brake force), and the initial state is given as:
function POMDPs.initialstate(mdp::SliderMDP)
    ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
Then, I set up my DQN solver using crux.jl and called a function to solve for the policy:
solver_dqn = DQN(π=Q_network(), S=s, N=30000)
policy_dqn = solve(solver_dqn, mdp)
Calling solve() gives me the error MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing). I am quite sure that this comes from the initial-state sampling, but I am not sure why, or how to fix it. I have only been learning RL from various books and online lectures for a very short time, so any help regarding the error or the model I set up (or anything else I'm oblivious to) would be appreciated.
More comprehensive code:
Packages:
using POMDPs
using POMDPModelTools
using POMDPPolicies
using POMDPSimulators
using Parameters
using Random
using Crux
using Flux
using Distributions
Rest of it:
@with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
    x0 = Distributions.Uniform(0., 80.)  # distribution to sample initial position
    v0 = Distributions.Uniform(0., 25.)  # distribution to sample initial velocity
    d0 = Distributions.Uniform(0., 2.)   # distribution to sample brake force
    m::Float64 = 1.
    tension::Float64 = 3.
    dmax::Float64 = 2.
    target::Float64 = 80.
    dt::Float64 = .05
    γ::Float32 = 1.
    actions::Vector{Float64} = [-.1, 0., .1]
end
function POMDPs.gen(env::SliderMDP, s, a, rng::AbstractRNG = Random.GLOBAL_RNG)
    x, ẋ, d = s
    if x >= env.target
        a = .1
    end
    if d+a >= env.dmax || d+a <= 0
        a = 0.
    end
    force = (d + env.tension) * -1
    ẍ = force/env.m
    # Simulation
    x_ = x + env.dt * ẋ
    ẋ_ = ẋ + env.dt * ẍ
    d_ = d + a
    sp = vcat(x_, ẋ_, d_)
    reward = abs(env.target - x) * -1
    return (sp=sp, r=reward)
end
function POMDPs.initialstate(mdp::SliderMDP)
    ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
POMDPs.isterminal(mdp::SliderMDP, s) = s[2] <= 0
POMDPs.discount(mdp::SliderMDP) = mdp.γ
mdp = SliderMDP();
s = state_space(mdp); # Using Crux.jl
function Q_network()
    layer1 = Dense(3, 64, relu)
    layer2 = Dense(64, 64, relu)
    layer3 = Dense(64, 3)  # one output per discrete action
    return DiscreteNetwork(Chain(layer1, layer2, layer3), [-.1, 0, .1])
end
solver_dqn = DQN(π=Q_network(), S=s, N=30000) # Using Crux.jl
policy_dqn = solve(solver_dqn, mdp) # Error comes here
Stacktrace:
policy_dqn
MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
Closest candidates are:
logpdf(::Distributions.DiscreteNonParametric, !Matched::Real) at C:\Users\name\.julia\packages\Distributions\Xrm9e\src\univariate\discrete\discretenonparametric.jl:106
logpdf(::Distributions.UnivariateDistribution{S} where S<:Distributions.ValueSupport, !Matched::AbstractArray) at deprecated.jl:70
logpdf(!Matched::POMDPPolicies.PlaybackPolicy, ::Any) at C:\Users\name\.julia\packages\POMDPPolicies\wMOK3\src\playback.jl:34
...
logpdf(::Crux.ObjectCategorical, ::Float32)#utils.jl:16
logpdf(::Crux.DistributionPolicy, ::Vector{Float64}, ::Float32)#policies.jl:305
var"#exploration#133"(::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(Crux.exploration), ::Crux.DistributionPolicy, ::Vector{Float64})#policies.jl:302
exploration#policies.jl:297[inlined]
action(::Crux.DistributionPolicy, ::Vector{Float64})#policies.jl:294
var"#exploration#136"(::Crux.DiscreteNetwork, ::Int64, ::typeof(Crux.exploration), ::Crux.MixedPolicy, ::Vector{Float64})#policies.jl:326
var"#step!#173"(::Bool, ::Int64, ::typeof(Crux.step!), ::Dict{Symbol, Array}, ::Int64, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})#sampler.jl:55
var"#steps!#174"(::Int64, ::Bool, ::Int64, ::Bool, ::Bool, ::Bool, ::typeof(Crux.steps!), ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})#sampler.jl:108
var"#fillto!#177"(::Int64, ::Bool, ::typeof(Crux.fillto!), ::Crux.ExperienceBuffer{Array}, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace}, ::Int64)#sampler.jl:156
solve(::Crux.OffPolicySolver, ::Main.workspace#2.SliderMDP)#off_policy.jl:86
top-level scope#Local: 1[inlined]
Short answer:
Change your output vector to Float32 i.e. Float32[-.1, 0, .1].
Long answer:
Crux creates a Distribution over your network's output values, and at some point (policies.jl:298) samples a random value from it. It then converts this value to a Float32. Later (utils.jl:15) it does a findfirst to find the index of this value in the original output array (stored as objs within the distribution), but because the original array is still Float64, this fails and returns nothing. Hence the error.
I believe this (converting the sampled value but not the objs array, and/or not using an approximate equality check, i.e. findfirst(isapprox(x), d.objs)) to be a bug in the package, and would encourage you to raise it as an issue on GitHub.
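For reference, the corrected network constructor from the question would look like this (a minimal sketch; only the output-value vector changes):
function Q_network()
    layer1 = Dense(3, 64, relu)
    layer2 = Dense(64, 64, relu)
    layer3 = Dense(64, 3)
    # Float32 literals so findfirst can locate the sampled (Float32) value
    return DiscreteNetwork(Chain(layer1, layer2, layer3), Float32[-.1, 0, .1])
end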
This is a follow-up question on the same system from the following post (for additional context). I describe a nonlinear system as a LeafSystem_[T] (using templates) with two input ports and one output port. I would then like to perform direct transcription using MathematicalProgram with an additional cost function that depends on the linearized dynamics at each time step (and is hence linearized around the decision variables). I use two input ports because it seemed the most straightforward way to obtain linearized dynamics of the form used in this paper on DIRTREL (if I can take the Jacobian with respect to input ports):
δx_{i+1} ≈ A_i δx_i + B_i δu_i + G_i w_i
where i is the timestep, x is the state, the first input port encapsulates u, and the second input port models w, which may be a disturbance, uncertainty, etc.
My main question is: what would be the most suitable set of methods to obtain the linearized dynamics around the decision variables at each time step using automatic differentiation? I was recommended automatic differentiation after attempting a symbolic approach in the previous post, but I am not familiar with the setup for doing so. I have experimented with
- using primitives.Linearize() (calling it twice, once for each input port), which feels rather clunky, and I am not sure whether it is possible to pass decision variables into the context;
- converting my system into a multibody and making use of multibody.tree.JacobianWrtVariable();
- formatting my system dynamics so that I can pass them in as the function argument for forwarddiff.jacobian;
but have met with limited success.
The easiest way to get Ai, Bi, Gi is to instantiate your system with AutoDiffXd, namely LeafSystem<AutoDiffXd>. The following code will give you Ai, Bi, Gi:
MyLeafSystem<AutoDiffXd> my_system;
Eigen::VectorXd x_val = ...
Eigen::VectorXd u_val = ...
Eigen::VectorXd w_val = ...
// xuw_val concatenates x_val, u_val and w_val
Eigen::VectorXd xuw_val(x_val.rows() + u_val.rows() + w_val.rows());
xuw_val.head(x_val.rows()) = x_val;
xuw_val.segment(x_val.rows(), u_val.rows()) = u_val;
xuw_val.tail(w_val.rows()) = w_val;
// xuw_autodiff stores xuw_val in its value(), and an identity matrix in its gradient()
AutoDiffVecXd xuw_autodiff = math::initializeAutoDiff(xuw_val);
AutoDiffVecXd x_autodiff = xuw_autodiff.head(x_val.rows());
AutoDiffVecXd u_autodiff = xuw_autodiff.segment(x_val.rows(), u_val.rows());
AutoDiffVecXd w_autodiff = xuw_autodiff.tail(w_val.rows());
// I suppose you have a function x[n+1] = dynamics(system, x[n], u[n], w[n]).
// This dynamics function could be a wrapper of the CalcUnrestrictedUpdate function.
AutoDiffVecXd x_next_autodiff = dynamics(my_system, x_autodiff, u_autodiff, w_autodiff);
Eigen::MatrixXd x_next_gradient = math::autoDiffToGradientMatrix(x_next_autodiff);
Eigen::MatrixXd Ai = x_next_gradient.block(0, 0, x_val.rows(), x_val.rows());
Eigen::MatrixXd Bi = x_next_gradient.block(0, x_val.rows(), x_val.rows(), u_val.rows());
Eigen::MatrixXd Gi = x_next_gradient.block(0, x_val.rows() + u_val.rows(), x_val.rows(), w_val.rows());
So you get the value of Ai, Bi, Gi in the end.
If you need to write a cost function, you will need to create a subclass of solvers::Cost. Inside the Eval function of this derived class, you will implement your code to first compute Ai, Bi, Gi, and then integrate the Riccati equation.
But since your cost function depends on Ai, Bi, Gi, the gradient of your cost function will depend on the gradient of Ai, Bi, Gi. Currently we don't provide a function to compute the second-order gradient of the dynamics.
How complicated is your dynamical system? Is it possible to write down the dynamics by hand? If so, there are some shortcuts we can take to generate the second-order gradient of your dynamics.
@sherm or other Drake dynamics folks, it would be great to get your opinion on how to get the second-order gradient (assuming Phil confirms he does need the second-order gradient).
Sorry for my belated reply.
Since your dynamics can be written by hand, I would suggest creating a templated function to compute Ai, Bi, Gi:
template <typename T>
void ComputeLinearizedDynamics(
    const LeafSystem<T>& my_system,
    const Eigen::Ref<const drake::VectorX<T>>& x,
    const Eigen::Ref<const drake::VectorX<T>>& u,
    drake::MatrixX<T>* Ai,
    drake::MatrixX<T>* Bi,
    drake::MatrixX<T>* Gi);
You will need to write down the matrices Ai, Bi, Gi by hand within this function (a hypothetical example is sketched below). Then, when you instantiate your LeafSystem with T=AutoDiffXd, this function will compute Ai, Bi, Gi together with their gradients, given the state x, input u and disturbance w.
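For illustration only, here is what such a hand-written body might look like, using a toy system that is not the asker's: an explicit-Euler damped pendulum with state x = [theta, thetadot], scalar input u, and a disturbance w entering like the input (all constants are made up):
template <typename T>
void ComputeLinearizedDynamics(
    const LeafSystem<T>& /* my_system */,
    const Eigen::Ref<const drake::VectorX<T>>& x,
    const Eigen::Ref<const drake::VectorX<T>>& /* u */,
    drake::MatrixX<T>* Ai,
    drake::MatrixX<T>* Bi,
    drake::MatrixX<T>* Gi) {
  using std::cos;
  const double dt = 0.05, g = 9.81, l = 1.0, b = 0.1, m = 1.0;
  // Hand-derived Jacobians of the explicit-Euler update
  //   theta'    = theta + dt * thetadot
  //   thetadot' = thetadot + dt * (-(g/l) sin(theta) - b * thetadot + (u + w)/(m l^2))
  Ai->resize(2, 2);
  *Ai << T(1), T(dt),
         T(-dt * (g / l)) * cos(x(0)), T(1 - dt * b);
  Bi->resize(2, 1);
  *Bi << T(0), T(dt / (m * l * l));
  *Gi = *Bi;  // w enters the dynamics exactly like u here
}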
Then, in the cost function, you could create a subclass of the Cost class:
class MyCost : public solvers::Cost {
 public:
  MyCost(const LeafSystem<AutoDiffXd>& my_system) : my_system_{&my_system} {}

 protected:
  void DoEval(const Eigen::Ref<const Eigen::VectorXd>& x_input, Eigen::VectorXd* y) const {
    // The computation here is inefficient, as we need to cast
    // x_input to an Eigen vector of AutoDiffXd, then call
    // DoEval with the AutoDiffXd version, and then convert the
    // result back to double. But it is easy to implement.
    const AutoDiffVecXd x_autodiff = math::initializeAutoDiff(x_input);
    AutoDiffVecXd y_autodiff;
    this->DoEval(x_autodiff, &y_autodiff);
    *y = math::autoDiffToValueMatrix(y_autodiff);
  }

  void DoEval(const Eigen::Ref<const drake::AutoDiffVecXd>& x_input, drake::AutoDiffVecXd* y) const {
    // x_input here contains the whole state and control sequence;
    // we need to first partition x_input into x and u.
    drake::AutoDiffVecXd x_all = x_input.head(num_x_ * nT_);
    drake::AutoDiffVecXd u_all = x_input.tail(num_u_ * nT_);
    y->resize(1);
    (*y)(0) = 0;
    // I assume S_final_ is stored in this class.
    drake::MatrixX<AutoDiffXd> S = S_final_;
    for (int i = nT_ - 1; i >= 0; --i) {
      drake::MatrixX<AutoDiffXd> Ai, Bi, Gi;
      ComputeLinearizedDynamics(
          *my_system_,
          x_all.segment(num_x_ * i, num_x_),
          u_all.segment(num_u_ * i, num_u_),
          &Ai, &Bi, &Gi);
      S = Ai.transpose() * S + S * Ai + ...  // This is the Riccati equation.
      // Now compute your cost with this S
      ...
    }
  }

  void DoEval(const Eigen::Ref<const VectorX<symbolic::Variable>>& x, VectorX<symbolic::Expression>* y) const {
    // You don't need the symbolic version of the cost in nonlinear optimization.
    throw std::runtime_error("Not implemented yet");
  }

 private:
  const LeafSystem<AutoDiffXd>* my_system_;
};
The AutoDiffXd version of DoEval will compute the gradient of the cost for you automatically. You will then need to call the AddCost function of MathematicalProgram to add this cost, together with all of x, u as the associated variables of the cost, as in the sketch below.
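A minimal sketch of that last step (prog, x_vars_flat and u_vars_flat are hypothetical names for your MathematicalProgram and its flattened decision variables; their ordering must match the x_input layout assumed in DoEval):
// Sketch only: hook the cost into the program.
auto my_cost = std::make_shared<MyCost>(my_system_autodiff);
drake::solvers::VectorXDecisionVariable vars(num_x * nT + num_u * nT);
vars << x_vars_flat, u_vars_flat;
prog.AddCost(my_cost, vars);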
I am a student studying machine learning.
For my study, I need to differentiate the loss function to second order, and I use chainer.functions.sigmoid_cross_entropy.
A similar function is chainer.functions.softmax_cross_entropy. That function has an argument enable_double_backprop to enable the second derivative, but chainer.functions.sigmoid_cross_entropy does not.
Is chainer.functions.sigmoid_cross_entropy a second-order differentiable function?
Please teach me!
chainer.functions.sigmoid_cross_entropy(x, t, normalize=True, reduce='mean')
chainer.functions.softmax_cross_entropy(x, t, normalize=True, cache_score=True,
    class_weight=None, ignore_label=-1, reduce='mean', enable_double_backprop=False,
    soft_target_loss='cross-entropy')
Yes, sigmoid_cross_entropy is second-order differentiable. For performance reasons, softmax_cross_entropy is not second-order differentiable unless enable_double_backprop=True is given.
Functions that do not support higher-order derivatives are listed in https://github.com/chainer/chainer/issues/4449.
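For example, a second derivative of sigmoid_cross_entropy can be obtained via double backpropagation (a minimal sketch; the toy data and shapes are made up):
import numpy as np
import chainer
import chainer.functions as F

# Toy logits and binary labels (labels as int32 with values 0/1)
x = chainer.Variable(np.random.randn(4, 1).astype(np.float32))
t = np.random.randint(0, 2, size=(4, 1)).astype(np.int32)

loss = F.sigmoid_cross_entropy(x, t)

# First-order gradient; enable_double_backprop=True keeps the graph of the
# gradient computation so it can be differentiated a second time.
gx, = chainer.grad([loss], [x], enable_double_backprop=True)

# Second-order gradient of sum(gx) with respect to x
ggx, = chainer.grad([F.sum(gx)], [x])
print(ggx)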
I have this code:
function createRect(x, y, w, h)
  local rect = {
    type = "rect",
    x = x,
    y = y,
    w = w,
    h = h,
    translate = function(rect, vector)
      assert(vector.type == "vector2d")
      return shapes.createRect(rect.x + vector.x, rect.y + vector.y, rect.w, rect.h)
    end,
  }
  return rect
end

translate = function(rect, vector)
  assert(vector.type == "vector2d")
  return shapes.createRect(rect.x + vector.x, rect.y + vector.y, rect.w, rect.h)
end
local o = createRect(2,3,4,5)
local q = createRect(2,3,4,5)
print(o.translate, q.translate, translate)
This is some very simple code, written to test factory functions in Lua, and is very reminiscent of the JS module pattern. Something people usually complain about when talking about factory functions is the memory footprint.
Because o and q are created by separate calls, I assumed they would of course have different translate() functions.
However I was proven wrong:
function: 0x7fcdbe600d50 function: 0x7fcdbe600d50 function: 0x7fcdbe600d90
Why is this? How can this even be? I assumed o.translate and q.translate to be different functions; however, they are the same...
How can this even be? I assumed o.translate and q.translate to be different functions; however, they are the same...
Normally you would be correct, however Lua 5.2 introduced an optimization whereby anonymous functions may be cached if certain conditions are met. Specifically, if the upvalues a function references don't change between constructions, then the first created instance of that anonymous function gets reused.
Running your example in repl.it, Lua 5.1, shows this as one possible output:
function: 0xb81f30 function: 0xb81f00 function: 0xb82ca0
But running it under melpon.org/wandbox, Lua 5.2+, shows:
function: 0x14f0650 function: 0x14f0650 function: 0x14efb40
In your example, createRect creates and returns a different rect table for every call, but the field rect.translate is assigned the same anonymous function as its Lua value due to this optimization.
Also see
http://lua-users.org/lists/lua-l/2010-07/threads.html#00339
http://lua-users.org/lists/lua-l/2010-07/msg00862.html
http://lua-users.org/lists/lua-l/2010-05/threads.html#00617
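A small standalone illustration of the caching condition (a sketch; the addresses will differ, and the sharing should only be observable under Lua 5.2/5.3):
-- A closure whose upvalue differs per call cannot be cached:
local function makeCounter(start)
  return function() return start end  -- captures a different 'start' each call
end
print(makeCounter(1) == makeCounter(2))  -- false: distinct closures

-- A closure that captures nothing that changes may be cached:
local function makeConst()
  return function() return 42 end
end
print(makeConst() == makeConst())  -- may print true under Lua 5.2/5.3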
Specifically, I would like nn.LogSoftMax not to use OMP when the input tensor is small. I have a small script to test the run time.
require 'nn'

local my_lsm = function(t)
  local o = torch.zeros((#t)[1])
  local sum = 0.0
  for i = 1, (#t)[1] do
    o[i] = torch.exp(t[i])
    sum = sum + o[i]
  end
  o = o / sum
  return torch.log(o)
end

local ii = torch.randn(arg[1])
local m = nn.LogSoftMax()

local timer = torch.Timer()
timer:stop()
timer:reset()
timer:resume()
my_lsm(ii)
print(timer:time().real)

timer:stop()
timer:reset()
timer:resume()
m:forward(ii)
print(timer:time().real)
If arg[1] is 10, then my basic log softmax function runs much faster:
0.00021696090698242
0.033425092697144
But once arg[1] is 10,000,000, OMP really helps a lot:
29.561321973801
0.11547803878784
So I suspect that the OMP overhead is very high. If my code has to call log softmax many times with small inputs (say, a tensor size of only 3), it will cost too much time. Is there a way to manually disable OMP usage in some cases (but not always)?
Is there a way to manually disable OMP usage in some cases (but not always)?
If you really want to do that, one possibility is to use torch.setnumthreads and torch.getnumthreads like this:
local nth = torch.getnumthreads()
torch.setnumthreads(1)
-- do something
torch.setnumthreads(nth)
So you can monkey-patch nn.LogSoftMax as follows:
nn.LogSoftMax.updateOutput = function(self, input)
  local nth = torch.getnumthreads()
  torch.setnumthreads(1)
  local out = input.nn.LogSoftMax_updateOutput(self, input)
  torch.setnumthreads(nth)
  return out
end
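To disable OMP only in some cases, as asked, you could gate the patch on the input size instead of patching unconditionally (a sketch; the threshold of 1000 elements is an arbitrary assumption you would want to tune with your own timings):
local orig_updateOutput = nn.LogSoftMax.updateOutput
nn.LogSoftMax.updateOutput = function(self, input)
  -- force single-threaded execution only for small inputs
  if input:nElement() < 1000 then
    local nth = torch.getnumthreads()
    torch.setnumthreads(1)
    local out = orig_updateOutput(self, input)
    torch.setnumthreads(nth)
    return out
  end
  return orig_updateOutput(self, input)
end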