no method matching logpdf when sampling from uniform distribution - machine-learning

I am trying to use reinforcement learning in julia to teach a car that is constantly being accelerated backwards (but with a positive initial velocity) to apply brakes so that it gets as close to a target distance as possible before moving backwards.
To do this, I am making use of POMDPs.jl and crux.jl which has many solvers (I'm using DQN). I will list what I believe to be the relevant parts of the script first, and then more of it towards the end.
To define the MDP, I set the initial position, velocity, and force from the brakes as a uniform distribution over some values.
#with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
...
end
My state holds the values of (position, velocity, brake force), and the initial state is given as:
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
Then, I set up my DQN solver using crux.jl and called a function to solve for the policy
solver_dqn = DQN(π=Q_network(), S=s, N=30000)
policy_dqn = solve(solver_dqn, mdp)
calling solve() gives me the error MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing). I am quite sure that this comes from the initial state sampling, but I am not sure why or how to fix it. I have only been learning RL from various books and online lectures for a very short time, so any help regarding the error or my the model I set up (or anything else I'm oblivious to) would be appreciated.
More comprehensive code:
Packages:
using POMDPs
using POMDPModelTools
using POMDPPolicies
using POMDPSimulators
using Parameters
using Random
using Crux
using Flux
using Distributions
Rest of it:
#with_kw struct SliderMDP <: MDP{Array{Float32}, Array{Float32}}
x0 = Distributions.Uniform(0., 80.)# Distribution to sample initial position
v0 = Distributions.Uniform(0., 25.) # Distribution to sample initial velocity
d0 = Distributions.Uniform(0., 2.) # Distribution to sample brake force
m::Float64 = 1.
tension::Float64 = 3.
dmax::Float64 = 2.
target::Float64 = 80.
dt::Float64 = .05
γ::Float32 = 1.
actions::Vector{Float64} = [-.1, 0., .1]
end
function POMDPs.gen(env::SliderMDP, s, a, rng::AbstractRNG = Random.GLOBAL_RNG)
x, ẋ, d = s
if x >= env.target
a = .1
end
if d+a >= env.dmax || d+a <= 0
a = 0.
end
force = (d + env.tension) * -1
ẍ = force/env.m
# Simulation
x_ = x + env.dt * ẋ
ẋ_ = ẋ + env.dt * ẍ
d_ = d + a
sp = vcat(x_, ẋ_, d_)
reward = abs(env.target - x) * -1
return (sp=sp, r=reward)
end
function POMDPs.initialstate(mdp::SliderMDP)
ImplicitDistribution((rng) -> Float32.([rand(rng, mdp.x0), rand(rng, mdp.v0), rand(rng, mdp.d0)]))
end
POMDPs.isterminal(mdp::SliderMDP, s) = s[2] <= 0
POMDPs.discount(mdp::SliderMDP) = mdp.γ
mdp = SliderMDP();
s = state_space(mdp); # Using Crux.jl
function Q_network()
layer1 = Dense(3, 64, relu)
layer2 = Dense(64, 64, relu)
layer3 = Dense(64, length(3))
return DiscreteNetwork(Chain(layer1, layer2, layer3), [-.1, 0, .1])
end
solver_dqn = DQN(π=Q_network(), S=s, N=30000) # Using Crux.jl
policy_dqn = solve(solver_dqn, mdp) # Error comes here
Stacktrace:
policy_dqn
MethodError: no method matching logpdf(::Distributions.Categorical{Float64, Vector{Float64}}, ::Nothing)
Closest candidates are:
logpdf(::Distributions.DiscreteNonParametric, !Matched::Real) at C:\Users\name\.julia\packages\Distributions\Xrm9e\src\univariate\discrete\discretenonparametric.jl:106
logpdf(::Distributions.UnivariateDistribution{S} where S<:Distributions.ValueSupport, !Matched::AbstractArray) at deprecated.jl:70
logpdf(!Matched::POMDPPolicies.PlaybackPolicy, ::Any) at C:\Users\name\.julia\packages\POMDPPolicies\wMOK3\src\playback.jl:34
...
logpdf(::Crux.ObjectCategorical, ::Float32)#utils.jl:16
logpdf(::Crux.DistributionPolicy, ::Vector{Float64}, ::Float32)#policies.jl:305
var"#exploration#133"(::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, ::typeof(Crux.exploration), ::Crux.DistributionPolicy, ::Vector{Float64})#policies.jl:302
exploration#policies.jl:297[inlined]
action(::Crux.DistributionPolicy, ::Vector{Float64})#policies.jl:294
var"#exploration#136"(::Crux.DiscreteNetwork, ::Int64, ::typeof(Crux.exploration), ::Crux.MixedPolicy, ::Vector{Float64})#policies.jl:326
var"#step!#173"(::Bool, ::Int64, ::typeof(Crux.step!), ::Dict{Symbol, Array}, ::Int64, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})#sampler.jl:55
var"#steps!#174"(::Int64, ::Bool, ::Int64, ::Bool, ::Bool, ::Bool, ::typeof(Crux.steps!), ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace})#sampler.jl:108
var"#fillto!#177"(::Int64, ::Bool, ::typeof(Crux.fillto!), ::Crux.ExperienceBuffer{Array}, ::Crux.Sampler{Main.workspace#2.SliderMDP, Vector{Float32}, Crux.DiscreteNetwork, Crux.ContinuousSpace{Tuple{Int64}}, Crux.DiscreteSpace}, ::Int64)#sampler.jl:156
solve(::Crux.OffPolicySolver, ::Main.workspace#2.SliderMDP)#off_policy.jl:86
top-level scope#Local: 1[inlined]

Short answer:
Change your output vector to Float32 i.e. Float32[-.1, 0, .1].
Long answer:
Crux creates a Distribution over your network's output values, and at some point (policies.jl:298) samples a random value from it. It then converts this value to a Float32. Later (utils.jl:15) it does a findfirst to find the index of this value in the original output array (stored as objs within the distribution), but because the original array is still Float64, this fails and returns a nothing. Hence the error.
I believe this (converting the sampled value but not the objs array and/or not using approximate equality check i.e. findfirst(isapprox(x), d.objs)) to be a bug in the package, and would encourage you to raise this as an issue on Github.

Related

Tensorflow: Variable rnn/basic_lstm_cell/kernel already exists, disallowed

Here is a part of my tensorflow RNN network code written in jupyter. The whole code runs perfect for the first time, however, running it furthermore produces an error. The code:
x = tf.placeholder(tf.float32)
y = tf.placeholder(tf.float32)
def recurrent_nn_model(x):
x = tf.transpose(x, [1,0,2])
x = tf.reshape(x, [-1, chunk_size])
x = tf.split(x, n_chunks, 0)
lstm_layer = {'hidden_state': tf.zeros([n_batches, lstm_size]),
'current_state': tf.zeros([n_batches, lstm_size])}
layer = {'weights': tf.Variable(tf.random_normal([lstm_size, n_classes])),
'biases': tf.Variable(tf.random_normal([n_classes]))}
lstm = rnn_cell.BasicLSTMCell(lstm_size)
rnn_outputs, rnn_states = rnn.static_rnn(lstm, x, dtype=tf.float32)
output = tf.add(tf.matmul(rnn_outputs[-1], layer['weights']),
layer['biases'])
return output
The error is:
Variable rnn/basic_lstm_cell/kernel already exists, disallowed. Did
you mean to set reuse=True in VarScope? Originally defined at:
If recurrent_nn_model is the whole network, just add this line to reset previously defined graph:
tf.reset_default_graph()
If you're intentionally calling recurrent_nn_model several times and combine these RNNs into one graph, you should use different variable scopes for each one:
with tf.variable_scope('lstm1'):
recurrent_nn_model(x1)
with tf.variable_scope('lstm2'):
recurrent_nn_model(x2)

How can I undo differencing (lag order 1) of a square-root transformed variable? [ARIMA()]

I am using statsmodels.tsa.arima_model.ARIMA, and I took the square root transform of the endogenous variable before plugging it into the algorithm. The model uses a differencing order of 1:
model = ARIMA(sj_sqrt, order=(2, 1, 0))
After fitting the model and grabbing the predictions, I want to put the predictions back in the original form for comparison with the original data. However, I can't seem to transform them back correctly.
To replicate a simple version of this problem, here is some code:
#original data:
test = pd.Series([1,1,1,50,1,1,1,1,1,1,1,1,40,1,1,2,1,1,1,1,1])
#sqrt transformed data:
test_sqrt = np.sqrt(test)
#sqrt differenced data:
test_sqrt_diff = test_sqrt.diff(periods=1)
#undo differencing:
test_sqrt_2 = cumsum(test_sqrt_diff)
#undo transformations:
test_2 = test_sqrt_2 ** 2
f, axarr = plt.subplots(5, sharex=True, sharey=True)
axarr[0].set_title('original data:')
axarr[0].plot(test)
axarr[1].set_title('sqrt transformed data:')
axarr[1].plot(test_sqrt)
axarr[2].set_title('sqrt differenced data:')
axarr[2].plot(test_sqrt_diff)
axarr[3].set_title('differencing undone with .cumsum():')
axarr[3].plot(test_sqrt_2)
axarr[4].set_title('transformation undone by squaring:')
axarr[4].plot(test_2)
f.set_size_inches(5, 12)
You can see from the graphs that the undifferenced, untransformed data is not quite on the same scale. test[3] returns 50, and test_2[3] returns 36.857864376269056
Solution:
## original
x = np.array([1,1,1,50,1,1,1,1,1,1,1,1,40,1,1,2,1,1,1,1,1])
## sqrt
x_sq = np.sqrt(x)
## diff
d_sq = np.diff(x_sq,n=1)
## Only works when d = 1
def diffinv(d,i):
inv = np.insert(d,0,i)
inv = np.cumsum(inv)
return inv
## inv diff
y_sq = diffinv(d_sq,x_sq[0])
## Check inv diff
(y_sq==x_sq).all()

Tensorflow LSTM PTB Example - Understanding forward and backward pass

Right now I am going through the tensorflow example on LSTMs where they use the PTB dataset to create an LSTM network capable of predicting the next word. I've spent a lot of time trying to understand the code, and have a good understanding for most of it however there is one function which I don't fully grasp:
def run_epoch(session, model, eval_op=None, verbose=False):
"""Runs the model on the given data."""
costs = 0.0
iters = 0
state = session.run(model.initial_state)
fetches = {
"cost": model.cost,
"final_state": model.final_state,
}
if eval_op is not None:
fetches["eval_op"] = eval_op
for step in range(model.input.epoch_size):
feed_dict = {}
for i, (c, h) in enumerate(model.initial_state):
feed_dict[c] = state[i].c
feed_dict[h] = state[i].h
vals = session.run(fetches, feed_dict)
cost = vals["cost"]
state = vals["final_state"]
costs += cost
iters += model.input.num_steps
return np.exp(costs / iters)
My confusion is this: each time through the outerloop I believe we have processed batch_size * num_steps numbers of words, done the forward propagation and done the backward propagation. But, how in the next iteration, for example, do we know to start with the 36th word of each batch if num_steps = 35? I suspect it is some change in an attribute of the class model on each iteration but I cannot figure that out. Thanks for your help.

kNN Consistently Overusing One Label

I am using a kNN to do some classification of labeled images. After my classification is done, I am outputting a confusion matrix. I noticed that one label, bottle was being applied incorrectly more often.
I removed the label and tested again, but then noticed that another label, shoe was being applied incorrectly, but was fine last time.
There should be no normalization, so I'm unsure what is causing this behavior. Testing showed it continued no matter how many labels I removed.
Not totally sure how much code to post, so I'll put some things that should be relevant and pastebin the rest.
def confusionMatrix(classifier, train_DS_X, train_DS_y, test_DS_X, test_DS_y):
# Will output a confusion matrix graph for the predicion
y_pred = classifier.fit(train_DS_X, train_DS_y).predict(test_DS_X)
labels = set(set(train_DS_y) | set(test_DS_y))
def plot_confusion_matrix(cm, title='Confusion matrix', cmap=plt.cm.Blues):
plt.imshow(cm, interpolation='nearest', cmap=cmap)
plt.title(title)
plt.colorbar()
tick_marks = np.arange(len(labels))
plt.xticks(tick_marks, labels, rotation=45)
plt.yticks(tick_marks, labels)
plt.tight_layout()
plt.ylabel('True label')
plt.xlabel('Predicted label')
# Compute confusion matrix
cm = confusion_matrix(test_DS_y , y_pred)
np.set_printoptions(precision=2)
print('Confusion matrix, without normalization')
#print(cm)
plt.figure()
plot_confusion_matrix(cm)
# Normalize the confusion matrix by row (i.e by the number of samples
# in each class)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
print('Normalized confusion matrix')
#print(cm_normalized)
plt.figure()
plot_confusion_matrix(cm_normalized, title='Normalized confusion matrix')
plt.show()
Relevant Code from Main Function:
# Select training and test data
PCA = decomposition.PCA(n_components=.95)
zscorer = ZScoreMapper(param_est=('targets', ['rest']), auto_train=False)
DS = getVoxels (1, .5)
train_DS = DS[0]
test_DS = DS[1]
# Apply PCA and ZScoring
train_DS = processVoxels(train_DS, True, zscorer, PCA)
test_DS = processVoxels(test_DS, False, zscorer, PCA)
print 3*"\n"
# Select the desired features
# If selecting samples or PCA, that must be the only feature
featuresOfInterest = ['pca']
trainDSFeat = selectFeatures(train_DS, featuresOfInterest)
testDSFeat = selectFeatures(test_DS, featuresOfInterest)
train_DS_X = trainDSFeat[0]
train_DS_y = trainDSFeat[1]
test_DS_X = testDSFeat[0]
test_DS_y = testDSFeat[1]
# Optimization of neighbors
# Naively searches for local max starting at numNeighbors
lastScore = 0
lastNeightbors = 1
score = .0000001
numNeighbors = 5
while score > lastScore:
lastScore = score
lastNeighbors = numNeighbors
numNeighbors += 1
#Classification
neigh = neighbors.KNeighborsClassifier(n_neighbors=numNeighbors, weights='distance')
neigh.fit(train_DS_X, train_DS_y)
#Testing
score = neigh.score(test_DS_X,test_DS_y )
# Confusion Matrix Output
neigh = neighbors.KNeighborsClassifier(n_neighbors=lastNeighbors, weights='distance')
confusionMatrix(neigh, train_DS_X, train_DS_y, test_DS_X, test_DS_y)
Pastebin: http://pastebin.com/U7yTs3vs
The issue was in part the result of my axis being mislabeled, when I thought I was removing the faulty label I was in actuality just removing a random label, meaning the faulty data was still being analyzed. Fixing the axis and removing the faulty label which was actually rest yielded:
The code I changed is:
cm = confusion_matrix(test_DS_y , y_pred, labels)
Basically I manually set the ordering based on my list of ordered labels.

Total sum from a set (logic)

I have a logic problem for an iOS app but I don't want to solve it using brute-force.
I have a set of integers, the values are not unique:
[3,4,1,7,1,2,5,6,3,4........]
How can I get a subset from it with these 3 conditions:
I can only pick a defined amount of values.
The sum of the picked elements are equal to a value.
The selection must be random, so if there's more than one solution to the value, it will not always return the same.
Thanks in advance!
This is the subset sum problem, it is a known NP-Complete problem, and thus there is no known efficient (polynomial) solution to it.
However, if you are dealing with only relatively low integers - there is a pseudo polynomial time solution using Dynamic Programming.
The idea is to build a matrix bottom-up that follows the next recursive formulas:
D(x,i) = false x<0
D(0,i) = true
D(x,0) = false x != 0
D(x,i) = D(x,i-1) OR D(x-arr[i],i-1)
The idea is to mimic an exhaustive search - at each point you "guess" if the element is chosen or not.
To get the actual subset, you need to trace back your matrix. You iterate from D(SUM,n), (assuming the value is true) - you do the following (after the matrix is already filled up):
if D(x-arr[i-1],i-1) == true:
add arr[i] to the set
modify x <- x - arr[i-1]
modify i <- i-1
else // that means D(x,i-1) must be true
just modify i <- i-1
To get a random subset at each time, if both D(x-arr[i-1],i-1) == true AND D(x,i-1) == true choose randomly which course of action to take.
Python Code (If you don't know python read it as pseudo-code, it is very easy to follow).
arr = [1,2,4,5]
n = len(arr)
SUM = 6
#pre processing:
D = [[True] * (n+1)]
for x in range(1,SUM+1):
D.append([False]*(n+1))
#DP solution to populate D:
for x in range(1,SUM+1):
for i in range(1,n+1):
D[x][i] = D[x][i-1]
if x >= arr[i-1]:
D[x][i] = D[x][i] or D[x-arr[i-1]][i-1]
print D
#get a random solution:
if D[SUM][n] == False:
print 'no solution'
else:
sol = []
x = SUM
i = n
while x != 0:
possibleVals = []
if D[x][i-1] == True:
possibleVals.append(x)
if x >= arr[i-1] and D[x-arr[i-1]][i-1] == True:
possibleVals.append(x-arr[i-1])
#by here possibleVals contains 1/2 solutions, depending on how many choices we have.
#chose randomly one of them
from random import randint
r = possibleVals[randint(0,len(possibleVals)-1)]
#if decided to add element:
if r != x:
sol.append(x-r)
#modify i and x accordingly
x = r
i = i-1
print sol
P.S.
The above give you random choice, but NOT with uniform distribution of the permutations.
To achieve uniform distribution, you need to count the number of possible choices to build each number.
The formulas will be:
D(x,i) = 0 x<0
D(0,i) = 1
D(x,0) = 0 x != 0
D(x,i) = D(x,i-1) + D(x-arr[i],i-1)
And when generating the permutation, you do the same logic, but you decide to add the element i in probability D(x-arr[i],i-1) / D(x,i)

Resources