There is a Theano implementation of RBMs. The original RBM implementation is for discrete data such as images, but my data is real-valued. Does the code work for real data too? I read somewhere that there is a Gaussian version of the typical RBM that works for that; is it also implemented in that module?
In short, an RBM is simply a Markov random field on a bipartite graph, so you can use any probability distribution to describe the relationship between the nodes.
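For concreteness, the standard Gaussian-Bernoulli energy function (sketched here in Hinton's notation, with $\sigma_i$ the standard deviation of visible unit $i$; this is background, not part of the module below) is

$$E(v,h) = \sum_i \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_j b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j,$$

which yields $p(v_i \mid h) = \mathcal{N}\big(a_i + \sigma_i \sum_j w_{ij} h_j,\ \sigma_i^2\big)$: the visible units become linear-Gaussian while the hidden units remain Bernoulli.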
In terms of code, you don't really need to change much. The probability distribution you select comes into play in the contrastive divergence algorithm, so you should only have to change the way the samples are drawn. The parts of the code that need to be changed are copied below.
def sample_h_given_v(self, v0_sample):
    ''' This function infers state of hidden units given visible units '''
    # compute the activation of the hidden units given a sample of
    # the visibles
    pre_sigmoid_h1, h1_mean = self.propup(v0_sample)
    # get a sample of the hiddens given their activation
    # Note that theano_rng.binomial returns a symbolic sample of dtype
    # int64 by default. If we want to keep our computations in floatX
    # for the GPU we need to specify to return the dtype floatX
    h1_sample = self.theano_rng.binomial(size=h1_mean.shape,
                                         n=1, p=h1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_h1, h1_mean, h1_sample]

def propdown(self, hid):
    '''This function propagates the hidden units activation downwards to
    the visible units

    Note that we return also the pre_sigmoid_activation of the
    layer. As it will turn out later, due to how Theano deals with
    optimizations, this symbolic variable will be needed to write
    down a more stable computational graph (see details in the
    reconstruction cost function)
    '''
    pre_sigmoid_activation = T.dot(hid, self.W.T) + self.vbias
    return [pre_sigmoid_activation, T.nnet.sigmoid(pre_sigmoid_activation)]

def sample_v_given_h(self, h0_sample):
    ''' This function infers state of visible units given hidden units '''
    # compute the activation of the visible given the hidden sample
    pre_sigmoid_v1, v1_mean = self.propdown(h0_sample)
    # get a sample of the visible given their activation
    # Note that theano_rng.binomial returns a symbolic sample of dtype
    # int64 by default. If we want to keep our computations in floatX
    # for the GPU we need to specify to return the dtype floatX
    v1_sample = self.theano_rng.binomial(size=v1_mean.shape,
                                         n=1, p=v1_mean,
                                         dtype=theano.config.floatX)
    return [pre_sigmoid_v1, v1_mean, v1_sample]
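The module itself only ships the binary version. As a rough sketch (not part of the tutorial code), a Gaussian-Bernoulli variant could replace propdown and sample_v_given_h as follows, assuming the data has been standardised so that sigma = 1; sample_h_given_v stays exactly as above, since the hidden units remain Bernoulli:

def propdown(self, hid):
    '''Gaussian-visible variant: the mean of a visible unit is the raw
    linear activation, with no sigmoid squashing.'''
    mean_activation = T.dot(hid, self.W.T) + self.vbias
    # keep the two-element return shape of the original code
    return [mean_activation, mean_activation]

def sample_v_given_h(self, h0_sample):
    '''Sample real-valued visibles from N(mean, 1) instead of a
    Bernoulli distribution (assumes unit-variance data).'''
    pre_activation, v1_mean = self.propdown(h0_sample)
    v1_sample = self.theano_rng.normal(size=v1_mean.shape,
                                       avg=v1_mean, std=1.0,
                                       dtype=theano.config.floatX)
    return [pre_activation, v1_mean, v1_sample]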
Does anyone have experience training a support vector machine (SVM) in Julia (1.4.1)?
I tried the LIBSVM interface, but the example on the GitHub page gave an error:
# Load Fisher's classic iris data
iris = dataset("datasets", "iris")
# LIBSVM handles multi-class data automatically using a one-against-one strategy
labels = convert(Vector, iris[:Species])
# First dimension of input data is features; second is instances
instances = convert(Array, iris[:, 1:4])'
# Train SVM on half of the data using default parameters. See documentation
# of svmtrain for options
model = svmtrain(instances[:, 1:2:end], labels[1:2:end]);
ERROR: MethodError: no method matching LIBSVM.SupportVectors(::Int32, ::Array{Int32,1}, ::CategoricalArray{String,1,UInt8,String,CategoricalValue{String,UInt8},Union{}}, ::Array{Float64,2}, ::Array{Int32,1}, ::Array{LIBSVM.SVMNode,1})
Closest candidates are:
LIBSVM.SupportVectors(::Int32, ::Array{Int32,1}, ::Array{T,1}, ::AbstractArray{U,2}, ::Array{Int32,1}, ::Array{LIBSVM.SVMNode,1}) where {T, U} at /home/benny/.julia/packages/LIBSVM/5Z99T/src/LIBSVM.jl:18
LIBSVM.SupportVectors(::LIBSVM.SVMModel, ::Any, ::Any) at /home/benny/.julia/packages/LIBSVM/5Z99T/src/LIBSVM.jl:27
It looks like the LIBSVM.jl documentation is rather outdated and the package README was not updated accordingly, so this is worth an issue (or at least a pull request to update the README).
The error you see is not related to the package itself but to the fact that in current versions of DataFrames.jl and RDatasets.jl the labels column is no longer a Vector (as it was at the time LIBSVM.jl was developed) but a CategoricalArray. You can avoid this problem by converting the CategoricalArray to a plain Vector{String}. The complete example looks like this:
using RDatasets, LIBSVM
using StatsBase, Printf # `mean` and `@printf` are no longer in Base and need to be imported explicitly
# Load Fisher's classic iris data
iris = dataset("datasets", "iris")
# LIBSVM handles multi-class data automatically using a one-against-one strategy
labels = string.(convert(Vector, iris[:Species]))
# First dimension of input data is features; second is instances
instances = convert(Array, iris[:, 1:4])'
# Train SVM on half of the data using default parameters. See documentation
# of svmtrain for options
model = svmtrain(instances[:, 1:2:end], labels[1:2:end]);
# Test model on the other half of the data.
(predicted_labels, decision_values) = svmpredict(model, instances[:, 2:2:end]);
# Compute accuracy
@printf "Accuracy: %.2f%%\n" mean((predicted_labels .== labels[2:2:end]))*100
Alternatively, you can use MLJ.jl or ScikitLearn.jl, which wrap LIBSVM.jl correctly on their own.
Oskin's answer is for an older version. In the current version, it should be modified as follows:
using RDatasets, LIBSVM
using StatsBase, Printf # `mean` and `@printf` are no longer in Base and need to be imported explicitly
# Load Fisher's classic iris data
iris = dataset("datasets", "iris")
# LIBSVM handles multi-class data automatically using a one-against-one strategy
labels = string.(convert(Vector, iris[:,:Species]))
# First dimension of input data is features; second is instances
instances = Matrix(iris[:, 1:4])'
# Train SVM on half of the data using default parameters. See documentation
# of svmtrain for options
model = svmtrain(instances[:, 1:2:end], labels[1:2:end]);
# Test model on the other half of the data.
(predicted_labels, decision_values) = svmpredict(model, instances[:, 2:2:end]);
# Compute accuracy
@printf "Accuracy: %.2f%%\n" mean((predicted_labels .== labels[2:2:end]))*100
I would like to use the DeepQLearning.jl package from https://github.com/JuliaPOMDP/DeepQLearning.jl. In order to do so, we have to do something similar to the following:
using DeepQLearning
using POMDPs
using Flux
using POMDPModels
using POMDPSimulators
using POMDPPolicies
# load MDP model from POMDPModels or define your own!
mdp = SimpleGridWorld();
# Define the Q network (see Flux.jl documentation)
# the gridworld state is represented by a 2 dimensional vector.
model = Chain(Dense(2, 32), Dense(32, length(actions(mdp))))
exploration = EpsGreedyPolicy(mdp, LinearDecaySchedule(start=1.0, stop=0.01, steps=10000/2))
solver = DeepQLearningSolver(qnetwork=model, max_steps=10000,
                             exploration_policy=exploration,
                             learning_rate=0.005, log_freq=500,
                             recurrence=false, double_q=true,
                             dueling=true, prioritized_replay=true)
policy = solve(solver, mdp)
sim = RolloutSimulator(max_steps=30)
r_tot = simulate(sim, mdp, policy)
println("Total discounted reward for 1 simulation: $r_tot")
In the line mdp = SimpleGridWorld(), we create the MDP. When I was trying to create my own MDP, I ran into the problem of a very large state space: a state in my MDP is a vector in {1,2,...,m}^n for some m and n. So, when defining the function POMDPs.states(mdp::MyMDP), I realized that I would have to iterate over all m^n states.
Am I using the package in the wrong way? Or must we iterate over the states even if there are exponentially many? If the latter, then what is the point of using deep Q-learning? I thought deep Q-learning was supposed to help precisely when the action and state spaces are very large.
DeepQLearning does not require enumerating the state space and can handle continuous-space problems.
DeepQLearning.jl only uses the generative interface of POMDPs.jl. As such, you do not need to implement the states function, only gen and initialstate (see the POMDPs.jl documentation on how to implement the generative interface).
However, due to the discrete-action nature of DQN, you also need POMDPs.actions(mdp::YourMDP), which should return an iterator over the action space.
By making those modifications to your implementation you should be able to use the solver.
The neural network in DQN takes as input a vector representation of the state. If your state is an m-dimensional vector, the neural network input will be of size m. The output size of the network will be equal to the number of actions in your model.
In the case of the grid world example, the input size of the Flux model is 2 (x, y positions) and the output size is length(actions(mdp))=4.
In an autoregressive continuous problem, when zeros take up too much of the data, it is possible to treat the situation as a zero-inflated problem (i.e., ZIB). In other words, instead of fitting f(x), we want to fit g(x)*f(x), where f(x) is the function we want to approximate, i.e. y, and g(x) is a function that outputs a value between 0 and 1 depending on whether a value is zero or non-zero.
Currently, I have two models: one model that gives me g(x) and another that fits g(x)*f(x).
The first model gives me a set of weights. This is where I need your help: I could use the sample_weight argument of model.fit(), but since I work with a tremendous amount of data, I need to use model.fit_generator(). However, fit_generator() does not have a sample_weight argument.
Is there a workaround for using sample weights with fit_generator()? Otherwise, how can I fit g(x)*f(x), given that I already have a trained model for g(x)?
You can provide the sample weights as the third element of the tuple returned by the generator. From the Keras documentation on fit_generator:
generator: A generator or an instance of Sequence (keras.utils.Sequence) object in order to avoid duplicate data when using multiprocessing. The output of the generator must be either
a tuple (inputs, targets)
a tuple (inputs, targets, sample_weights).
Update: Here is a rough sketch of a generator that returns the input samples and targets as well as the sample weights obtained from model g(x):
def gen(args):
    while True:
        for i in range(num_batches):
            # get the i-th batch data
            inputs = ...
            targets = ...
            # get the sample weights
            weights = g.predict(inputs)
            yield inputs, targets, weights

model.fit_generator(gen(args), steps_per_epoch=num_batches, ...)
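As the quoted documentation notes, a keras.utils.Sequence works too, and is the safer choice with multiprocessing. Here is a minimal sketch along the same lines (x, y, g and the batch size are placeholders for your own data and gating model):

import numpy as np
from keras.utils import Sequence

class WeightedBatches(Sequence):
    '''Yields (inputs, targets, sample_weights) batches, with the
    per-sample weights produced by the trained gating model g(x).'''
    def __init__(self, x, y, g, batch_size=32):
        self.x, self.y, self.g, self.batch_size = x, y, g, batch_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.x) / self.batch_size))

    def __getitem__(self, i):
        batch = slice(i * self.batch_size, (i + 1) * self.batch_size)
        inputs, targets = self.x[batch], self.y[batch]
        # one scalar weight per sample, hence the ravel()
        weights = self.g.predict(inputs).ravel()
        return inputs, targets, weights

model.fit_generator(WeightedBatches(x, y, g), ...)

If you train with several workers, predicting with a Keras model inside __getitem__ can be fragile; precomputing the weights once with g.predict and slicing them per batch avoids that.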
I am wondering whether there is any option in sklearn classifiers to fit a model using some hyperparameters and, after changing a few of them, refit it while saving computation (fit) cost.
Let us say Logistic Regression is fit using C=1e5 (logreg = linear_model.LogisticRegression(C=1e5)) and we change only C to C=1e3. I want to save some computation because only one parameter is changed.
Yes, there is an option called warm_start which, citing from the documentation, means:
warm_start : bool, default: False
When set to True, reuse the solution of the previous call to fit as initialization, otherwise,
just erase the previous solution. Useless for liblinear solver.
As described in the documentation here, it's available in LogisticRegression:
sklearn.linear_model.LogisticRegression(..., warm_start=False, n_jobs=1)
So concretely, for your case you would do the following:
from sklearn.linear_model import LogisticRegression
# create an instance of LogisticRegression with warm_start=True
logreg = LogisticRegression(C=1e5, warm_start=True)
# you can access the C parameter's value as follows
logreg.C
# it's set to 100000.0
# ....
# train your model here by calling logreg.fit(..)
# ....
# reset the value of the C parameter as follows
logreg.C = 1e3
logreg.C
# now it's set to 1000.0
# ....
# re-train your model here by calling logreg.fit(..)
# ....
As far as I have been able to check quickly, it's also available in the following:
sklearn.ensemble.RandomForestClassifier
sklearn.ensemble.GradientBoostingClassifier
sklearn.linear_model.PassiveAggressiveClassifier
sklearn.ensemble.BaggingClassifier
sklearn.ensemble.ExtraTreesClassifier
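Putting it all together, here is a minimal runnable sketch; the synthetic dataset and the lbfgs solver choice are just for illustration (as the documentation quote above notes, warm_start is useless with the liblinear solver):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# synthetic data, purely for illustration
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

logreg = LogisticRegression(C=1e5, warm_start=True, solver='lbfgs')
logreg.fit(X, y)   # first fit, from scratch

logreg.C = 1e3     # change only the regularization strength
logreg.fit(X, y)   # refit, initialized from the previous coefficients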
Keras enables adding a layer that computes a user-defined lambda function.
What I don't get is how Keras knows how to calculate the gradient of this user-defined function for backpropagation.
That is one of the benefits of using Theano/TensorFlow and libraries built on top of them: they give you automatic gradient calculation for mathematical functions and operations.
Keras gets the gradients by calling:
# keras/theano_backend.py
def gradients(loss, variables):
    return T.grad(loss, variables)

# keras/tensorflow_backend.py
def gradients(loss, variables):
    '''Returns the gradients of `variables` (list of tensor variables)
    with regard to `loss`.
    '''
    return tf.gradients(loss, variables, colocate_gradients_with_ops=True)
These are in turn called by the optimizers (keras/optimizers.py) via grads = self.get_gradients(loss, params) to get the gradients used to write the update rule for all the params. Here params are the trainable weights of the layers. Layers created by the Lambda function have no trainable weights of their own, but they affect the loss through the forward pass and hence indirectly affect the gradients of the trainable weights of other layers.
The only time you need to write a new gradient calculation is when you are defining a new basic mathematical operation/function. Likewise, when you write a custom loss function, autograd almost always takes care of the gradient calculation. Optionally, you can sometimes speed up training by implementing the analytical gradient of your custom functions. For example, the softmax function can be expressed in terms of exp, sum and div, which autograd can handle, but its analytical/symbolic gradient is usually implemented in Theano/TensorFlow anyway.
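To make this concrete, here is a minimal sketch (the model and toy data are made up for illustration): the Lambda layer below has no weights of its own, yet training works because the backend differentiates the squaring operation automatically and chains gradients through it to the Dense layers' weights.

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Lambda

model = Sequential([
    Dense(8, input_shape=(4,)),
    # user-defined op: no trainable weights, but differentiable,
    # so autograd chains through it during backpropagation
    Lambda(lambda x: x ** 2),
    Dense(1),
])
model.compile(optimizer='sgd', loss='mse')

# toy data, purely for illustration
x = np.random.rand(32, 4)
y = np.random.rand(32, 1)
model.fit(x, y, epochs=1, verbose=0)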
For implementing new ops, you can see the links below:
http://deeplearning.net/software/theano/extending/extending_theano.html
https://www.tensorflow.org/versions/r0.12/how_tos/adding_an_op/index.html