What are the parameters for training an SVM?

I'm writing software that does machine learning with an SVM, using this library: https://www.npmjs.com/package/machine_learning
According to its SVM example:
svm.train({
C : 1.1, // default : 1.0. C in SVM.
tol : 1e-5, // default : 1e-4. Higher tolerance --> Higher precision
max_passes : 20, // default : 20. Higher max_passes --> Higher precision
alpha_tol : 1e-5, // default : 1e-5. Higher alpha_tolerance --> Higher precision
kernel : { type: "polynomial", c: 1, d: 5}
// default : {type : "gaussian", sigma : 1.0}
// {type : "gaussian", sigma : 0.5}
// {type : "linear"} // x*y
// {type : "polynomial", c : 1, d : 8} // (x*y + c)^d
// Or you can use your own kernel.
// kernel : function(vecx,vecy) { return dot(vecx,vecy);}
});
The parameter C tells the SVM optimization how much you want to avoid misclassifying each training example, but I do not understand the other parameters.

Just take a look at the equation of the soft-margin C-SVM:
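In its standard form this is

\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i
\quad \text{subject to} \quad y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0.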
It shows that C defines the trade-off between misclassifications and margin, and it must be chosen sufficiently large depending on your data. What you also see there are the slack variables ξ_i ≥ 0; these could possibly be what your tolerance parameter relates to, since they define the per-sample error that is weighted by the C parameter in the objective function.
For the kernel parameters, take a look at the dual problem for the SVM:
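In its usual form the dual is

\max_{\alpha} \;\; \sum_{i}\alpha_i - \frac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j\, y_i y_j\, K(x_i, x_j)
\quad \text{subject to} \quad 0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0.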
You'll see the term K(x_i, x_j). This is called the kernel function, and it is what allows the SVM to learn non-linear decision boundaries. If your data is not linearly separable, a kernel implicitly maps it (or rather its dot products) into a higher-dimensional feature space where it can be separated. Take a look at this guide; it teaches the basics of training an SVM and some best practices:
https://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
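For reference, the kernels offered in the library example correspond to the standard definitions below (the exact scaling of sigma used by the library's Gaussian kernel may differ):

K_{\text{linear}}(x,y) = x \cdot y, \qquad
K_{\text{poly}}(x,y) = (x \cdot y + c)^d, \qquad
K_{\text{gauss}}(x,y) = \exp\!\left(-\frac{\|x-y\|^2}{2\sigma^2}\right).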

Related

Dot product error with dimensions

I'm currently learning ML with Julia and I've encountered a problem when trying to take the dot product of two arrays. Here is the code:
w, b = zeros(size(train_x, 1), 1), 0
println("Weights size : $(size(w'))")
println("Train X size : $(size(train_x))")
result = dot(w', train_x)
The shapes are:
* w shape: (1, 12288)
* train_x shape: (12288, 209)
This call gives me the following error:
DimensionMismatch("dot product arguments have lengths 12288 and 2568192")
Did I miss something? This dot product is valid in NumPy, so I'm a little confused.
The dot function in Julia is only meant for dot products in the strict sense -- the inner product on a vector space, i.e., between two vectors. (That is why the error reports lengths 12288 and 2568192: dot compares the total number of elements of its arguments, and 12288 × 209 = 2568192.) It seems like you just want to multiply a vector with a matrix. In that case you can use
w = zeros(size(train_x, 1)) # no need for the extra dimension
result = w' * train_x
* will do matrix-vector multiplication; for the shapes above, w' * train_x is a 1×209 row vector with one value per training example. In Julia, unlike in NumPy but like in MATLAB, .* is used for elementwise multiplication instead.

DL4J Prediction Formatting

I have two questions on deeplearning4j that are somewhat related.
When I execute “INDArray predicted = model.output(features,false);” to generate a prediction, I get the label predicted by the model; it is either 0 or 1. I tried to search for a way to have a probability (value between 0 and 1) instead of strictly 0 or 1. This is useful when you need to set a threshold for what your model should consider as a 0 and what it should consider as a 1. For example, you may want your model to output '1' for any prediction that is higher than or equal to 0.9 and output '0' otherwise.
My second question is that I am not sure why the output is represented as an array with two values (shown after the code below) even though there are only two possibilities; it seems it would be better represented as a single value, especially if we want it as a probability (question #1), which is one value.
PS: in case relevant to the question, in the Schema the output column is defined using ".addColumnInteger". Below are snippets of the code used.
Part of the code:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.seed(seed)
.iterations(1)
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
.learningRate(learningRate)
.updater(org.deeplearning4j.nn.conf.Updater.NESTEROVS).momentum(0.9)
.list()
.layer(0, new DenseLayer.Builder()
.nIn(numInputs)
.nOut(numHiddenNodes)
.weightInit(WeightInit.XAVIER)
.activation("relu")
.build())
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
.weightInit(WeightInit.XAVIER)
.activation("softmax")
.nIn(numHiddenNodes)
.nOut(numOutputs)
.build()
)
.pretrain(false).backprop(true).build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(10));
for (int n=0; n<nEpochs; n++) {
model.fit(trainIter);
}
Evaluation eval = new Evaluation(numOutputs);
while (testIter.hasNext()){
DataSet t = testIter.next();
INDArray features = t.getFeatureMatrix();
System.out.println("Input features: " + features);
INDArray labels = t.getLabels();
INDArray predicted = model.output(features,false);
System.out.println("Predicted output: "+ predicted);
System.out.println("Desired output: "+ labels);
eval.eval(labels, predicted);
System.out.println();
}
System.out.println(eval.stats());
Output from running the code above:
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: [1.00, 0.00]
Desired output: [1.00, 0.00]
What I want the output to look like (i.e. a one-value probability):
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: 0.14
Desired output: 0.0
I will answer your questions inline but I just want to note:
I would suggest taking a look at our docs and examples:
https://github.com/deeplearning4j/dl4j-examples
http://deeplearning4j.org/quickstart
A hard 0-or-1 output is just a badly tuned neural net; that's not how things work. A softmax returns probabilities by default, so if you only ever see exactly 0 or 1, the network itself needs tuning. Also look at updating DL4J: I'm not sure what version you're on, but string activations haven't been used for at least a year. You seem to have skipped a lot of steps when starting out, so at least take a look at the links above for a starting point rather than using year-old code.
What you're seeing is standard deep learning 101, so the advice below can be found all over the internet and applies to any deep learning software. A two-label softmax sums each row to 1. If you want a single label, use a sigmoid with one output and a different loss function (a sketch of this is shown below). We use softmax because it works for any number of outputs: all you have to do is change the number of outputs, rather than also changing the loss function and activation function.
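A minimal sketch of that single-output variant, assuming a reasonably recent DL4J version (builder and enum names may differ slightly between releases; Activation is org.nd4j.linalg.activations.Activation) and a label supplied as a single 0/1 column:

// Replace the softmax output layer with one sigmoid unit and binary cross-entropy (XENT) loss
.layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.XENT)
        .weightInit(WeightInit.XAVIER)
        .activation(Activation.SIGMOID) // enum-based activation in recent DL4J versions
        .nIn(numHiddenNodes)
        .nOut(1)                        // one probability per example instead of two
        .build())

// The network then outputs a single value in [0, 1] per example, which you can threshold yourself:
INDArray predicted = model.output(features, false);
double p = predicted.getDouble(0, 0);   // probability of class 1 for the first example
int label = (p >= 0.9) ? 1 : 0;         // the 0.9 decision threshold from the question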

How to get the momentum term in a neural network in optim

I have a neural network in torch7 and would like to check how its momentum is developing. I want to modify/reduce it, because I want to do some extra processing with the values and need the velocity term to do that.
So I have something like the following code:
for t = 1, params.num_iterations do
local x, losses = optim.adam(feval, img, optim_state)
img=postProccess(img,content_imageprep,params)
print(velocity) -- how?
end
And would like to see what the velocity is doing. Anybody know how to do this?
Printing the optim_state gives me the following output
v : CudaTensor - size: 1327104
m : CudaTensor - size: 1327104
learningRate : 10
denom : CudaTensor - size: 1327104
t : 4
but I'm not sure whether any of these terms represents the velocity, and if so, which one. Anybody know?
You won't find the momentum coefficients in the state argument but in the config argument (which is absent in your function call, so they take their default values, i.e. 0.9 for beta1 and 0.999 for beta2).
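For reference, the update optim.adam performs with these quantities is the standard Adam rule (the implementation folds the bias correction into the step size, but it is the same computation):

m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t
v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2
\theta_t = \theta_{t-1} - \mathrm{lr} \cdot \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)

Here m is the running (exponentially decayed) average of the gradients and v of their squares, with \hat{m}_t and \hat{v}_t their bias-corrected versions. So the m tensor in your state is the closest analogue to a velocity term, while v and denom (which holds sqrt(v) + epsilon) belong to the adaptive scaling of the step size.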
Have a look at the source code https://github.com/torch/optim/blob/master/adam.lua#L24

How to apply different cost functions to different output channels of a convolutional network?

I have a convolutional neural network whose output is a 4-channel 2D image. I want to apply a sigmoid activation function to the first two channels and then use BCECriterion to compute the loss of the produced images against the ground-truth ones. I want to apply a squared loss function to the last two channels, and finally compute the gradients and do backprop. I would also like to multiply the squared-loss cost of each of the two last channels by a desired scalar.
So the cost has the following form:
cost = crossEntropyCh[{1, 2}] + l1 * squaredLossCh_3 + l2 * squaredLossCh_4
The way I'm thinking about doing this is as follows:
criterion1 = nn.BCECriterion()
criterion2 = nn.MSECriterion()
error = criterion1:forward(model.output[{{}, {1, 2}}], groundTruth1) + l1 * criterion2:forward(model.output[{{}, {3}}], groundTruth2) + l2 * criterion2:forward(model.output[{{}, {4}}], groundTruth3)
However, I don't think this is the correct way of doing it since I will have to do 3 separate backprop steps, one for each of the cost terms. So I wonder, can anyone give me a better solution to do this in Torch?
SplitTable and ParallelCriterion might be helpful for your problem.
Your current output layer can be followed by an nn.SplitTable that splits your output channels and converts your output tensor into a table. You can then combine different loss functions by using ParallelCriterion, so that each criterion is applied to the corresponding entry of the output table.
For details, I suggest you read the Torch documentation about tables.
Following the comments, I added the code segment below, which solves the original question.
M = 100
C = 4
H = 64
W = 64
dataIn = torch.rand(M, C, H, W)
layerOfTables = nn.Sequential()
-- Because SplitTable discards the dimension it is applied on, we insert
-- an additional dimension.
layerOfTables:add(nn.Reshape(M,C,1,H,W))
-- We want to split over the second dimension (i.e. channels).
layerOfTables:add(nn.SplitTable(2, 5))
-- We use ConcatTable in order to create paths that give each of the
-- criterions access to the data. Each branch of the ConcatTable will
-- have access to the output table.
criterionPath = nn.ConcatTable()
-- Starting from offset 1, NarrowTable will select 2 elements. Since you
-- want to use this portion as a 2-channel tensor, we need to combine
-- them by using JoinTable. Without JoinTable, the output would again be a
-- table with 2 elements.
criterionPath:add(nn.Sequential():add(nn.NarrowTable(1, 2)):add(nn.JoinTable(2)))
-- SelectTable is a simplified version of NarrowTable; it fetches a single desired element.
criterionPath:add(nn.SelectTable(3))
criterionPath:add(nn.SelectTable(4))
layerOfTables:add(criterionPath)
-- Here goes the criterion container. You can use this as if it is a regular
-- criterion function (Please see the examples on documentation page).
criterionContainer = nn.ParallelCriterion()
criterionContainer:add(nn.BCECriterion())
criterionContainer:add(nn.MSECriterion())
criterionContainer:add(nn.MSECriterion())
Since I used almost every possible table operation, it looks a little bit nasty; however, this is the only way I could solve this problem. I hope it helps you and others with the same problem. This is what the result looks like:
dataOut = layerOfTables:forward(dataIn)
print(dataOut)
{
1 : DoubleTensor - size: 100x2x64x64
2 : DoubleTensor - size: 100x1x64x64
3 : DoubleTensor - size: 100x1x64x64
}

Add Bias to a Matrix in Torch

In Torch, how do I add a bias vector to each input when I have a batch input? Suppose I have an input 3x2 matrix (where 2 = number of classes)
A
0.8191 0.2630
0.5344 0.4537
0.7380 0.5885
and I want to add the bias to each row of the matrix:
BIAS:
0.6588 0.6525
My final output should look like:
1.4779 0.9155
1.1931 1.1063
1.3967 1.2410
I am new to Torch and still figuring out the syntax.
You can expand the bias to have the same dimensions as your input:
expandedBias=torch.expand(BIAS,3,2)
yields:
th> expandedBias
0.6588 0.6525
0.6588 0.6525
0.6588 0.6525
After that you can simply add them:
output=A+expandedBias
to give:
th> A+expandedBias
1.4779 0.9155
1.1931 1.1063
1.3967 1.2410
If you are using one of the more recent versions of Torch, you do not even need to expand the bias; you can directly write:
output = A + BIAS
The 1x2 bias will be broadcast automatically against the 3x2 matrix, just like the explicit torch.expand above. Check the documentation for the details of broadcasting:
https://pytorch.org/docs/stable/notes/broadcasting.html
