What is the difference between softmax and log-softmax? - machine-learning

The difference between these two functions, as described in this PyTorch post (What is the difference between log_softmax and softmax?), is that softmax is:
exp(x_i) / exp(x).sum()
and log softmax is:
log(exp(x_i) / exp(x).sum()).
But for the PyTorch code below, why am I getting a different output?
>>> it = autograd.Variable(torch.FloatTensor([0.6229,0.3771]))
>>> op = autograd.Variable(torch.LongTensor([0]))
>>> m = nn.Softmax()
>>> log = nn.LogSoftmax()
>>> m(it)
Variable containing:
0.5611 0.4389
[torch.FloatTensor of size 1x2]
>>> log(it)
Variable containing:
-0.5778 -0.8236
[torch.FloatTensor of size 1x2]
However, log(0.5611) is -0.25095973129 and log(0.4389) is -0.35763441915.
Why is there such a discrepancy?

By default, torch.log computes the natural logarithm of the input, so the output of PyTorch is correct:
ln([0.5611, 0.4389]) = [-0.5778, -0.8236]
Your last results were obtained using the base-10 logarithm.

Not just by default: torch.log is always the natural logarithm, while torch.log10 is the base-10 logarithm.
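To make this concrete, here is a minimal sketch (using the current tensor API rather than the deprecated autograd.Variable) that reproduces the numbers above and checks that LogSoftmax is the natural log, not the base-10 log, of Softmax:
import torch

it = torch.tensor([[0.6229, 0.3771]])        # same input as in the question, shape 1x2

softmax = torch.softmax(it, dim=1)           # tensor([[0.5611, 0.4389]])
log_softmax = torch.log_softmax(it, dim=1)   # tensor([[-0.5778, -0.8236]])

# LogSoftmax is the natural log of Softmax ...
print(torch.allclose(log_softmax, torch.log(softmax)))    # True
# ... not the base-10 log, which gives the -0.2510 / -0.3576 values from the question
print(torch.allclose(log_softmax, torch.log10(softmax)))  # False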

Related

Sklearn: Found input variables with inconsistent numbers of samples:

I have built a model.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.ensemble import StackingClassifier, RandomForestClassifier

est1_pre = ColumnTransformer([('catONEHOT', OneHotEncoder(dtype='int', handle_unknown='ignore'), ['Var1'])],
                             remainder='drop')
est2_pre = ColumnTransformer([('BOW', TfidfVectorizer(ngram_range=(1, 3), max_features=1000), ['Var2'])],
                             remainder='drop')
# 'alternative' is the base classifier defined earlier in my code
m1 = Pipeline([('FeaturePreprocessing', est1_pre),
               ('clf', alternative)])
m2 = Pipeline([('FeaturePreprocessing', est2_pre),
               ('clf', alternative)])
model_combo = StackingClassifier(
    estimators=[('cate', m1), ('text', m2)],
    final_estimator=RandomForestClassifier(n_estimators=10,
                                           random_state=42)
)
I can successfully fit and predict using m1 and m2 on their own.
However, for the combined model_combo, any attempt to call .fit/.predict results in ValueError: Found input variables with inconsistent numbers of samples:
model_fitted = model_combo.fit(x_train, y_train)
x_train contains Var1 and Var2.
How do I fit model_combo?
The problem is that sklearn text preprocessors (TfidfVectorizer in this case) operate on one-dimensional data, not two-dimensional data like most other preprocessors. Given a 2D selection, the vectorizer treats the input as an iterable over its columns, so there is only one "document". This can be fixed in the ColumnTransformer by specifying the column to operate on as a plain string instead of a list:
est2_pre = ColumnTransformer([('BOW', TfidfVectorizer(ngram_range=(1, 3),max_features=1000),'Var2')],remainder='drop')
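To see the difference, here is a minimal sketch with a toy DataFrame (the data below is made up for illustration):
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer

df = pd.DataFrame({'Var2': ['red cat', 'blue dog', 'green bird']})

# 'Var2' (a string) selects a 1D Series, so each row is one document
ct_1d = ColumnTransformer([('BOW', TfidfVectorizer(), 'Var2')], remainder='drop')
print(ct_1d.fit_transform(df).shape)  # expected: 3 rows, one per document

# ['Var2'] (a list) selects a 2D frame, which the vectorizer collapses into a single "document"
ct_2d = ColumnTransformer([('BOW', TfidfVectorizer(), ['Var2'])], remainder='drop')
print(ct_2d.fit_transform(df).shape)  # expected: 1 row, which no longer matches len(y)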

Output of BatchNorm1d in PyTorch does not match output of manually normalizing input dimensions

In an attempt to understand how BatchNorm1d works in PyTorch, I tried to match the output of BatchNorm1d operation on a 2D tensor with manually normalizing it. The manual output seems to be scaled down by a factor of 0.9747. Here's the code (note that affine is set to false):
import torch
import torch.nn as nn
from torch.autograd import Variable
X = torch.randn(20,100) * 5 + 10
X = Variable(X)
B = nn.BatchNorm1d(100, affine=False)
y = B(X)
mu = torch.mean(X[:,1])
var_ = torch.var(X[:,1])
sigma = torch.sqrt(var_ + 1e-5)
x = (X[:,1] - mu)/sigma
# the ratio below should be equal to one
print(x.data / y[:,1].data )
Output is:
0.9747
0.9747
0.9747
....
Doing the same thing for BatchNorm2d works without any issues. How does BatchNorm1d calculate its output?
Found out the reason: torch.var applies Bessel's correction (dividing by N-1) when calculating variance, whereas BatchNorm1d normalizes with the biased estimate (dividing by N). With a batch of 20 samples, this is exactly the sqrt(19/20) ≈ 0.9747 ratio seen above. Passing unbiased=False to torch.var gives identical values.
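For completeness, a minimal sketch (current PyTorch, no Variable wrapper) checking that the biased variance reproduces the BatchNorm1d output:
import torch
import torch.nn as nn

X = torch.randn(20, 100) * 5 + 10
B = nn.BatchNorm1d(100, affine=False)
y = B(X)

mu = torch.mean(X[:, 1])
var_biased = torch.var(X[:, 1], unbiased=False)  # divide by N instead of N-1
x = (X[:, 1] - mu) / torch.sqrt(var_biased + 1e-5)

print(torch.allclose(x, y[:, 1], atol=1e-5))  # True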

DL4J Prediction Formatting

I have two questions on deeplearning4j that are somewhat related.
When I execute “INDArray predicted = model.output(features,false);” to generate a prediction, I get the label predicted by the model; it is either 0 or 1. I tried to search for a way to have a probability (value between 0 and 1) instead of strictly 0 or 1. This is useful when you need to set a threshold for what your model should consider as a 0 and what it should consider as a 1. For example, you may want your model to output '1' for any prediction that is higher than or equal to 0.9 and output '0' otherwise.
My second question is that I am not sure why the output is represented as a two-dimensional array (shown after the code below) even though there are only two possibilities, so it would be better to represent it with one value - especially if we want it as a probability (question #1) which is one value.
PS: in case relevant to the question, in the Schema the output column is defined using ".addColumnInteger". Below are snippets of the code used.
Part of the code:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .seed(seed)
    .iterations(1)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
    .learningRate(learningRate)
    .updater(org.deeplearning4j.nn.conf.Updater.NESTEROVS).momentum(0.9)
    .list()
    .layer(0, new DenseLayer.Builder()
        .nIn(numInputs)
        .nOut(numHiddenNodes)
        .weightInit(WeightInit.XAVIER)
        .activation("relu")
        .build())
    .layer(1, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .weightInit(WeightInit.XAVIER)
        .activation("softmax")
        .weightInit(WeightInit.XAVIER)
        .nIn(numHiddenNodes)
        .nOut(numOutputs)
        .build()
    )
    .pretrain(false).backprop(true).build();
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(10));
for (int n = 0; n < nEpochs; n++) {
    model.fit(trainIter);
}
Evaluation eval = new Evaluation(numOutputs);
while (testIter.hasNext()) {
    DataSet t = testIter.next();
    INDArray features = t.getFeatureMatrix();
    System.out.println("Input features: " + features);
    INDArray labels = t.getLabels();
    INDArray predicted = model.output(features, false);
    System.out.println("Predicted output: " + predicted);
    System.out.println("Desired output: " + labels);
    eval.eval(labels, predicted);
    System.out.println();
}
System.out.println(eval.stats());
Output from running the code above:
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: [1.00, 0.00]
Desired output: [1.00, 0.00]
What I want the output to look like (i.e. a one-value probability):
Input features: [0.10, 0.34, 1.00, 0.00, 1.00]
Predicted output: 0.14
Desired output: 0.0
I will answer your questions inline but I just want to note:
I would suggest taking a look at our docs and examples:
https://github.com/deeplearning4j/dl4j-examples
http://deeplearning4j.org/quickstart
A 100% 0 or 1 output just indicates a badly tuned neural net; that's not how things are supposed to work, since a softmax returns probabilities by default. Also look at updating DL4J: I'm not sure what version you're on, but we haven't used strings for activations for at least a year. You seem to have skipped a lot of steps when starting with us, so at least take a look at the links above for a starting point rather than using year-old code.
What you're seeing is standard deep learning 101, so the advice below can be found all over the internet and applies to any deep learning software. A two-label softmax sums each row to 1, so the second column already is the probability of class 1. If you want a single value, use a sigmoid with 1 output and a different loss function. We use softmax because it works for any number of outputs: all you have to do is change the number of outputs rather than also changing the loss function and activation function.
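As a framework-agnostic illustration (sketched in Python/PyTorch rather than DL4J), here is how the two-column softmax output relates to a single probability and the 0.9 threshold mentioned in the question:
import torch

logits = torch.tensor([[2.5, -1.0]])   # raw scores for one example with 2 classes
probs = torch.softmax(logits, dim=1)   # each row sums to 1, e.g. [[0.97, 0.03]]

p_class1 = probs[:, 1]                 # a single probability for class 1
threshold = 0.9                        # cutoff from the question
predicted_label = (p_class1 >= threshold).long()

print(probs.sum(dim=1))                # tensor([1.])
print(p_class1, predicted_label)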

Torch tensor set the negative numbers to zero

x=torch.Tensor({1,-1,3,-8})
How do I convert x so that all the negative values in x are replaced with zero, without using a loop, so that the tensor looks like
th>x
1
0
3
0
PyTorch supports indexing with boolean masks produced by comparison operators:
a = torch.Tensor([1,0,-1])
a[a < 0] = 0
a
tensor([1., 0., 0.])
Actually, this operation is equivalent to applying the ReLU non-linearity.
Just do this and you're good to go:
output = torch.nn.functional.relu(a)
You can also do it in-place for faster computations:
torch.nn.functional.relu(a, inplace=True)
PyTorch takes care of broadcasting here:
x = torch.max(x,torch.tensor([0.]))
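For completeness, a quick sketch checking that all three approaches give the same result on the tensor from the question:
import torch

x = torch.tensor([1., -1., 3., -8.])

masked = x.clone()
masked[masked < 0] = 0                       # boolean-mask indexing
relu_out = torch.nn.functional.relu(x)       # ReLU
max_out = torch.max(x, torch.tensor([0.]))   # broadcasted element-wise max

print(masked)    # tensor([1., 0., 3., 0.])
print(relu_out)  # tensor([1., 0., 3., 0.])
print(max_out)   # tensor([1., 0., 3., 0.])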

Add Bias to a Matrix in Torch

In Torch, how do I add a bias vector to each row when I have a batch input? Suppose I have a 3x2 input matrix (where 2 = number of classes):
A
0.8191 0.2630
0.5344 0.4537
0.7380 0.5885
and I want to add the bias values to each row of the output matrix:
BIAS:
0.6588 0.6525
My final output should look like:
1.4779 0.9155
1.1931 1.1063
1.3967 1.2410
I am new to Torch and just figuring out the syntax.
You can expand the bias to have the same dimensions as your input:
expandedBias=torch.expand(BIAS,3,2)
yields:
th> expandedBias
0.6588 0.6525
0.6588 0.6525
0.6588 0.6525
After that you can simply add them:
output=A+expandedBias
to give:
th> A+expandedBias
1.4779 0.9155
1.1931 1.1063
1.3967 1.2410
If you are using one of the more recent versions of Torch, you do not even need to expand the bias; you can directly write:
output = A + bias
The bias will be broadcast automatically. Check the documentation for the details of broadcasting:
https://pytorch.org/docs/stable/notes/broadcasting.html
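A minimal PyTorch sketch of the broadcast version, using the numbers from the question:
import torch

A = torch.tensor([[0.8191, 0.2630],
                  [0.5344, 0.4537],
                  [0.7380, 0.5885]])
BIAS = torch.tensor([0.6588, 0.6525])  # shape (2,), broadcast across all 3 rows

output = A + BIAS
print(output)  # matches the expected output in the question (up to rounding)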
