Understanding Softmax Probability - machine-learning

Given the following Softmax regression model, what is the probability of this instance belonging to the class of Bare Soil?

If I understand your questions correctly, this is how you need to calculate the probability:
import numpy as np
d = {"vegetation": 0.3, "bare soil": 0.1, "impervious": 0.1}
P_baresoil = np.exp(d["bare soil"]) / np.sum(np.exp(list(d.values())))
P_vegetation = np.exp(d["vegetation"]) / np.sum(np.exp(list(d.values())))
P_impervious = np.exp(d["impervious"]) / np.sum(np.exp(list(d.values())))
P_baresoil = 0.31

Related

sklearn GP return std dev is zero for predictions where it must be large

I am trying regression using Gaussian processes sklearn package. The standard deviation on predictions are zero, where it must be larger.
kernel = ConstantKernel() + 1.0 * DotProduct() ** 0.3 + 1.0 * WhiteKernel()
gpr = GaussianProcessRegressor(
kernel=kernel,
alpha=0.3,
normalize_y=True,
random_state=123,
n_restarts_optimizer=0
)
gpr.fit(X_train, y_train)
Here I have shown the samples from posterior after training the model. It clearly shows the standard deviation increases along with x-axis.
This is the output I got. As the value increases along x-axis the stddev must increase, where as it is showing zero stddev.
Acutal results should look something like this.
Is it a bug ?
Full Code to reproduce the issue.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, WhiteKernel, DotProduct
df = pd.read_csv('train.csv')
X_train = df[:,0].to_numpy().reshape(-1,1)
y_train = df[:,1].to_numpy()
X_pred = np.linspace(0.01, 8.5, 1000).reshape(-1,1)
# Instantiate a Gaussian Process model
kernel = ConstantKernel() + 1.0 * DotProduct() ** 0.3 + 1.0 * WhiteKernel()
gpr = GaussianProcessRegressor(
kernel=kernel,
alpha=0.3,
normalize_y=True,
random_state=123,
n_restarts_optimizer=0
)
gpr.fit(X_train, y_train)
print(
f"Kernel parameters before fit:\n{kernel} \n"
f"Kernel parameters after fit: \n{gpr.kernel_} \n"
f"Log-likelihood: {gpr.log_marginal_likelihood(gpr.kernel_.theta):.3f} \n"
f"Score = {gpr.score(X_train,y_train)}"
)
n_samples = 10
y_samples = gpr.sample_y(X_pred, n_samples)
for idx, single_prior in enumerate(y_samples.T):
plt.plot(
X_pred,
single_prior,
linestyle="--",
alpha=0.7,
label=f"Sampled function #{idx + 1}",
)
plt.title('Sample from posterior distribution')
plt.show()
y_pred, sigma = gpr.predict(X_pred, return_std=True)
plt.figure(figsize=(10,6))
plt.plot(X_train, y_train, 'r.', markersize=3, label='Observations')
plt.plot(X_pred, y_pred, 'b-', label='Prediction',)
plt.fill_between(X_pred[:,0], y_pred-1*sigma, y_pred+1*sigma,
alpha=.4, fc='b', ec='None', label='68% confidence interval')
plt.fill_between(X_pred[:,0], y_pred-2*sigma, y_pred+2*sigma,
alpha=.3, fc='b', ec='None', label='95% confidence interval')
plt.fill_between(X_pred[:,0], y_pred-3*sigma, y_pred+3*sigma,
alpha=.1, fc='b', ec='None', label='99% confidence interval')
plt.legend()
plt.show()
Not really an answer but something to look out for that maybe it might help. I was having the same problem and had some results when changing the alpha, some kernel parameters or normalizing the data.
Probably it was due to a matter of scale (with big numbers, the std dev is too small in proportion)

PyTorch: CrossEntropyLoss, changing class weight does not change the computed loss

According to Doc for cross entropy loss, the weighted loss is calculated by multiplying the weight for each class and the original loss.
However, in the pytorch implementation, the class weight seems to have no effect on the final loss value unless it is set to zero. Following is the code:
from torch import nn
import torch
logits = torch.FloatTensor([
[0.1, 0.9],
])
label = torch.LongTensor([0])
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([1, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 1.1711
# Change class weight for the first class to 0.1
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([0.1, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 1.1711, should be 0.11711
# Change weight for first class to 0
criterion = nn.CrossEntropyLoss(weight=torch.FloatTensor([0, 1]))
loss = criterion(logits, label)
print(loss.item()) # result: 0
As illustrated in the code, the class weight seems to have no effect unless it is set to 0, this behavior contradicts to the documentation.
Updates
I implemented a version of weighted cross entropy which is in my eyes the "correct" way to do it.
import torch
from torch import nn
def weighted_cross_entropy(logits, label, weight=None):
assert len(logits.size()) == 2
batch_size, label_num = logits.size()
assert (batch_size == label.size(0))
if weight is None:
weight = torch.ones(label_num).float()
assert (label_num == weight.size(0))
x_terms = -torch.gather(logits, 1, label.unsqueeze(1)).squeeze()
log_terms = torch.log(torch.sum(torch.exp(logits), dim=1))
weights = torch.gather(weight, 0, label).float()
return torch.mean((x_terms+log_terms)*weights)
logits = torch.FloatTensor([
[0.1, 0.9],
[0.0, 0.1],
])
label = torch.LongTensor([0, 1])
neg_weight = 0.1
weight = torch.FloatTensor([neg_weight, 1])
criterion = nn.CrossEntropyLoss(weight=weight)
loss = criterion(logits, label)
print(loss.item()) # results: 0.69227
print(weighted_cross_entropy(logits, label, weight).item()) # results: 0.38075
What I did is to multiply each instance in the batch with its associated class weight. The result is still different from the original pytorch implementation, which makes me wonder how pytorch actually implement this.

Tensorflow and Scikit learn: Same solution but different outputs

Im implementing a simple linear regression with scikitlearn and tensorflow.
My solution in scikitlearn seem fine but with tensorflow my evaluation output is showing some crazy numbers.
The problem is basically to try to predict a salary based in years of experience.
I not sure what Im doing wrong in Tensorflow's code.
Thanks!
ScikitLearn solution
import pandas as pd
data = pd.read_csv('Salary_Data.csv')
X = data.iloc[:, :-1].values
y = data.iloc[:, 1].values
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
y_pred = regressor.predict(X_test)
X_single_data = [[4.6]]
y_single_pred = regressor.predict(X_single_data)
print(f'Train score: {regressor.score(X_train, y_train)}')
print(f'Test score: {regressor.score(X_test, y_test)}')
Train score: 0.960775692121653
Test score: 0.9248580247217076
Tensorflow solution
import tensorflow as tf
f_cols = [tf.feature_column.numeric_column(key='X', shape=[1])]
estimator = tf.estimator.LinearRegressor(feature_columns=f_cols)
train_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_train}, y=y_train,shuffle=False)
test_input_fn = tf.estimator.inputs.numpy_input_fn(x={'X': X_test}, y=y_test,shuffle=False)
train_spec = tf.estimator.TrainSpec(input_fn=train_input_fn)
eval_spec = tf.estimator.EvalSpec(input_fn=test_input_fn)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
({'average_loss': 7675087400.0,
'label/mean': 84588.11,
'loss': 69075790000.0,
'prediction/mean': 5.0796494,
'global_step': 6},
[])
Data
YearsExperience,Salary
1.1,39343.00
1.3,46205.00
1.5,37731.00
2.0,43525.00
2.2,39891.00
2.9,56642.00
3.0,60150.00
3.2,54445.00
3.2,64445.00
3.7,57189.00
3.9,63218.00
4.0,55794.00
4.0,56957.00
4.1,57081.00
4.5,61111.00
4.9,67938.00
5.1,66029.00
5.3,83088.00
5.9,81363.00
6.0,93940.00
6.8,91738.00
7.1,98273.00
7.9,101302.00
8.2,113812.00
8.7,109431.00
9.0,105582.00
9.5,116969.00
9.6,112635.00
10.3,122391.00
10.5,121872.00
Per your code request in the comments: Though I had used my online curve and surface fitting web site zunzun.com for this equation at http://zunzun.com/Equation/2/Sigmoidal/Sigmoid%20B/ for the modeling work, here is a graphing source code example using the scipy differential_evolution genetic algorithm module to estimate initial parameter estimates. The scipy implementation of Differential Evolution uses the Latin Hypercube algorithm to ensure a thorough search of parameter space, which requires bounds within which to search - in this example those bounds are taken from the data maximum and minimum values, and the fit statistics and parameter values are almost identical to those from the web site.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
from scipy.optimize import differential_evolution
import warnings
xData = numpy.array([ 1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7, 3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0, 6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5])
yData = numpy.array([ 39.343, 46.205, 37.731, 43.525, 39.891, 56.642, 60.15, 54.445, 64.445, 57.189, 63.218, 55.794, 56.957, 57.081, 61.111, 67.938, 66.029, 83.088, 81.363, 93.94, 91.738, 98.273, 101.302, 113.812, 109.431, 105.582, 116.969, 112.635, 122.391, 121.872])
def func(x, a, b, c):
return a / (1.0 + numpy.exp(-(x-b)/c))
# function for genetic algorithm to minimize (sum of squared error)
def sumOfSquaredError(parameterTuple):
warnings.filterwarnings("ignore") # do not print warnings by genetic algorithm
val = func(xData, *parameterTuple)
return numpy.sum((yData - val) ** 2.0)
def generate_Initial_Parameters():
# min and max used for bounds
maxX = max(xData)
minX = min(xData)
maxY = max(yData)
minY = min(yData)
parameterBounds = []
parameterBounds.append([minY, maxY]) # search bounds for a
parameterBounds.append([minX, maxX]) # search bounds for b
parameterBounds.append([minX, maxX]) # search bounds for c
# "seed" the numpy random number generator for repeatable results
result = differential_evolution(sumOfSquaredError, parameterBounds, seed=3)
return result.x
# by default, differential_evolution completes by calling curve_fit() using parameter bounds
geneticParameters = generate_Initial_Parameters()
# now call curve_fit without passing bounds from the genetic algorithm,
# just in case the best fit parameters are aoutside those bounds
fittedParameters, pcov = curve_fit(func, xData, yData, geneticParameters)
print('Fitted parameters:', fittedParameters)
print()
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print()
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
axes = f.add_subplot(111)
# first the raw data as a scatter plot
axes.plot(xData, yData, 'D')
# create data for the fitted equation plot
xModel = numpy.linspace(min(xData), max(xData))
yModel = func(xModel, *fittedParameters)
# now the model as a line plot
axes.plot(xModel, yModel)
axes.set_xlabel('Years of experience') # X axis data label
axes.set_ylabel('Salary in thousands') # Y axis data label
plt.show()
plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
I cannot place an image in a comment, and so place it here. I suspected the relationship might be sigmoidal rather than linear, and found the following sigmoidal equation and fit statistics using units of thousands for salary: "y = a / (1.0 + exp(-(x-b)/c))" with fitted parameters a = 1.5535069418318591E+02, b = 5.4580059234664899E+00, and c = 3.7724942500630938E+00 giving an R-squared = 0.96 and RMSE = 5.30 (thousand)

Keras unrealistic results

I try to do a creditcard-fraud prediction with keras.
For that, I have a creditcard.csv file, with over 280 000 different cases which are all labeled as fraud or valid.
My problem is, that my code actually does compile, but in the first epoche, my accuracy is already 0.9979 and from the second epoche on acc: 0.9982.
That doesn't seem to be very realistic to me, but I don't know my mistake.
Here is the shortened version of my code:
import pandas as pd
import numpy as np
from keras import models
from keras import layers
combinedData = pd.read_csv('creditcard.csv')
trainData = combinedData[:227845]
testData = combinedData[227845:]
trainDataFactors = trainData.copy()
del trainDataFactors['Class']
trainDataLabels = pd.DataFrame(trainData, columns=['Class'])
testDataFactors = testData.copy()
del testDataFactors['Class']
testDataLabels = pd.DataFrame(testData, columns=['Class'])
model = models.Sequential()
model.add(layers.Dense(30, activation="relu", input_shape = (30, )))
model.add(layers.Dense(60, activation ="relu"))
model.add(layers.Dense(30, activation="sigmoid"))
model.compile(
optimizer = "rmsprop",
loss = "sparse_categorical_crossentropy",
metrics = ["accuracy"]
)
history = model.fit(
trainDataFactors, trainDataLabels,
epochs = 20,
batch_size = 512,
validation_data=(testDataFactors, testDataLabels)
)
I appreciate any help!
Is your test data balanced?
Because if not, e.g. it's collection of real data, I'd guess that a degenerate model replying "valid" to any input could easily get > 99 % acc. Try reporting also F1 score, that's the default choice for (unbalaced) detection tasks.

RNN can't learn integral function

For studying deep learning, RNN, LSTM and so on I tried to make RNN fit integration function. I have put random signal from 0 to 1 as input to RNN and made integral from biased by -0.5 input signal, made the limit for integral between 0:1 and put it as RNN target to learn. Blue - random input, orange - integrated input
So I have time series with only one input (random) and one output (limited integral of input) and I want RNN to predict output by the input.
I used Pytorch and tried to use vanilla RNN, GRU cell, different sizes of hidden layers, stacking several RNN, putting dense connected layers to the RNN output, different deep in backpropagation through time (from 2 to 50 gradients rolling-back). And I can't get a good result at all! It works somehow, but I can't find a way to fit integral function precisely. Here is the best of my results:
green - RNN output. Green line (model output) does not fit orange line in many cases - that is the problem.
Here is my source code in jupyter.
My questions: is it possible - to learn a saturated integral function by RNN? Where is my problem? What can I try more to achieve good quality? Ideally I want to RNN output be equal desired output (integral function) through all time series.
PS:
My code in raw format:
import numpy as np
from scipy.stats import truncnorm
import random
import math
import copy
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import matplotlib.cm as cm
def generate_data(num_of_data):
input_data=[]
output_data=[]
current_input_value=0
current_output_value=0
for i in range(num_of_data):
if (random.random()<0.1):
current_input_value=random.random()
# current_output_value=0
current_input_value=current_input_value+(random.random()-0.5)*0
current_output_value=current_output_value+0.0*(current_input_value-current_output_value)+(current_input_value-0.5)*0.1
if (current_output_value<0):
current_output_value=0
if (current_output_value>1):
current_output_value=1
input_data.append(current_input_value)
output_data.append(current_output_value)
return input_data,output_data
%matplotlib inline
matplotlib.rcParams['figure.figsize'] = (20, 6)
input_data,output_data=generate_data(500)
plt.plot(input_data)
plt.plot(output_data)
plt.show()
import torch
import torch.nn as nn
from torch.autograd import Variable
from torch import optim
class RNN(nn.Module):
def __init__(self, input_size, hidden_size, output_size):
super(RNN, self).__init__()
self.number_of_layers=1
self.hidden_size = hidden_size
self.gru = nn.GRU(input_size, hidden_size,self.number_of_layers)
self.Dense1 = nn.Linear(hidden_size, hidden_size)
self.Dense1A = nn.ReLU()
self.Dense2 = nn.Linear(hidden_size, output_size)
def forward(self, input, hidden):
gru_output, hidden = self.gru(input, hidden)
Dense1Out=self.Dense1(gru_output)
Dense1OutAct=self.Dense1A(Dense1Out)
output=self.Dense2(Dense1OutAct)
return output, hidden
def initHidden(self):
return Variable(torch.zeros(self.number_of_layers,1,self.hidden_size))
import time
import math
import operator
def timeSince(since):
now = time.time()
s = now - since
m = math.floor(s / 60)
s -= m * 60
return '%dm %ds' % (m, s)
rnn = RNN(1, 50, 1)
n_iters = 250000
print_every = 2000
plot_every = 2000
all_losses = []
total_loss_print = 0
total_loss_plot = 0
criterion=nn.L1Loss()
print("training...\n")
start = time.time()
optimizer = optim.Adam(rnn.parameters(), lr=0.0002)
rnn_hidden = rnn.initHidden()
rnn.zero_grad()
loss = 0
#for gata_q in range(int(n_iters/500)):
# rnn_hidden = rnn.initHidden()
input_data,output_data=generate_data(n_iters)
for data_index in range(len(input_data)):
input_tensor=torch.zeros(1, 1, 1)
input_tensor[0][0][0]=input_data[data_index]
output_tensor=torch.zeros(1, 1, 1)
output_tensor[0][0][0]=output_data[data_index]
rnn_output, rnn_hidden = rnn(Variable(input_tensor), rnn_hidden)
loss += criterion(rnn_output, Variable(output_tensor))
if data_index%2==0:
loss.backward()
total_loss_print += loss.data[0]
total_loss_plot += loss.data[0]
optimizer.step()
rnn_hidden=Variable(rnn_hidden.data)
rnn.zero_grad()
loss = 0
if data_index % print_every == 0:
print('%s (%d %d%%) tl=%.4f' % (timeSince(start), data_index, data_index / n_iters * 100,total_loss_print/print_every))
total_loss_print = 0
if data_index % plot_every == 0:
all_losses.append(total_loss_plot / plot_every)
total_loss_plot = 0
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.figure()
plt.plot(all_losses)
plt.show()
rnn_hidden = rnn.initHidden()
rnn.zero_grad()
loss = 0
rnn_output_data=[]
input_data,output_data=generate_data(1500)
for data_index in range(len(input_data)):
input_tensor=torch.zeros(1, 1, 1)
input_tensor[0][0][0]=input_data[data_index]
rnn_output, rnn_hidden = rnn(Variable(input_tensor), rnn_hidden)
rnn_output_data.append(rnn_output.data.numpy()[0][0][0])
plt.plot(input_data)#blue
plt.plot(output_data)#ogange
plt.plot(rnn_output_data)#green
plt.show()
I have found the problem by myself. The problem was in some case of overfitting on latest data, as in reinforcement learning case overfitting can occur with exploiting latest strategy. As I was not using any mini-batches and applied optimiser directly after a new point of data, and as because of data points similar through 20-50 of samples, optimiser simply fitted network to only latest points forgetting of fitting previous. I solved it by collecting gradient data through time for 50 points and only after it I apply one step of optimiser. The network can learn now much better, but still not perfect.
Here is modification of code to make it work:
rnn_output, rnn_hidden = rnn(Variable(input_tensor), rnn_hidden)
loss += criterion(rnn_output, Variable(output_tensor))
if data_index % 2==0:
loss.backward()
total_loss_print += loss.data[0]
rnn_hidden=Variable(rnn_hidden.data)
loss = 0
# torch.nn.utils.clip_grad_norm(rnn.parameters(), 0.01)
if data_index % 50==0:
optimizer.step()
rnn.zero_grad()
and new result of learning of integral:
pic.

Resources