Getting nans after applying softmax - machine-learning

I am trying to develop a deep Markov Model using following tutorial:
https://pyro.ai/examples/dmm.html
This model parameterises transitions and emissions with using a neural network and for variational inference part they use RNN to map observable 'x' to latent space. And in order to ensure that their model is learning something they try to maximise ELBO or minimise negative ELBO. They refer to negative ELBO as NLL.
I am basically converting pyro code to pure pytorch. And I've put together pretty much every thing. However, now I want to get a reconstruct error on test sequences. My input is one hot encoding of my sequences and it is of shape (500,1900,4) where 4 refers to number of features and 1900 is sequence length, and 500 refers to total number of examples.
The way I am generating data is:
generation model p(x_{1:T} | z_{1:T}) p(z_{1:T})
batch_size, _, x_dim = x.size() # number of time steps we need to process in the mini-batch
T_max = x_lens.max()
z_prev = self.z_0.expand(batch_size, self.z_0.size(0)) # set z_prev=z_0 to setup the recursive conditioning in p(z_t|z_{t-1})
for t in range(1, T_max + 1):
# sample z_t ~ p(z_t | z_{t-1}) one time step at a time
z_t, z_mu, z_logvar = self.trans(z_prev) # p(z_t | z_{t-1})
p_x_t = F.softmax(self.emitter(z_t),dim=-1) # compute the probabilities that parameterize the bernoulli likelihood
safe_tensor = torch.where(torch.isnan(p_x_t), torch.zeros_like(p_x_t), p_x_t)
print('generate p_x_t : ',safe_tensor)
x_t = torch.bernoulli(safe_tensor) #sample observe x_t according to the bernoulli distribution p(x_t|z_t)
print('generate x_t : ',x_t)
z_prev = z_t
So my emissions are being defined by Bernoulli distribution. And I use softmax in order to compute probabilities that parameterise the Bernoulli likelihood. And then I sample my observables x_t according the Bernoulli distribution.
At first when I ran my model, I was getting nans at times and so I introduced the line (show below) in order to convert nans to zero:
safe_tensor = torch.where(torch.isnan(p_x_t), torch.zeros_like(p_x_t), p_x_t)
However, after 1 epoch or so 'x_t' that I sample, this tensor is just zero. Basically, what I want is that after applying softmax, I want my function to pick the highest probability and give me the corresponding label for it which is either of the 4 features. But I get nans and then when I convert all nans to zero, I end up getting all zeros in all tensors after 1 epoch or so.
Also, when I look at p_x_t tensor I get probabilities that add up to one. But when I look at x_t tensor it gives me 0's all the way. So for example:
p_x_t : tensor([[0.2168, 0.2309, 0.2555, 0.2967]
.....], device='cuda:0',
grad_fn=<SWhereBackward>)
generate x_t : tensor([[0., 0., 0., 0.],...
], device='cuda:0', grad_fn=<BernoulliBackward0>)
Here fourth label/feature is giving me highest probability. Shouldn't the x_t tensor give me 1 at least in that position so be like:
generate x_t : tensor([[0., 0., 0., 1.],...
], device='cuda:0', grad_fn=<BernoulliBackward0>)
How can I get rid of this problems?
EDIT
My transition (which is called as self.trans in generate function mentioned above):
class GatedTransition(nn.Module):
"""
Parameterizes the gaussian latent transition probability `p(z_t | z_{t-1})`
See section 5 in the reference for comparison.
"""
def __init__(self, z_dim, trans_dim):
super(GatedTransition, self).__init__()
self.gate = nn.Sequential(
nn.Linear(z_dim, trans_dim),
nn.ReLU(),
nn.Linear(trans_dim, z_dim),
nn.Softmax(dim=-1)
)
self.proposed_mean = nn.Sequential(
nn.Linear(z_dim, trans_dim),
nn.ReLU(),
nn.Linear(trans_dim, z_dim)
)
self.z_to_mu = nn.Linear(z_dim, z_dim)
# modify the default initialization of z_to_mu so that it starts out as the identity function
self.z_to_mu.weight.data = torch.eye(z_dim)
self.z_to_mu.bias.data = torch.zeros(z_dim)
self.z_to_logvar = nn.Linear(z_dim, z_dim)
self.relu = nn.ReLU()
def forward(self, z_t_1):
"""
Given the latent `z_{t-1}` corresponding to the time step t-1
we return the mean and scale vectors that parameterize the (diagonal) gaussian distribution `p(z_t | z_{t-1})`
"""
gate = self.gate(z_t_1) # compute the gating function
proposed_mean = self.proposed_mean(z_t_1) # compute the 'proposed mean'
mu = (1 - gate) * self.z_to_mu(z_t_1) + gate * proposed_mean # compute the scale used to sample z_t, using the proposed mean from
logvar = self.z_to_logvar(self.relu(proposed_mean))
epsilon = torch.randn(z_t_1.size(), device=z_t_1.device) # sampling z by re-parameterization
z_t = mu + epsilon * torch.exp(0.5 * logvar) # [batch_sz x z_sz]
if torch.isinf(z_t).any().item():
print('something is infinity')
print('z_t : ',z_t)
print('logvar : ',logvar)
print('epsilon : ',epsilon)
print('mu : ',mu)
return z_t, mu, logvar
I do not get any inf in z_t tensor when doing training and validation. Only during testing. This is how I am training, validating and testing my model:
for epoch in range(config['epochs']):
train_loader=torch.utils.data.DataLoader(dataset=train_set, batch_size=config['batch_size'], shuffle=True, num_workers=1)
train_data_iter=iter(train_loader)
n_iters=train_data_iter.__len__()
epoch_nll = 0.0 # accumulator for our estimate of the negative log likelihood (or rather -elbo) for this epoch
i_batch=1
n_slices=0
loss_records={}
while True:
try: x, x_rev, x_lens = train_data_iter.next()
except StopIteration: break # end of epoch
x, x_rev, x_lens = gVar(x), gVar(x_rev), gVar(x_lens)
if config['anneal_epochs'] > 0 and epoch < config['anneal_epochs']: # compute the KL annealing factor
min_af = config['min_anneal']
kl_anneal = min_af+(1.0-min_af)*(float(i_batch+epoch*n_iters+1)/float(config['anneal_epochs']*n_iters))
else:
kl_anneal = 1.0 # by default the KL annealing factor is unity
loss_AE = model.train_AE(x, x_rev, x_lens, kl_anneal)
epoch_nll += loss_AE['train_loss_AE']
i_batch=i_batch+1
n_slices=n_slices+x_lens.sum().item()
loss_records.update(loss_AE)
loss_records.update({'epo_nll':epoch_nll/n_slices})
times.append(time.time())
epoch_time = times[-1] - times[-2]
log("[Epoch %04d]\t\t(dt = %.3f sec)"%(epoch, epoch_time))
log(loss_records)
if args.visual:
for k, v in loss_records.items():
tb_writer.add_scalar(k, v, epoch)
# do evaluation on test and validation data and report results
if (epoch+1) % args.test_freq == 0:
save_model(model, epoch)
#test_loader=torch.utils.data.DataLoader(dataset=test_set, batch_size=config['batch_size'], shuffle=False, num_workers=1)
valid_loader=torch.utils.data.DataLoader(dataset=valid_set, batch_size=config['batch_size'], shuffle=False, num_workers=1)
for x, x_rev, x_lens in valid_loader:
x, x_rev, x_lens = gVar(x), gVar(x_rev), gVar(x_lens)
loss,kl_loss = model.valid(x, x_rev, x_lens)
#print('x_lens sum : ',x_lens.sum().item())
#print('loss : ',loss)
valid_nll = loss/x_lens.sum().item()
log("[Epoch %04d]\t\t(valid_loss = %.8f)\t\t(kl_loss = %.8f)\t\t(valid_nll = %.8f)"%(epoch, loss, kl_loss, valid_nll ))
test_loader=torch.utils.data.DataLoader(dataset=test_set, batch_size=config['batch_size'], shuffle=False, num_workers=1)
for x, x_rev, x_lens in test_loader:
x, x_rev, x_lens = gVar(x), gVar(x_rev), gVar(x_lens)
loss,kl_loss = model.valid(x, x_rev, x_lens)
test_nll = loss/x_lens.sum().item()
model.generate(x, x_rev, x_lens)
log("[test_nll epoch %08d] %.8f" % (epoch, test_nll))
The test is outside the for loop for epoch. Because I want to test my model on test sequences using generate function. Can you now help me understand if there's something wrong I am doing?

Related

Pytorch model stuck at 0.5 though loss decreases consistently

This is using PyTorch
I have been trying to implement UNet model on my images, however, my model accuracy is always exact 0.5. Loss does decrease.
I have also checked for class imbalance. I have also tried playing with learning rate. Learning rate affects loss but not the accuracy.
My architecture below ( from here )
""" `UNet` class is based on https://arxiv.org/abs/1505.04597
The U-Net is a convolutional encoder-decoder neural network.
Contextual spatial information (from the decoding,
expansive pathway) about an input tensor is merged with
information representing the localization of details
(from the encoding, compressive pathway).
Modifications to the original paper:
(1) padding is used in 3x3 convolutions to prevent loss
of border pixels
(2) merging outputs does not require cropping due to (1)
(3) residual connections can be used by specifying
UNet(merge_mode='add')
(4) if non-parametric upsampling is used in the decoder
pathway (specified by upmode='upsample'), then an
additional 1x1 2d convolution occurs after upsampling
to reduce channel dimensionality by a factor of 2.
This channel halving happens with the convolution in
the tranpose convolution (specified by upmode='transpose')
Arguments:
in_channels: int, number of channels in the input tensor.
Default is 3 for RGB images. Our SPARCS dataset is 13 channel.
depth: int, number of MaxPools in the U-Net. During training, input size needs to be
(depth-1) times divisible by 2
start_filts: int, number of convolutional filters for the first conv.
up_mode: string, type of upconvolution. Choices: 'transpose' for transpose convolution
"""
class UNet(nn.Module):
def __init__(self, num_classes, depth, in_channels, start_filts=16, up_mode='transpose', merge_mode='concat'):
super(UNet, self).__init__()
if up_mode in ('transpose', 'upsample'):
self.up_mode = up_mode
else:
raise ValueError("\"{}\" is not a valid mode for upsampling. Only \"transpose\" and \"upsample\" are allowed.".format(up_mode))
if merge_mode in ('concat', 'add'):
self.merge_mode = merge_mode
else:
raise ValueError("\"{}\" is not a valid mode for merging up and down paths.Only \"concat\" and \"add\" are allowed.".format(up_mode))
# NOTE: up_mode 'upsample' is incompatible with merge_mode 'add'
if self.up_mode == 'upsample' and self.merge_mode == 'add':
raise ValueError("up_mode \"upsample\" is incompatible with merge_mode \"add\" at the moment "
"because it doesn't make sense to use nearest neighbour to reduce depth channels (by half).")
self.num_classes = num_classes
self.in_channels = in_channels
self.start_filts = start_filts
self.depth = depth
self.down_convs = []
self.up_convs = []
# create the encoder pathway and add to a list
for i in range(depth):
ins = self.in_channels if i == 0 else outs
outs = self.start_filts*(2**i)
pooling = True if i < depth-1 else False
down_conv = DownConv(ins, outs, pooling=pooling)
self.down_convs.append(down_conv)
# create the decoder pathway and add to a list
# - careful! decoding only requires depth-1 blocks
for i in range(depth-1):
ins = outs
outs = ins // 2
up_conv = UpConv(ins, outs, up_mode=up_mode, merge_mode=merge_mode)
self.up_convs.append(up_conv)
self.conv_final = conv1x1(outs, self.num_classes)
# add the list of modules to current module
self.down_convs = nn.ModuleList(self.down_convs)
self.up_convs = nn.ModuleList(self.up_convs)
self.reset_params()
#staticmethod
def weight_init(m):
if isinstance(m, nn.Conv2d):
#https://prateekvjoshi.com/2016/03/29/understanding-xavier-initialization-in-deep-neural-networks/
##Doc: https://pytorch.org/docs/stable/nn.init.html?highlight=xavier#torch.nn.init.xavier_normal_
init.xavier_normal_(m.weight)
init.constant_(m.bias, 0)
def reset_params(self):
for i, m in enumerate(self.modules()):
self.weight_init(m)
def forward(self, x):
encoder_outs = []
# encoder pathway, save outputs for merging
for i, module in enumerate(self.down_convs):
x, before_pool = module(x)
encoder_outs.append(before_pool)
for i, module in enumerate(self.up_convs):
before_pool = encoder_outs[-(i+2)]
x = module(before_pool, x)
# No softmax is used. This means we need to use
# nn.CrossEntropyLoss is your training script,
# as this module includes a softmax already.
x = self.conv_final(x)
return x
Parameters are :
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
x,y = train_sequence[0] ; batch_size = x.shape[0]
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
optim = torch.optim.Adam(model.parameters(),lr=0.01, weight_decay=1e-3)
criterion = nn.BCEWithLogitsLoss() #has sigmoid internally
epochs = 1000
The function for training is :
import torch.nn.functional as f
def train_model(epoch,train_sequence):
"""Train the model and report validation error with training error
Args:
model: the model to be trained
criterion: loss function
data_train (DataLoader): training dataset
"""
model.train()
for idx in range(len(train_sequence)):
X, y = train_sequence[idx]
images = Variable(torch.from_numpy(X)).to(device) # [batch, channel, H, W]
masks = Variable(torch.from_numpy(y)).to(device)
outputs = model(images)
print(masks.shape, outputs.shape)
loss = criterion(outputs, masks)
optim.zero_grad()
loss.backward()
# Update weights
optim.step()
# total_loss = get_loss_train(model, data_train, criterion)
My function for calculating loss and accuracy is below:
def get_loss_train(model, train_sequence):
"""
Calculate loss over train set
"""
model.eval()
total_acc = 0
total_loss = 0
for idx in range(len(train_sequence)):
with torch.no_grad():
X, y = train_sequence[idx]
images = Variable(torch.from_numpy(X)).to(device) # [batch, channel, H, W]
masks = Variable(torch.from_numpy(y)).to(device)
outputs = model(images)
loss = criterion(outputs, masks)
preds = torch.argmax(outputs, dim=1).float()
acc = accuracy_check_for_batch(masks.cpu(), preds.cpu(), images.size()[0])
total_acc = total_acc + acc
total_loss = total_loss + loss.cpu().item()
return total_acc/(len(train_sequence)), total_loss/(len(train_sequence))
Edit : Code which runs (calls) the functions:
for epoch in range(epochs):
train_model(epoch, train_sequence)
train_acc, train_loss = get_loss_train(model,train_sequence)
print("Train Acc:", train_acc)
print("Train loss:", train_loss)
Can someone help me identify as why is accuracy always exact 0.5?
Edit-2:
As asked accuracy_check_for_batch function is here:
def accuracy_check_for_batch(masks, predictions, batch_size):
total_acc = 0
for index in range(batch_size):
total_acc += accuracy_check(masks[index], predictions[index])
return total_acc/batch_size
and
def accuracy_check(mask, prediction):
ims = [mask, prediction]
np_ims = []
for item in ims:
if 'str' in str(type(item)):
item = np.array(Image.open(item))
elif 'PIL' in str(type(item)):
item = np.array(item)
elif 'torch' in str(type(item)):
item = item.numpy()
np_ims.append(item)
compare = np.equal(np_ims[0], np_ims[1])
accuracy = np.sum(compare)
return accuracy/len(np_ims[0].flatten())
I found the mistake.
model = UNet(num_classes = 2, depth=5, in_channels=5, merge_mode='concat').to(device)
should be
model = UNet(num_classes = 1, depth=5, in_channels=5, merge_mode='concat').to(device)
because I am using BCELosswithLogits.

How to feed previous time-stamp prediction as additional input to the next time-stamp?

This question might have been asked, but I got confused.
I am trying to apply one of RNN types, e.g. LSTM for time-series forecasting. I have inputs, y (stock returns). For each timestamp, I'd like to get the predictions. Q1 - Am I correct choosing seq2seq approach?
I also want to use predictions from previous timestamp (initializing initial values with some constant) as additional (still using my existing inputs) input in the form of squared residuals, i.e. using
eps_{t-1} = (y_{t-1} - y^_{t-1})^2 as additional input at t (as well as previous inputs).
So, how can I do this in tensorflow or in pytorch?
I tried to depict what I want on the attached graph. The graph
p.s. Sorry, it the question is poorly formulated
Let say your input if of dimension (32,10,1) with batch_size 32, time steps of length 10 and dimension of 1. Same for your target (stock return). This code make use of the tf.scan function, which is usefull when implementing custom recurrent networks (it will iterate over the timesteps). It remains to use the residual of t-1 in t somewhere, as you would like to.
ps: it is a very basic implementation of lstm from scratch, without any bias or output activation.
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
BS = 32
TS = 10
inputs_dim = 1
target_dim = 1
inputs = tf.placeholder(shape=[BS, TS, inputs_dim], dtype=tf.float32)
stock_returns = tf.placeholder(shape=[BS, TS, target_dim], dtype=tf.float32)
state_size = 16
# initial hidden state
init_state = tf.placeholder(shape=[2, BS, state_size],
dtype=tf.float32, name='initial_state')
# initializer
xav_init = tf.contrib.layers.xavier_initializer
# params
W = tf.get_variable('W', shape=[4, state_size, state_size],
initializer=xav_init())
U = tf.get_variable('U', shape=[4, inputs_dim, state_size],
initializer=xav_init())
W_out = tf.get_variable('W_out', shape=[state_size, target_dim],
initializer=xav_init())
#the function to feed tf.scan with
def step(prev, inputs_):
#unpack all inputs and previous outputs
st_1, ct_1 = prev[0][0], prev[0][1]
x = inputs_[0]
target = inputs_[1]
#get previous squared residual
eps = prev[1]
"""
here do whatever you want with eps_t-1
like x += eps if x if of the same dimension
or include it somewhere in your graph
"""
# lstm gates (add bias if needed)
#
# input gate
i = tf.sigmoid(tf.matmul(x,U[0]) + tf.matmul(st_1,W[0]))
# forget gate
f = tf.sigmoid(tf.matmul(x,U[1]) + tf.matmul(st_1,W[1]))
# output gate
o = tf.sigmoid(tf.matmul(x,U[2]) + tf.matmul(st_1,W[2]))
# gate weights
g = tf.tanh(tf.matmul(x,U[3]) + tf.matmul(st_1,W[3]))
ct = ct_1*f + g*i
st = tf.tanh(ct)*o
"""
make prediction, compute residual in t
and pass it to t+1
Normaly, we would compute prediction outside the scan function,
but as we need it here, we could just keep it and return it back
as an output of the scan function
"""
prediction_t = tf.matmul(st, W_out) # + bias
eps = (target - prediction_t)**2
return [tf.stack((st, ct), axis=0), eps, prediction_t]
states, eps, preds = tf.scan(step, [tf.transpose(inputs, [1,0,2]),
tf.transpose(stock_returns, [1,0,2])], initializer=[init_state,
tf.zeros((32,1), dtype=tf.float32),
tf.zeros((32,1),dtype=tf.float32)])
with tf.Session() as sess:
sess.run(tf.global_variables_initializer())
out = sess.run(preds, feed_dict=
{inputs:np.random.rand(BS,TS,inputs_dim),
stock_returns:np.random.rand(BS,TS,target_dim),
init_state:np.zeros((2,BS,state_size))})
out = tf.transpose(out,[1,0,2])
print(out)
And the output :
Tensor("transpose_2:0", shape=(32, 10, 1), dtype=float32)
Base code from here

Defining a simple neural netwok in mxnet error

I am doing making simple NN using MXnet , but having some problem in step() method
x1.shape=(64, 1, 1000)
y1.shape=(64, 1, 10)
net =nm.Sequential()
net.add(nn.Dense(H,activation='relu'),nn.Dense(90,activation='relu'),nn.Dense(D_out))
for t in range(500):
#y_pred = net(x1)
#loss = loss_fn(y_pred, y)
#for i in range(len(x1)):
with autograd.record():
output=net(x1)
loss =loss_fn(output,y1)
loss.backward()
trainer.step(64)
if t % 100 == 99:
print(t, loss)
#optimizer.zero_grad()
UserWarning: Gradient of Parameter dense30_weight on context cpu(0)
has not been updated by backward since last step. This could mean a
bug in your model that made it only use a subset of the Parameters
(Blocks) for this iteration. If you are intentionally only using a
subset, call step with ignore_stale_grad=True to suppress this warning
and skip updating of Parameters with stale gradient
The error indicates that you are passing parameters in your trainer that are not in your computational graph.
You need to initialize the parameters of your model and define the trainer. Unlike Pytorch, you don't need to call zero_grad in MXNet because by default new gradients are written in and not accumulated. Following code shows a simple neural network implemented using MXNet's Gluon API:
# Define model
net = gluon.nn.Dense(1)
net.collect_params().initialize(mx.init.Normal(sigma=1.), ctx=model_ctx)
square_loss = gluon.loss.L2Loss()
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.0001})
# Create random input and labels
def real_fn(X):
return 2 * X[:, 0] - 3.4 * X[:, 1] + 4.2
X = nd.random_normal(shape=(num_examples, num_inputs))
noise = 0.01 * nd.random_normal(shape=(num_examples,))
y = real_fn(X) + noise
# Define Dataloader
batch_size = 4
train_data = gluon.data.DataLoader(gluon.data.ArrayDataset(X, y), batch_size=batch_size, shuffle=True)
num_batches = num_examples / batch_size
for e in range(10):
# Iterate over training batches
for i, (data, label) in enumerate(train_data):
# Load data on the CPU
data = data.as_in_context(mx.cpu())
label = label.as_in_context(mx.cpu())
with autograd.record():
output = net(data)
loss = square_loss(output, label)
# Backpropagation
loss.backward()
trainer.step(batch_size)
cumulative_loss += nd.mean(loss).asscalar()
print("Epoch %s, loss: %s" % (e, cumulative_loss / num_examples))

Using softmax in Neural Networks to decide label of input

I am using the keras model with the following layers to predict a label of input (out of 4 labels)
embedding_layer = keras.layers.Embedding(MAX_NB_WORDS,
EMBEDDING_DIM,
weights=[embedding_matrix],
input_length=MAX_SEQUENCE_LENGTH,
trainable=False)
sequence_input = keras.layers.Input(shape = (MAX_SEQUENCE_LENGTH,),
dtype = 'int32')
embedded_sequences = embedding_layer(sequence_input)
hidden_layer = keras.layers.Dense(50, activation='relu')(embedded_sequences)
flat = keras.layers.Flatten()(hidden_layer)
preds = keras.layers.Dense(4, activation='softmax')(flat)
model = keras.models.Model(sequence_input, preds)
model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['acc'])
model.fit(X_train, Y_train, batch_size=32, epochs=100)
However, the softmax function returns a number of outputs of 4 (because I have 4 labels)
When I'm using the predict function to get the predicted Y using the same model, I am getting an array of 4 for each X rather than one single label deciding the label for the input.
model.predict(X_test, batch_size = None, verbose = 0, steps = None)
How do I make the output layer of the keras model, or the model.predict function, decide on one single label, rather than output weights for each label?
The following is a common function to sample from a probability vector
def sample(preds, temperature=1.0):
# helper function to sample an index from a probability array
preds = np.asarray(preds).astype('float64')
preds = np.log(preds) / temperature
exp_preds = np.exp(preds)
preds = exp_preds / np.sum(exp_preds)
probas = np.random.multinomial(1, preds, 1)
return np.argmax(probas)
Taken from here.
The temperature parameter decides how much the differences between the probability weights are weightd. A temperature of 1 is considering each weight "as it is", a temperature larger than 1 reduces the differences between the weights, a temperature smaller than 1 increases them.
Here an example using a probability vector on 3 labels:
p = np.array([0.1, 0.7, 0.2]) # The first label has a probability of 10% of being chosen, the second 70%, the third 20%
print(sample(p, 1)) # sample using the input probabilities, unchanged
print(sample(p, 0.1)) # the new vector of probabilities from which to sample is [ 3.54012033e-09, 9.99996371e-01, 3.62508322e-06]
print(sample(p, 10)) # the new vector of probabilities from which to sample is [ 0.30426696, 0.36962778, 0.32610526]
To see the new vector make sample return preds.

Linear Regression in Tensor Flow - Error when modifying getting started code

I am very new to TensorFlow and I am in parallel learning traditional machine learning techniques. Previously, I was able to successfully implement linear regression modelling in matlab and in Python using scikit.
When I tried to reproduce it using Tensorflow with the same dataset, I am getting invalid outputs. Could someone advise me on where I am making the mistake or what I am missing!
Infact, I am using the code from tensor flow introductory tutorial and I just changed the x_train and y_train to a different data set.
# Loading the ML coursera course ex1 (Wk 2) data to try it out
'''
path = r'C:\Users\Prasanth\Dropbox\Python Folder\ML in Python\data\ex1data1.txt'
fh = open(path,'r')
l1 = []
l2 = []
for line in fh:
temp = (line.strip().split(','))
l1.append(float(temp[0]))
l2.append(float(temp[1]))
'''
l1 = [6.1101, 5.5277, 8.5186, 7.0032, 5.8598, 8.3829, 7.4764, 8.5781, 6.4862, 5.0546, 5.7107, 14.164, 5.734, 8.4084, 5.6407, 5.3794, 6.3654, 5.1301, 6.4296, 7.0708, 6.1891, 20.27, 5.4901, 6.3261, 5.5649, 18.945, 12.828, 10.957, 13.176, 22.203, 5.2524, 6.5894, 9.2482, 5.8918, 8.2111, 7.9334, 8.0959, 5.6063, 12.836, 6.3534, 5.4069, 6.8825, 11.708, 5.7737, 7.8247, 7.0931, 5.0702, 5.8014, 11.7, 5.5416, 7.5402, 5.3077, 7.4239, 7.6031, 6.3328, 6.3589, 6.2742, 5.6397, 9.3102, 9.4536, 8.8254, 5.1793, 21.279, 14.908, 18.959, 7.2182, 8.2951, 10.236, 5.4994, 20.341, 10.136, 7.3345, 6.0062, 7.2259, 5.0269, 6.5479, 7.5386, 5.0365, 10.274, 5.1077, 5.7292, 5.1884, 6.3557, 9.7687, 6.5159, 8.5172, 9.1802, 6.002, 5.5204, 5.0594, 5.7077, 7.6366, 5.8707, 5.3054, 8.2934, 13.394, 5.4369]
l2 = [17.592, 9.1302, 13.662, 11.854, 6.8233, 11.886, 4.3483, 12.0, 6.5987, 3.8166, 3.2522, 15.505, 3.1551, 7.2258, 0.71618, 3.5129, 5.3048, 0.56077, 3.6518, 5.3893, 3.1386, 21.767, 4.263, 5.1875, 3.0825, 22.638, 13.501, 7.0467, 14.692, 24.147, -1.22, 5.9966, 12.134, 1.8495, 6.5426, 4.5623, 4.1164, 3.3928, 10.117, 5.4974, 0.55657, 3.9115, 5.3854, 2.4406, 6.7318, 1.0463, 5.1337, 1.844, 8.0043, 1.0179, 6.7504, 1.8396, 4.2885, 4.9981, 1.4233, -1.4211, 2.4756, 4.6042, 3.9624, 5.4141, 5.1694, -0.74279, 17.929, 12.054, 17.054, 4.8852, 5.7442, 7.7754, 1.0173, 20.992, 6.6799, 4.0259, 1.2784, 3.3411, -2.6807, 0.29678, 3.8845, 5.7014, 6.7526, 2.0576, 0.47953, 0.20421, 0.67861, 7.5435, 5.3436, 4.2415, 6.7981, 0.92695, 0.152, 2.8214, 1.8451, 4.2959, 7.2029, 1.9869, 0.14454, 9.0551, 0.61705]
print ('List length and data type', len(l1), type(l1))
#------------------#
import tensorflow as tf
# Model parameters
W = tf.Variable([0], dtype=tf.float64)
b = tf.Variable([0], dtype=tf.float64)
# Model input and output
x = tf.placeholder(tf.float64)
linear_model = W * x + b
y = tf.placeholder(tf.float64)
# loss or cost function
loss = tf.reduce_sum(tf.square(linear_model - y)) # sum of the squares
# optimizer (gradient descent) with learning rate = 0.01
optimizer = tf.train.GradientDescentOptimizer(0.01)
train = optimizer.minimize(loss)
# training data (labelled input & output swt)
# Using coursera data instead of sample data
#x_train = [1.0, 2, 3, 4]
#y_train = [0, -1, -2, -3]
x_train = l1
y_train = l2
# training loop (1000 iterations)
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init) # reset values to wrong
for i in range(1000):
sess.run(train, {x: x_train, y: y_train})
# evaluate training accuracy
curr_W, curr_b, curr_loss = sess.run([W, b, loss], {x: x_train, y: y_train})
print("W: %s b: %s loss: %s"%(curr_W, curr_b, curr_loss))
Output
List length and data type: 97 <class 'list'>
W: [ nan] b: [ nan] loss: nan
One major problem with your estimator is the loss function. Since you use tf.reduce_sum, the loss grows with the number of samples, which you have to compensate by using a smaller learning rate. A better solution would be to use mean square error loss
loss = tf.reduce_mean(tf.square(linear_model - y))

Resources