I'm trying to run an example from the Torch7 tutorials, only to come across this error.
sandesh@sandesh-H87M-D3H:~/Downloads/tutorials-master/2_supervised$ luajit doall.lua
==> processing options
==> executing all
==> downloading dataset
==> using regular, full training data
==> loading dataset
==> preprocessing data
==> preprocessing data: colorspace RGB -> YUV
==> preprocessing data: normalize each feature (channel) globally
==> preprocessing data: normalize all three channels locally
==> verify statistics
training data, y-channel, mean: 0.00067706172257129
training data, y-channel, standard deviation: 0.39473240322794
test data, y-channel, mean: -0.0010822884348063
test data, y-channel, standard deviation: 0.38091408093043
training data, u-channel, mean: -0.0048219975630079
training data, u-channel, standard deviation: 0.29768662619471
test data, u-channel, mean: -0.0030795217110624
test data, u-channel, standard deviation: 0.22289780235542
training data, v-channel, mean: 0.0036312269637064
training data, v-channel, standard deviation: 0.25405592463897
test data, v-channel, mean: 0.0033847450016769
test data, v-channel, standard deviation: 0.20362829592977
==> visualizing data
==> define parameters
==> construct model
==> here is the model:
nn.Sequential {
  [input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> output]
  (1): nn.SpatialConvolutionMM(3 -> 64, 5x5)
  (2): nn.Tanh
  (3): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> (4) -> output]
    (1): nn.Square
    (2): nn.SpatialAveragePooling(2,2,2,2)
    (3): nn.MulConstant
    (4): nn.Sqrt
  }
  (4): nn.SpatialSubtractiveNormalization
  (5): nn.SpatialConvolutionMM(64 -> 64, 5x5)
  (6): nn.Tanh
  (7): nn.Sequential {
    [input -> (1) -> (2) -> (3) -> (4) -> output]
    (1): nn.Square
    (2): nn.SpatialAveragePooling(2,2,2,2)
    (3): nn.MulConstant
    (4): nn.Sqrt
  }
  (8): nn.SpatialSubtractiveNormalization
  (9): nn.Reshape(1600)
  (10): nn.Linear(1600 -> 128)
  (11): nn.Tanh
  (12): nn.Linear(128 -> 10)
}
==> define loss
==> here is the loss function:
nn.ClassNLLCriterion
==> defining some tools
luajit: /home/sandesh/torch/install/share/lua/5.1/sys/init.lua:38: attempt to index local 'f' (a nil value)
stack traceback:
/home/sandesh/torch/install/share/lua/5.1/sys/init.lua:38: in function 'execute'
/home/sandesh/torch/install/share/lua/5.1/sys/init.lua:71: in function 'uname'
/home/sandesh/torch/install/share/lua/5.1/optim/Logger.lua:38: in function '__init'
/home/sandesh/torch/install/share/lua/5.1/torch/init.lua:91: in function
[C]: in function 'Logger'
4_train.lua:60: in main chunk
[C]: in function 'dofile'
doall.lua:70: in main chunk
[C]: at 0x00406670
I did not change any code in any of the Lua files ...
This is the 4_train.lua file
----------------------------------------------------------------------
-- This script demonstrates how to define a training procedure,
-- irrespective of the model/loss functions chosen.
--
-- It shows how to:
-- + construct mini-batches on the fly
-- + define a closure to estimate (a noisy) loss
-- function, as well as its derivatives wrt the parameters of the
-- model to be trained
-- + optimize the function, according to several optimization
-- methods: SGD, L-BFGS.
--
-- Clement Farabet
----------------------------------------------------------------------
require 'torch' -- torch
require 'xlua' -- xlua provides useful tools, like progress bars
require 'optim' -- an optimization package, for online and batch methods
----------------------------------------------------------------------
-- parse command line arguments
if not opt then
   print '==> processing options'
   cmd = torch.CmdLine()
   cmd:text()
   cmd:text('SVHN Training/Optimization')
   cmd:text()
   cmd:text('Options:')
   cmd:option('-save', 'results', 'subdirectory to save/log experiments in')
   cmd:option('-visualize', false, 'visualize input data and weights during training')
   cmd:option('-plot', false, 'live plot')
   cmd:option('-optimization', 'SGD', 'optimization method: SGD | ASGD | CG | LBFGS')
   cmd:option('-learningRate', 1e-3, 'learning rate at t=0')
   cmd:option('-batchSize', 1, 'mini-batch size (1 = pure stochastic)')
   cmd:option('-weightDecay', 0, 'weight decay (SGD only)')
   cmd:option('-momentum', 0, 'momentum (SGD only)')
   cmd:option('-t0', 1, 'start averaging at t0 (ASGD only), in nb of epochs')
   cmd:option('-maxIter', 2, 'maximum nb of iterations for CG and LBFGS')
   cmd:text()
   opt = cmd:parse(arg or {})
end
----------------------------------------------------------------------
-- CUDA?
if opt.type == 'cuda' then
   model:cuda()
   criterion:cuda()
end
----------------------------------------------------------------------
print '==> defining some tools'
-- classes
classes = {'1','2','3','4','5','6','7','8','9','0'}
-- This matrix records the current confusion across classes
confusion = optim.ConfusionMatrix(classes)
-- Log results to files
trainLogger = optim.Logger(paths.concat(opt.save, 'train.log'))
testLogger = optim.Logger(paths.concat(opt.save, 'test.log'))
-- Retrieve parameters and gradients:
-- this extracts and flattens all the trainable parameters of the model
-- into a 1-dim vector
if model then
   parameters,gradParameters = model:getParameters()
end
----------------------------------------------------------------------
print '==> configuring optimizer'
if opt.optimization == 'CG' then
   optimState = {
      maxIter = opt.maxIter
   }
   optimMethod = optim.cg

elseif opt.optimization == 'LBFGS' then
   optimState = {
      learningRate = opt.learningRate,
      maxIter = opt.maxIter,
      nCorrection = 10
   }
   optimMethod = optim.lbfgs

elseif opt.optimization == 'SGD' then
   optimState = {
      learningRate = opt.learningRate,
      weightDecay = opt.weightDecay,
      momentum = opt.momentum,
      learningRateDecay = 1e-7
   }
   optimMethod = optim.sgd

elseif opt.optimization == 'ASGD' then
   optimState = {
      eta0 = opt.learningRate,
      t0 = trsize * opt.t0
   }
   optimMethod = optim.asgd

else
   error('unknown optimization method')
end
----------------------------------------------------------------------
print '==> defining training procedure'
function train()

   -- epoch tracker
   epoch = epoch or 1

   -- local vars
   local time = sys.clock()

   -- set model to training mode (for modules that differ in training and testing, like Dropout)
   model:training()

   -- shuffle at each epoch
   shuffle = torch.randperm(trsize)

   -- do one epoch
   print('==> doing epoch on training data:')
   print("==> online epoch # " .. epoch .. ' [batchSize = ' .. opt.batchSize .. ']')
   for t = 1,trainData:size(),opt.batchSize do
      -- disp progress
      xlua.progress(t, trainData:size())

      -- create mini batch
      local inputs = {}
      local targets = {}
      for i = t,math.min(t+opt.batchSize-1,trainData:size()) do
         -- load new sample
         local input = trainData.data[shuffle[i]]
         local target = trainData.labels[shuffle[i]]
         if opt.type == 'double' then input = input:double()
         elseif opt.type == 'cuda' then input = input:cuda() end
         table.insert(inputs, input)
         table.insert(targets, target)
      end

      -- create closure to evaluate f(X) and df/dX
      local feval = function(x)
         -- get new parameters
         if x ~= parameters then
            parameters:copy(x)
         end

         -- reset gradients
         gradParameters:zero()

         -- f is the average of all criterions
         local f = 0

         -- evaluate function for complete mini batch
         for i = 1,#inputs do
            -- estimate f
            local output = model:forward(inputs[i])
            local err = criterion:forward(output, targets[i])
            f = f + err

            -- estimate df/dW
            local df_do = criterion:backward(output, targets[i])
            model:backward(inputs[i], df_do)

            -- update confusion
            confusion:add(output, targets[i])
         end

         -- normalize gradients and f(X)
         gradParameters:div(#inputs)
         f = f/#inputs

         -- return f and df/dX
         return f,gradParameters
      end

      -- optimize on current mini-batch
      if optimMethod == optim.asgd then
         _,_,average = optimMethod(feval, parameters, optimState)
      else
         optimMethod(feval, parameters, optimState)
      end
   end

   -- time taken
   time = sys.clock() - time
   time = time / trainData:size()
   print("\n==> time to learn 1 sample = " .. (time*1000) .. 'ms')

   -- print confusion matrix
   print(confusion)

   -- update logger/plot
   trainLogger:add{['% mean class accuracy (train set)'] = confusion.totalValid * 100}
   if opt.plot then
      trainLogger:style{['% mean class accuracy (train set)'] = '-'}
      trainLogger:plot()
   end

   -- save/log current net
   local filename = paths.concat(opt.save, 'model.net')
   os.execute('mkdir -p ' .. sys.dirname(filename))
   print('==> saving model to '..filename)
   torch.save(filename, model)

   -- next epoch
   confusion:zero()
   epoch = epoch + 1
end
This is doall.lua
----------------------------------------------------------------------
-- This tutorial shows how to train different models on the street
-- view house number dataset (SVHN),
-- using multiple optimization techniques (SGD, ASGD, CG), and
-- multiple types of models.
--
-- This script demonstrates a classical example of training
-- well-known models (convnet, MLP, logistic regression)
-- on a 10-class classification problem.
--
-- It illustrates several points:
-- 1/ description of the model
-- 2/ choice of a loss function (criterion) to minimize
-- 3/ creation of a dataset as a simple Lua table
-- 4/ description of training and test procedures
--
-- Clement Farabet
----------------------------------------------------------------------
require 'torch'
----------------------------------------------------------------------
print '==> processing options'
cmd = torch.CmdLine()
cmd:text()
cmd:text('SVHN Loss Function')
cmd:text()
cmd:text('Options:')
-- global:
cmd:option('-seed', 1, 'fixed input seed for repeatable experiments')
cmd:option('-threads', 2, 'number of threads')
-- data:
cmd:option('-size', 'full', 'how many samples do we load: small | full | extra')
-- model:
cmd:option('-model', 'convnet', 'type of model to construct: linear | mlp | convnet')
-- loss:
cmd:option('-loss', 'nll', 'type of loss function to minimize: nll | mse | margin')
-- training:
cmd:option('-save', 'results', 'subdirectory to save/log experiments in')
cmd:option('-plot', false, 'live plot')
cmd:option('-optimization', 'SGD', 'optimization method: SGD | ASGD | CG | LBFGS')
cmd:option('-learningRate', 1e-3, 'learning rate at t=0')
cmd:option('-batchSize', 1, 'mini-batch size (1 = pure stochastic)')
cmd:option('-weightDecay', 0, 'weight decay (SGD only)')
cmd:option('-momentum', 0, 'momentum (SGD only)')
cmd:option('-t0', 1, 'start averaging at t0 (ASGD only), in nb of epochs')
cmd:option('-maxIter', 2, 'maximum nb of iterations for CG and LBFGS')
cmd:option('-type', 'double', 'type: double | float | cuda')
cmd:text()
opt = cmd:parse(arg or {})
-- nb of threads and fixed seed (for repeatable experiments)
if opt.type == 'float' then
   print('==> switching to floats')
   torch.setdefaulttensortype('torch.FloatTensor')
elseif opt.type == 'cuda' then
   print('==> switching to CUDA')
   require 'cunn'
   torch.setdefaulttensortype('torch.FloatTensor')
end
torch.setnumthreads(opt.threads)
torch.manualSeed(opt.seed)
----------------------------------------------------------------------
print '==> executing all'
dofile '1_data.lua'
dofile '2_model.lua'
dofile '3_loss.lua'
dofile '4_train.lua'
dofile '5_test.lua'
----------------------------------------------------------------------
print '==> training!'
while true do
   train()
   test()
end
The git link is https://github.com/torch/tutorials/blob/master/2_supervised/4_train.lua
Also, I'm not using CUDA, as I don't have a GPU.
I am currently playing with torch-rnn and stumbled across this issue repeatedly. Check your input and see whether an incorrect file, or no file at all, was passed in. If the code worked before and no attempts were made to change it, it should be intact, and it is your input that is problematic.
I won't tell you what is wrong, as you did not show any effort to solve the problem yourself. But I will tell you how to proceed.
luajit: /home/sandesh/torch/install/share/lua/5.1/sys/init.lua:38:
attempt to index local 'f' (a nil value)
stack traceback:
/home/sandesh/torch/install/share/lua/5.1/sys/init.lua:38: in function 'execute'
/home/sandesh/torch/install/share/lua/5.1/sys/init.lua:71: in function 'uname'
/home/sandesh/torch/install/share/lua/5.1/optim/Logger.lua:38: in function '__init'
/home/sandesh/torch/install/share/lua/5.1/torch/init.lua:91: in function
[C]: in function 'Logger'
This tells you that some local f in init.lua line 38 is nil, which causes a problem. So open that file and find out where that f's value should come from and why it is nil. Then fix that. Also see if there is a more recent version of Torch that handles f being nil correctly. If not, change the code yourself, if possible. Otherwise try to stop this from happening by validating your inputs to torch.
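To make the failure mode concrete: sys.execute wraps the standard io.popen, which returns nil when the child process cannot be spawned. A defensive sketch of that pattern (my reconstruction for illustration, not the actual sys/init.lua source):
local function execute(cmd)
   local f = io.popen(cmd)  -- returns nil if the command cannot be spawned
   if not f then
      error('io.popen failed for: ' .. tostring(cmd))
   end
   local s = f:read('*all')  -- the kind of line that crashes when f is nil
   f:close()
   return s
end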
Related
Can someone tell me why my cross-entropy loss function is giving this error?
My accuracy method:
import torch

def accuracy(outputs, labels):
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))
My class defining the model:
import torch.nn as nn
import torch.nn.functional as F

class PulsarLogisticRegression(nn.Module):
    def __init__(self):
        super().__init__()
        # input_size and output_size are defined elsewhere in the notebook
        self.linear = nn.Linear(input_size, output_size)

    def forward(self, xb):
        xb = xb.view(xb.size(0), -1)
        out = self.linear(xb)
        return out

    def training_step(self, batch):
        inputs, targets = batch
        # Generate predictions
        out = self(inputs)
        # Calculate loss
        loss = F.cross_entropy(out, targets)
        return loss

    def validation_step(self, batch):
        inputs, targets = batch
        # Generate predictions
        out = self(inputs)
        # Calculate loss
        loss = F.cross_entropy(out, targets)
        acc = accuracy(out, targets)  # Calculate accuracy
        return {'val_loss': loss, 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        batch_losses = [x['val_loss'] for x in outputs]
        epoch_loss = torch.stack(batch_losses).mean()  # Combine losses
        batch_accs = [x['val_acc'] for x in outputs]
        epoch_acc = torch.stack(batch_accs).mean()  # Combine accuracies
        return {'val_loss': epoch_loss.item(), 'val_acc': epoch_acc.item()}

    def epoch_end(self, epoch, result, num_epochs):
        # Print result every 20th epoch
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(epoch, result['val_loss'], result['val_acc']))
The error: the cross-entropy loss expects a 1D target tensor, but a multi-target tensor is being passed to it.
RuntimeError                              Traceback (most recent call last)
<ipython-input-88-cd9b8a9a3b02> in <module>()
----> 1 result = evaluate(model, val_loader) # Use the evaluate function
      2 print(result)

4 frames
/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py in nll_loss(input, target, weight, size_average, ignore_index, reduce, reduction)
   2262                          .format(input.size(0), target.size(0)))
   2263     if dim == 2:
-> 2264         ret = torch._C._nn.nll_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
   2265     elif dim == 4:
   2266         ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)

RuntimeError: 1D target tensor expected, multi-target not supported
I am very new to machine learning. I am trying to make a model that predicts one column of data based on five other columns. The values in the target column are 0 and 1, so it is basically a binary classification model.
What I tried:
As I said, I am fairly new to this field. Some explanations suggest using the squeeze function to reduce the shape of the target tensor to 1D, but that seems to throw other errors in other methods of the class.
I am looking for a loss function that would let me compute the accuracy correctly.
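For reference, F.cross_entropy expects raw logits of shape (N, C) and a 1D LongTensor of class indices of shape (N,). A minimal sketch of the usual fix, assuming the targets arrive as a float column of shape (N, 1):
import torch
import torch.nn.functional as F

logits = torch.randn(8, 2)   # model output for a batch: (N, C), with C = 2 classes
targets = torch.zeros(8, 1)  # hypothetical targets: a float column of 0s/1s, shape (N, 1)

# cross_entropy needs class indices of shape (N,), dtype long
targets = targets.squeeze(1).long()

loss = F.cross_entropy(logits, targets)
print(loss.item())
Note that for two classes the model's output_size must be 2 so the logits have shape (N, 2), and the squeeze must be applied consistently in both training_step and validation_step, which is likely what caused the "other errors" mentioned above.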
I'm trying to train a feed-forward neural network for the first time in Torch. Here's my dataset: http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/datasets/transfusion.csv
Here's the code (based on http://mdtux89.github.io/2015/12/11/torch-tutorial.html):
require 'nn'

mlp = nn.Sequential()
inputSize = 4
hiddenLayer1Size = 4
hiddenLayer2Size = 4
mlp:add(nn.Linear(inputSize,hiddenLayer1Size)) -- rows, columns
mlp:add(nn.Tanh())
mlp:add(nn.Linear(hiddenLayer1Size,hiddenLayer2Size))
mlp:add(nn.Tanh())
nclasses = 1
mlp:add(nn.Linear(hiddenLayer2Size,nclasses))
mlp:add(nn.LogSoftMax())

output = mlp:forward(torch.rand(1,4))
print(output)

-- TRAINING using inbuilt stochastic gradient descent, 2 params: network, criterion function --
LRate = 0.1
criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = LRate

function string:splitAtCommas()
   local sep, values = ",", {}
   local pattern = string.format("([^%s]+)", sep)
   self:gsub(pattern, function(c) values[#values+1] = c end)
   return values
end

function loadData(dataFile)
   local dataset,i = {},0
   for line in io.lines(dataFile) do
      local values = line:splitAtCommas()
      local y = torch.Tensor(1)
      y[1] = values[#values] -- the target class is the last number in the line
      values[#values] = nil
      local x = torch.Tensor(values) -- the input data is all the other numbers
      dataset[i] = {x, y}
      i = i + 1
   end
   function dataset:size() return (i - 1) end -- the requirement mentioned
   return dataset
end

dataset = loadData("transfusion.csv")
trainer:train(dataset)
Here's the error report:
# StochasticGradient: training
/Users/drdre/torch/install/share/lua/5.1/nn/THNN.lua:109: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /Users/drdre/torch/extra/nn/lib/THNN/generic/ClassNLLCriterion.c:38
stack traceback:
[C]: in function 'v'
/Users/drdre/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'ClassNLLCriterion_updateOutput'
...dre/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function 'forward'
...re/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'f'
[string "local f = function() return trainer:train(dat..."]:1: in main chunk
[C]: in function 'xpcall'
/Users/drdre/torch/install/share/lua/5.1/itorch/main.lua:209: in function </Users/drdre/torch/install/share/lua/5.1/itorch/main.lua:173>
/Users/drdre/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
/Users/drdre/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
/Users/drdre/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
/Users/drdre/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
/Users/drdre/torch/install/share/lua/5.1/itorch/main.lua:381: in main chunk
[C]: in function 'require'
(command line):1: in main chunk
[C]: at 0x0105e4cd10
Use nclasses = 2 and y[1] = values[#values] + 1. See the doc:
a desired output y (an integer 1 to n, in this case n = 2 classes)
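Concretely, the two changed lines in the script above would be:
nclasses = 2
-- ...
y[1] = values[#values] + 1  -- shift the CSV's 0/1 labels into ClassNLLCriterion's 1..n range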
require 'torch';
require 'nn';
require 'nnx';
mnist = require 'mnist';

fullset = mnist.traindataset()
testset = mnist.testdataset()

trainset = {
   size = 50000,
   data = fullset.data[{{1,50000}}]:double(),
   label = fullset.label[{{1,50000}}]
}

validationset = {
   size = 10000,
   data = fullset.data[{{50001,60000}}]:double(),
   label = fullset.label[{{50001,60000}}]
}

-- MNIST Dataset has 28x28 images
model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(1, 32, 5, 5)) -- 32x24x24
model:add(nn.ReLU())
model:add(nn.SpatialMaxPooling(3, 3, 3, 3)) -- 32x8x8
model:add(nn.SpatialConvolutionMM(32, 64, 5, 5)) -- 64x4x4
model:add(nn.Tanh())
model:add(nn.SpatialMaxPooling(2, 2, 2, 2)) -- 64x2x2
model:add(nn.Reshape(64*2*2))
model:add(nn.Linear(64*2*2, 200))
model:add(nn.Tanh())
model:add(nn.Linear(200, 10))
model:add(nn.LogSoftMax())

criterion = nn.ClassNLLCriterion()

x, dldx = model:getParameters() -- now x stores the trainable parameters and dldx stores the gradient wrt these params in the model above

sgd_params = {
   learningRate = 1e-2,
   learningRateDecay = 1e-4,
   weightDecay = 1e-3,
   momentum = 1e-4
}

step = function ( batchsize )
   -- setting up variables
   local count = 0
   local current_loss = 0
   local shuffle = torch.randperm(trainset.size)
   -- setting default batchsize as 200
   batchsize = batchsize or 200
   -- setting inputs and targets for minibatches
   for minibatch_number = 1, trainset.size, batchsize do
      local size = math.min( trainset.size - minibatch_number + 1, batchsize )
      local inputs = torch.Tensor(size, 28, 28)
      local targets = torch.Tensor(size)
      for index = 1, size do
         inputs[index] = trainset.data[ shuffle[ index + minibatch_number ]]
         targets[index] = trainset.label[ shuffle[ index + minibatch_number ] ]
      end
      -- defining feval function to return loss and gradients of loss w.r.t. params
      feval = function( x_new )
         --print ( "---------------------------------safe--------------------")
         if x ~= x_new then x:copy(x_new) end
         -- initializing gradParams to zero
         dldx:zero()
         -- calculating loss and param gradients
         local loss = criterion:forward( model.forward( inputs ), targets )
         model:backward( inputs, criterion:backward( model.output, targets ) )
         return loss, dldx
      end
      -- getting loss
      -- optim returns x*, {fx} where x* is new set of params and {fx} is { loss } => fs[ 1 ] carries loss from feval
      print(feval ~= nil and x ~= nil and sgd_params ~= nil)
      _,fs = optim.sgd(feval, x, sgd_params)
      count = count + 1
      current_loss = current_loss + fs[ 1 ]
   end
   -- returning avg loss over the minibatch
   return current_loss / count
end

max_iters = 30

for i = 1, max_iters do
   local loss = step()
   print(string.format('Epoch: %d Current loss: %4f', i, loss))
end
I am new to Torch and Lua, and I'm not able to find the error in the above code. Can anyone suggest a way to debug it?
The error:
/home/afroz/torch/install/bin/luajit: /home/afroz/test.lua:88: attempt to index global 'optim' (a nil value)
stack traceback:
/home/afroz/test.lua:88: in function 'step'
/home/afroz/test.lua:102: in main chunk
[C]: in function 'dofile'
...froz/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
optim is not defined in the scope of your script. You try to call optim.sgd, which of course results in the error you see.
Like nn, optim is an extension package to torch.
require 'torch';
require 'nn';
require 'nnx';
Remember those lines at the beginning of your script? They load those packages and make their definitions available.
Make sure optim is installed, then try to require it.
https://github.com/torch/optim
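For example (assuming a standard luarocks-based Torch install):
-- once, from the shell:  luarocks install optim
require 'optim'  -- after this line, optim.sgd is available in the script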
optim is not assigned anywhere in the script, so when the script references optim.sgd, its value is nil and you get the error shown. You need to double-check the script to make sure optim is assigned the correct value.
I'm currently trying to build a simple model for predicting time series. The goal would be to train the model with a sequence so that the model is able to predict future values.
I'm using tensorflow and lstm cells to do so. The model is trained with truncated backpropagation through time. My question is how to structure the data for training.
For example let's assume we want to learn the given sequence:
[1,2,3,4,5,6,7,8,9,10,11,...]
And we unroll the network for num_steps=4.
Option 1
input data    label
1,2,3,4       2,3,4,5
5,6,7,8       6,7,8,9
9,10,11,12    10,11,12,13
...
Option 2
input data    label
1,2,3,4       2,3,4,5
2,3,4,5       3,4,5,6
3,4,5,6       4,5,6,7
...
Option 3
input data    label
1,2,3,4       5
2,3,4,5       6
3,4,5,6       7
...
Option 4
input data    label
1,2,3,4       5
5,6,7,8       9
9,10,11,12    13
...
Any help would be appreciated.
I'm just starting to learn LSTMs in TensorFlow and tried to implement an example which (conveniently) predicts some time series / number series generated by a simple math function.
But I'm using a different way to structure the data for training, motivated by Unsupervised Learning of Video Representations using LSTMs:
LSTM Future Predictor Model
Option 5:
input data    label
1,2,3,4       5,6,7,8
2,3,4,5       6,7,8,9
3,4,5,6       7,8,9,10
...
Besides this paper, I (tried to) take inspiration from the given TensorFlow RNN examples. My current complete solution looks like this:
import math
import random
import numpy as np
import tensorflow as tf

LSTM_SIZE = 64
LSTM_LAYERS = 2
BATCH_SIZE = 16
NUM_T_STEPS = 4
MAX_STEPS = 1000
LAMBDA_REG = 5e-4


def ground_truth_func(i, j, t):
    return i * math.pow(t, 2) + j


def get_batch(batch_size):
    seq = np.zeros([batch_size, NUM_T_STEPS, 1], dtype=np.float32)
    tgt = np.zeros([batch_size, NUM_T_STEPS], dtype=np.float32)
    for b in xrange(batch_size):
        i = float(random.randint(-25, 25))
        j = float(random.randint(-100, 100))
        for t in xrange(NUM_T_STEPS):
            value = ground_truth_func(i, j, t)
            seq[b, t, 0] = value
        for t in xrange(NUM_T_STEPS):
            tgt[b, t] = ground_truth_func(i, j, t + NUM_T_STEPS)
    return seq, tgt


# Placeholder for the inputs in a given iteration
sequence = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS, 1])
target = tf.placeholder(tf.float32, [BATCH_SIZE, NUM_T_STEPS])

fc1_weight = tf.get_variable('w1', [LSTM_SIZE, 1], initializer=tf.random_normal_initializer(mean=0.0, stddev=1.0))
fc1_bias = tf.get_variable('b1', [1], initializer=tf.constant_initializer(0.1))

# ENCODER
with tf.variable_scope('ENC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    initial_state = multi_lstm.zero_state(BATCH_SIZE, tf.float32)
    state = initial_state
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()
        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
    learned_representation = state

# DECODER
with tf.variable_scope('DEC_LSTM'):
    lstm = tf.nn.rnn_cell.LSTMCell(LSTM_SIZE)
    multi_lstm = tf.nn.rnn_cell.MultiRNNCell([lstm] * LSTM_LAYERS)
    state = learned_representation
    logits_stacked = None
    loss = 0.0
    for t_step in xrange(NUM_T_STEPS):
        if t_step > 0:
            tf.get_variable_scope().reuse_variables()
        # state value is updated after processing each batch of sequences
        output, state = multi_lstm(sequence[:, t_step, :], state)
        # output can be used to make next number prediction
        logits = tf.matmul(output, fc1_weight) + fc1_bias
        if logits_stacked is None:
            logits_stacked = logits
        else:
            logits_stacked = tf.concat(1, [logits_stacked, logits])
        loss += tf.reduce_sum(tf.square(logits - target[:, t_step])) / BATCH_SIZE

reg_loss = loss + LAMBDA_REG * (tf.nn.l2_loss(fc1_weight) + tf.nn.l2_loss(fc1_bias))
train = tf.train.AdamOptimizer().minimize(reg_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    total_loss = 0.0
    for step in xrange(MAX_STEPS):
        seq_batch, target_batch = get_batch(BATCH_SIZE)
        feed = {sequence: seq_batch, target: target_batch}
        _, current_loss = sess.run([train, reg_loss], feed)
        if step % 10 == 0:
            print("#{}: {}".format(step, current_loss))
        total_loss += current_loss
    print('Total loss:', total_loss)

    print('### SIMPLE EVAL: ###')
    seq_batch, target_batch = get_batch(BATCH_SIZE)
    feed = {sequence: seq_batch, target: target_batch}
    prediction = sess.run([logits_stacked], feed)
    for b in xrange(BATCH_SIZE):
        print("{} -> {})".format(str(seq_batch[b, :, 0]), target_batch[b, :]))
        print(" `-> Prediction: {}".format(prediction[0][b]))
Sample output of this looks like this:
### SIMPLE EVAL: ###
# [input seq] -> [target prediction]
# `-> Prediction: [model prediction]
[ 33. 53. 113. 213.] -> [ 353. 533. 753. 1013.])
`-> Prediction: [ 19.74548721 28.3149128 33.11489105 35.06603241]
[ -17. -32. -77. -152.] -> [-257. -392. -557. -752.])
`-> Prediction: [-16.38951683 -24.3657589 -29.49801064 -31.58583832]
[ -7. -4. 5. 20.] -> [ 41. 68. 101. 140.])
`-> Prediction: [ 14.14126873 22.74848557 31.29668617 36.73633194]
...
The model is an LSTM autoencoder whose encoder and decoder have 2 layers each.
Unfortunately, as you can see in the results, this model does not learn the sequence properly. It might be the case that I'm just making a simple mistake somewhere, or that 1000-10000 training steps is just way too few for an LSTM. As I said, I'm also just starting to understand/use LSTMs properly.
But hopefully this can give you some inspiration regarding the implementation.
After reading several LSTM introduction blogs, e.g. Jakob Aungiers', option 3 seems to be the right one for a stateless LSTM.
If your LSTMs need to remember data from longer ago than your num_steps, you can train in a stateful way; for a Keras example, see Philippe Remy's blog post "Stateful LSTM in Keras". Philippe does not show an example for batch sizes greater than one, however. I guess that in your case a batch size of four with a stateful LSTM could be used with the following data (written as input -> label):
batch #0:
1,2,3,4 -> 5
2,3,4,5 -> 6
3,4,5,6 -> 7
4,5,6,7 -> 8
batch #1:
5,6,7,8 -> 9
6,7,8,9 -> 10
7,8,9,10 -> 11
8,9,10,11 -> 12
batch #2:
9,10,11,12 -> 13
...
This way, the state of, e.g., the 2nd sample in batch #0 is correctly reused to continue training with the 2nd sample of batch #1.
This is somewhat similar to your option 4; unlike option 4, however, it uses all available labels.
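A minimal sketch of such a stateful setup (in Keras, following the blog post above; layer size, optimizer, and loss are arbitrary choices of mine, not from the question):
from keras.models import Sequential
from keras.layers import LSTM, Dense

batch_size, num_steps = 4, 4
model = Sequential([
    LSTM(32, batch_input_shape=(batch_size, num_steps, 1), stateful=True),
    Dense(1),
])
model.compile(optimizer='adam', loss='mse')

# Feed the batches strictly in the order shown above; the LSTM state carries
# over from batch #0 to batch #1, so reset it only between passes over the data:
# for each epoch:
#     for x, y in ordered_batches:   # x: (4, 4, 1), y: (4, 1)
#         model.train_on_batch(x, y)
#     model.reset_states()
The crucial parts are stateful=True together with a fixed batch_input_shape, keeping the batch order, and calling model.reset_states() between epochs.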
Update:
As an extension of my suggestion, where batch_size equals num_steps, Alexis Huet gives an answer for the case of batch_size being a divisor of num_steps, which can be used for larger num_steps. He describes it nicely on his blog.
I believe Option 1 is closest to the reference implementation in /tensorflow/models/rnn/ptb/reader.py:
def ptb_iterator(raw_data, batch_size, num_steps):
    """Iterate on the raw PTB data.

    This generates batch_size pointers into the raw PTB data, and allows
    minibatch iteration along these pointers.

    Args:
        raw_data: one of the raw data outputs from ptb_raw_data.
        batch_size: int, the batch size.
        num_steps: int, the number of unrolls.

    Yields:
        Pairs of the batched data, each a matrix of shape [batch_size, num_steps].
        The second element of the tuple is the same data time-shifted to the
        right by one.

    Raises:
        ValueError: if batch_size or num_steps are too high.
    """
    raw_data = np.array(raw_data, dtype=np.int32)
    data_len = len(raw_data)
    batch_len = data_len // batch_size
    data = np.zeros([batch_size, batch_len], dtype=np.int32)
    for i in range(batch_size):
        data[i] = raw_data[batch_len * i:batch_len * (i + 1)]

    epoch_size = (batch_len - 1) // num_steps
    if epoch_size == 0:
        raise ValueError("epoch_size == 0, decrease batch_size or num_steps")

    for i in range(epoch_size):
        x = data[:, i*num_steps:(i+1)*num_steps]
        y = data[:, i*num_steps+1:(i+1)*num_steps+1]
        yield (x, y)
However, another option is to select a pointer into your data array randomly for each training sequence, as sketched below.
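A minimal sketch of that random-pointer variant (the helper name and shapes are my own, not from reader.py):
import numpy as np

# Instead of striding through the data, draw a random start offset for every sequence.
def random_batch(raw_data, batch_size, num_steps):
    raw_data = np.asarray(raw_data)
    starts = np.random.randint(0, len(raw_data) - num_steps, size=batch_size)
    x = np.stack([raw_data[s:s + num_steps] for s in starts])          # inputs
    y = np.stack([raw_data[s + 1:s + num_steps + 1] for s in starts])  # targets, shifted right by one
    return x, y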
I'm working on a small Torch7/Lua script to create and train a neural network, but I'm running into errors. Any ideas?
Here's my code:
require 'dp'
require 'csvigo'
require 'nn'
--[[hyperparameters]]--
opt = {
   nHidden = 100,       -- number of hidden units
   learningRate = 0.1,  -- training learning rate
   momentum = 0.9,      -- momentum factor to use for training
   maxOutNorm = 1,      -- maximum norm allowed for output neuron weights
   batchSize = 128,     -- number of examples per mini-batch
   maxTries = 100,      -- maximum number of epochs without reduction in validation error
   maxEpoch = 1         -- maximum number of epochs of training
}
csv2tensor = require 'csv2tensor'
-- inputs, outputs = csv2tensor.load("/Users/robertgrzesik/NodeJS/csv_export.csv")
inputs = csv2tensor.load("/Users/robertgrzesik/NodeJS/csv_export.csv", {exclude={"positive", "negative", "neutral"}})
outputs = csv2tensor.load("/Users/robertgrzesik/NodeJS/csv_export.csv", {include={"positive", "negative", "neutral"}}) -- "positive", "negative", "neutral"
print("outputs: ", outputs)
print("inputs: ", inputs)
local dataset = {}
print("inputs:size(1)", inputs:size(1))
inputSize = inputs:size(1)
outputSize = outputs:size(1)
for i = 1, inputSize do
   dataset[i] = {inputs[i], outputs[i]}
end

dataset.size = function(self)
   return inputSize
end
-- ======================================= --
-- Create NN
-- ======================================= --
print '[INFO] Creating NN..'
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = inputSize; outputs = outputSize; HUs = 300; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
-- ======================================= --
-- MSE and Training
-- ======================================= --
print '[INFO] MSE and train NN..'
criterion = nn.MSECriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = 0.01
trainer:train(dataset)
Here's the error:
# StochasticGradient: training
/Users/robertgrzesik/torch/install/bin/luajit: .../robertgrzesik/torch/install/share/lua/5.1/nn/Linear.lua:37: size mismatch
stack traceback:
[C]: in function 'addmv'
.../robertgrzesik/torch/install/share/lua/5.1/nn/Linear.lua:37: in function 'updateOutput'
...ertgrzesik/torch/install/share/lua/5.1/nn/Sequential.lua:25: in function 'forward'
...ik/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
/Users/robertgrzesik/Lua/async-master/tests/dp-test.lua:53: in main chunk
[C]: in function 'dofile'
...esik/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x01028bc780
And here's a sample of my data:
positive,negative,basketball,neutral,the,be,and,of,a,in,to,have,it,I,for,that,he,you,with,on,do,this,they,at,who,if,her,people,take,your,like,our,new,because,woman,great,show,million,money,job,little,important,lose,include,rest,fight,perfect
0,0,0,1,3,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,1,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Basically, my aim is to create a deep neural network that links the frequency of words used in a sentence to the user's rating of it as either "positive", "negative" or "neutral" (my outputs, which are binary). Please also let me know if my thinking is correct on this.
Thank you!
Found the problem!
The issue was that I was giving the wrong sizes when creating the network: I was passing in "inputs:size(1)" instead of "inputs:size(2)". Here's the fix:
mlp:add(nn.Linear(inputs:size(2), HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs:size(2)))
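For context, inputs is a 2D tensor whose rows are examples and whose columns are features, and nn.Linear needs the per-example feature count, i.e. the column count. A quick check with a hypothetical shape:
t = torch.Tensor(100, 47)  -- hypothetical: 100 examples, 47 features
print(t:size(1))           -- 100: the number of examples, not what nn.Linear wants
print(t:size(2))           -- 47: the number of input features nn.Linear expects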
Feel like I'm slowly starting to get the hang of Lua/Torch!