I'm trying to train a feed-forward neural network for the first time in Torch. Here's my dataset: http://ocw.mit.edu/courses/sloan-school-of-management/15-097-prediction-machine-learning-and-statistics-spring-2012/datasets/transfusion.csv
Here's the code (based on http://mdtux89.github.io/2015/12/11/torch-tutorial.html):
require 'nn'
mlp = nn.Sequential()
inputSize = 4
hiddenLayer1Size = 4
hiddenLayer2Size = 4
mlp:add(nn.Linear(inputSize,hiddenLayer1Size)) -- row, column
mlp:add(nn.Tanh())
mlp:add(nn.Linear(hiddenLayer1Size,hiddenLayer2Size))
mlp:add(nn.Tanh())
nclasses = 1
mlp:add(nn.Linear(hiddenLayer2Size,nclasses))
mlp:add(nn.LogSoftMax())
output = mlp:forward(torch.rand(1,4))
print(output)
-- TRAINING using the built-in stochastic gradient descent, 2 params: network, criterion function. --
LRate = 0.1
criterion = nn.ClassNLLCriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = LRate
function string:splitAtCommas()
  local sep, values = ",", {}
  local pattern = string.format("([^%s]+)", sep)
  self:gsub(pattern, function(c) values[#values+1] = c end)
  return values
end
function loadData(dataFile)
  local dataset, i = {}, 0
  for line in io.lines(dataFile) do
    local values = line:splitAtCommas()
    local y = torch.Tensor(1)
    y[1] = values[#values] -- the target class is the last number in the line
    values[#values] = nil
    local x = torch.Tensor(values) -- the input data is all the other numbers
    dataset[i] = {x, y}
    i = i + 1
  end
  function dataset:size() return (i - 1) end -- StochasticGradient requires the dataset to provide size()
  return dataset
end
dataset = loadData("transfusion.csv")
trainer:train(dataset)
Here's the error report:
# StochasticGradient: training
/Users/drdre/torch/install/share/lua/5.1/nn/THNN.lua:109: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /Users/drdre/torch/extra/nn/lib/THNN/generic/ClassNLLCriterion.c:38
stack traceback:
[C]: in function 'v'
/Users/drdre/torch/install/share/lua/5.1/nn/THNN.lua:109: in function 'ClassNLLCriterion_updateOutput'
...dre/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function 'forward'
...re/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'f'
[string "local f = function() return trainer:train(dat..."]:1: in main chunk
[C]: in function 'xpcall'
/Users/drdre/torch/install/share/lua/5.1/itorch/main.lua:209: in function </Users/drdre/torch/install/share/lua/5.1/itorch/main.lua:173>
/Users/drdre/torch/install/share/lua/5.1/lzmq/poller.lua:75: in function 'poll'
/Users/drdre/torch/install/share/lua/5.1/lzmq/impl/loop.lua:307: in function 'poll'
/Users/drdre/torch/install/share/lua/5.1/lzmq/impl/loop.lua:325: in function 'sleep_ex'
/Users/drdre/torch/install/share/lua/5.1/lzmq/impl/loop.lua:370: in function 'start'
/Users/drdre/torch/install/share/lua/5.1/itorch/main.lua:381: in main chunk
[C]: in function 'require'
(command line):1: in main chunk
[C]: at 0x0105e4cd10
Use nclasses = 2 and y[1] = values[#values] + 1. The labels in the CSV are 0/1, but ClassNLLCriterion expects class indices between 1 and nclasses, which is why the assertion cur_target >= 0 && cur_target < n_classes fails. See the doc:
a desired output y (an integer 1 to n, in this case n = 2 classes)
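A minimal sketch of the two changes in context (everything else in the script stays the same):
nclasses = 2
mlp:add(nn.Linear(hiddenLayer2Size,nclasses))
mlp:add(nn.LogSoftMax())
-- in loadData, shift the 0/1 label from the CSV to a 1/2 class index:
y[1] = values[#values] + 1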
vals = { i = 1, j = 2 }
setmetatable(vals, {
  __add = function (a, b)
    return a*b
  end,
})
sr = vals.i + vals.j
print(sr)
It prints sr as 3. The expected answer is 2, since 1*2 equals 2. Why is the addition metamethod from the metatable of vals not being used?
You have misunderstood: a metatable only fires for the table it is attached to, not for the keys/values that table holds, whatever their datatype. The values do not inherit the metatable.
(Exception: you can build that behaviour yourself with __newindex, sharing the parent's metatable with newly added child tables.)
In your code, vals.i and vals.j are plain numbers, so vals.i + vals.j is ordinary numeric addition and __add never runs. Look at this code and try it yourself:
vals = setmetatable({i = 1, j = 2}, {
  __add = function (left, right)
    return left.i * right -- little change here
  end,
})
vals + 15     -- __add is triggered and returns 15
vals + vals.j -- __add is triggered and returns 2
-- Let's change vals.i
vals.i = vals + vals.j
vals + 15     -- __add is triggered and returns 30
vals + vals.j -- __add is triggered and returns 4
"Numbers do not have metatables."
The string datatype, however, does have one:
for key, value in pairs(getmetatable(_VERSION)) do
  print(key, "=", value)
end
__div = function: 0x565e8f80
__pow = function: 0x565e8fa0
__sub = function: 0x565e9000
__mod = function: 0x565e8fc0
__idiv = function: 0x565e8f60
__add = function: 0x565e9020
__mul = function: 0x565e8fe0
__index = table: 0x5660d0b0
__unm = function: 0x565e8f40
And this __add is of limited use: it fails unless the string can be converted to a number...
_VERSION + 3
stdin:1: attempt to add a 'string' with a 'number'
stack traceback:
[C]: in metamethod 'add'
stdin:1: in main chunk
[C]: in ?
_VERSION + "3"
stdin:1: attempt to add a 'string' with a 'string'
stack traceback:
[C]: in metamethod 'add'
stdin:1: in main chunk
[C]: in ?
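For comparison, a quick sketch (standard Lua behaviour, not from the session above): the string arithmetic metamethods do work when the string can be converted to a number:
print("10" + 5)   -- 15
print("2" * "3")  -- 6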
Imagine numbers have all math functions as methods...
> math.pi = debug.setmetatable(math.pi, {__index = math})
-- From now on, every number has math methods
> print(math.maxinteger:atan2() * 2)
3.1415926535898
-- Calling a method on the value returned by the previous method
-- Maybe it should be called "method chaining"?
> print(math.pi:deg():tointeger():type())
integer
-- type() returns a string. Let's use upper() on it...
> print(math.pi:deg():tointeger():type():upper())
INTEGER
-- Typed in a Lua interactive console ( _VERSION = 5.4 )
-- PS: random() is also available and can be
-- used directly on a number, e.g. as a dice roller:
> print((1):random(6))
1
> print((1):random(6))
5
> print((1):random(6))
5
> print((1):random(6))
4
-- ;-)
Oops, how easy is this?
When the table is the left operand of +, it is passed as the first argument to the __add metamethod, so its fields can be read there:
local vals = {
  i = 1,
  j = 2
}
setmetatable(vals, {
  __add = function(tbl, value)
    return tbl.i + tbl.j + value
  end
})
print(vals + 13) -- prints 16
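One caveat worth knowing (standard Lua semantics, not shown above): the operands are passed to __add in the order they appear in the expression, so if the table sits on the right of + it arrives as the second argument. A small sketch that handles both orders:
local vals = setmetatable({ i = 1, j = 2 }, {
  __add = function(a, b)
    -- exactly one of a and b is the table; figure out which
    local tbl = type(a) == "table" and a or b
    local num = (tbl == a) and b or a
    return tbl.i + tbl.j + num
  end
})
print(vals + 13) -- 16, table passed as the first argument
print(13 + vals) -- 16, table passed as the second argument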
I'm trying to implement this paper https://arxiv.org/pdf/1804.06962.pdf with Lua/Torch7.
The forward pass works fine, but the backward pass modele.gapbranch:backward(n, loss_grad) raises this error:
/home/narimene/distro/install/bin/luajit:
...e/narimene/distro/install/share/lua/5.1/nn/Container.lua:67: In 2 module of nn.Sequential:
/home/narimene/distro/install/share/lua/5.1/nn/Concat.lua:92: bad argument #1 to 'narrow' (number expected, got nil)
stack traceback:
[C]: in function 'narrow'
/home/narimene/distro/install/share/lua/5.1/nn/Concat.lua:92: in function </home/narimene/distro/install/share/lua/5.1/nn/Concat.lua:47>
[C]: in function 'xpcall'
...e/narimene/distro/install/share/lua/5.1/nn/Container.lua:63: in function 'rethrowErrors'
.../narimene/distro/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
gap2.lua:240: in function 'opfunc'
/home/narimene/distro/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
gap2.lua:247: in main chunk
[C]: in function 'dofile'
...ene/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x563fabe66570
WARNING: If you see a stack trace below, it doesn't point to the place
where this error occurred. Please use only the one above.
stack traceback:
[C]: in function 'error'
...e/narimene/distro/install/share/lua/5.1/nn/Container.lua:67: in function 'rethrowErrors'
.../narimene/distro/install/share/lua/5.1/nn/Sequential.lua:84: in function 'backward'
gap2.lua:240: in function 'opfunc'
/home/narimene/distro/install/share/lua/5.1/optim/sgd.lua:44: in function 'sgd'
gap2.lua:247: in main chunk
[C]: in function 'dofile'
...ene/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:150: in main chunk
[C]: at 0x563fabe66570
Here is the code (gap2.lua):
require 'nn'
require 'cunn'
require 'cutorch'
local GapBranch, Parent = torch.class('nn.GapBranch', 'nn.Module')
function GapBranch:__init(label, num_classes, args, threshold)
Parent.__init(self)
self.gt_labels = label
num_classes = num_classes ~= nil and num_classes or 10
self.threshold = threshold or 0.6
self.gapbranch = nn.Sequential()
self.gapbranch:add(nn.SpatialConvolution(3,512, 3, 3, 1, 1, 1, 1)) -- this line is to be removed
self.cls = self:classifier(512, num_classes)
self.cls_erase = self:classifier(512, num_classes)
self.gapbranch:add(nn.Concat():add(self.cls):add(self.cls_erase))
--self.gapbranch:add(self.cls_erase)
--Optimizer
self.loss_cross_entropy = nn.CrossEntropyCriterion():cuda()
end
function GapBranch:classifier(in_planes, out_planes)
gapcnn = nn.Sequential()
gapcnn:add(nn.SpatialConvolution(in_planes, 1024, 3, 3, 1, 1, 1, 1))
gapcnn:add(nn.ReLU())
gapcnn:add(nn.SpatialConvolution(1024, 1024, 3, 3, 1, 1, 1, 1))
gapcnn:add(nn.ReLU())
gapcnn:add(nn.SpatialConvolution(1024,out_planes, 1, 1, 1,1))
return gapcnn
end
function mulTensor(tensor1, tensor2)
newTensor = torch.Tensor(tensor1:size()):cuda()
for i=1, tensor1:size()[1] do
for j=1, tensor1:size()[2] do
newTensor[{i,j}] = torch.cmul(tensor1[{i,j}],tensor2[{i,1}])
end
end
return newTensor
end
function GapBranch:erase_feature_maps(atten_map_normed, feature_maps, threshold)
if #atten_map_normed:size()>3 then
atten_map_normed = torch.squeeze(atten_map_normed)
end
atten_shape = atten_map_normed:size()
pos = torch.ge(atten_map_normed, threshold)
mask = torch.ones(atten_shape):cuda() -- cuda
mask[pos] = 0.0
m = nn.Unsqueeze(2)
m = m:cuda()
mask = m:forward(mask)
erased_feature_maps = mulTensor(feature_maps,mask) -- Variable
return erased_feature_maps
end
function GapBranch:normalize_atten_maps(atten_map)
atten_shape = atten_map:size()
batch_mins, _ = torch.min(atten_map:view(atten_shape[1],-1),2)
batch_maxs, _ = torch.max(atten_map:view(atten_shape[1],-1),2)
atten_normed = torch.cdiv(atten_map:view(atten_shape[1],-1)-batch_mins:expandAs(atten_map:view(atten_shape[1],-1)), (batch_maxs - batch_mins):expandAs(atten_map:view(atten_shape[1],-1)))
atten_normed = atten_normed:view(atten_shape)
return atten_normed
end
function GapBranch:get_atten_map(feature_maps, gt_labels, normalize)
normalize = normalize or true
label = gt_labels:long()
feature_map_size = feature_maps:size()
batch_size = feature_map_size[1]
atten_map = torch.zeros(feature_map_size[1], feature_map_size[3], feature_map_size[4])
atten_map = atten_map:cuda()
for batch_idx = 1, batch_size do
-- label.data[batch_idx]
--label[batch_idx]
print('label ',label:size())
print('feature_maps ', feature_maps:size())
atten_map[{batch_idx}] = torch.squeeze(feature_maps[{batch_idx,label[batch_idx]}])
end
if normalize then
atten_map = self:normalize_atten_maps(atten_map)
end
return atten_map
end
function GapBranch:gaplayer()
gaplayer = nn.Sequential()
gaplayer:add(nn.SpatialZeroPadding(1, 1, 1 ,1))
gaplayer:add(nn.SpatialAveragePooling(3, 3, 1, 1))
return gaplayer
end
function GapBranch:updateOutput(input) -- need label
-- Backbone
feat = self.gapbranch:get(1):forward(input)
self.gap = self:gaplayer()
self.gap:cuda()
feat3 = self.gap:forward(feat)
m = nn.Unsqueeze(2)
m = m:cuda()
-- Branch A
out = self.gapbranch:get(2):get(1):forward(feat3)
self.map1 = out
logits_1 = torch.squeeze(torch.mean(torch.mean(out, 3), 4))
logits_1 = m:forward(logits_1)
print('logits_1 ',logits_1:size())
--feat5 = self.gapbranch:get(2):get(2):forward(feat3)
localization_map_normed = self:get_atten_map(out, self.gt_labels, true)
self.attention = localization_map_normed
feat_erase = self:erase_feature_maps(localization_map_normed, feat3, self.threshold)
-- Branch B
out_erase = self.gapbranch:get(2):get(2):forward(feat_erase)
self.map_erase = out_erase
logits_ers = torch.squeeze(torch.mean(torch.mean(out_erase, 3), 4))
m = nn.Unsqueeze(2)
m = m:cuda()
logits_ers = m:forward(logits_ers)
print('logits_ers ', logits_ers:size())
return {logits_1, logits_ers}
end
function GapBranch:get_loss(resModele, gt_labels)
--[[ if self.onehot == 'True' then
gt = gt_labels:float()
else
gt = gt_labels:long()
end
--]]
print('resModele ', resModele[1])
loss_cls = self.loss_cross_entropy:forward(resModele[1], gt_labels)
loss_cls_ers = self.loss_cross_entropy:forward(resModele[2], gt_labels)
loss_val = loss_cls + loss_cls_ers
return {loss_val, }
end
require 'paths'
if (not paths.filep("cifar10torchsmall.zip")) then
os.execute('wget -c https://s3.amazonaws.com/torch7/data/cifar10torchsmall.zip')
os.execute('unzip cifar10torchsmall.zip')
end
trainset = torch.load('cifar10-train.t7')
testset = torch.load('cifar10-test.t7')
classes = {'airplane', 'automobile', 'bird', 'cat',
'deer', 'dog', 'frog', 'horse', 'ship', 'truck'}
-- ignore setmetatable for now, it is a feature beyond the scope of this tutorial. It sets the index operator.
setmetatable(trainset,
{__index = function(t, i)
return {t.data[i], t.label[i]}
end}
);
trainset.data = trainset.data:double() -- convert the data from a ByteTensor to a DoubleTensor.
function trainset:size()
return self.data:size(1)
end
mean = {} -- store the mean, to normalize the test set in the future
stdv = {} -- store the standard-deviation for the future
for i=1,3 do -- over each image channel
mean[i] = trainset.data[{ {}, {i}, {}, {} }]:mean() -- mean estimation
print('Channel ' .. i .. ', Mean: ' .. mean[i])
trainset.data[{ {}, {i}, {}, {} }]:add(-mean[i]) -- mean subtraction
stdv[i] = trainset.data[{ {}, {i}, {}, {} }]:std() -- std estimation
print('Channel ' .. i .. ', Standard Deviation: ' .. stdv[i])
trainset.data[{ {}, {i}, {}, {} }]:div(stdv[i]) -- std scaling
end
trainset.data = trainset.data:cuda()
trainset.label = trainset.label:cuda()
modele = nn.GapBranch(trainset.label):cuda()
modele.gapbranch = modele.gapbranch:cuda()
print(modele.gapbranch)
theta, gradTheta = modele.gapbranch:getParameters()
optimState = {learningRate = 0.15}
require 'optim'
for epoch = 1, 1 do
function feval(theta)
for i=1, 1 do
modele.gapbranch:zeroGradParameters()
m = nn.Unsqueeze(1)
m = m:cuda()
n = m:forward(trainset.data[i])
h = modele:forward(n)
j = modele:get_loss(h,trainset.label[i])
loss_cls_grad = modele.loss_cross_entropy:backward(h[1],trainset.label[i])
loss_cls_ers_grad = modele.loss_cross_entropy:backward(h[2],trainset.label[i])
loss_grad = loss_cls_grad + loss_cls_ers_grad
loss_grad = torch.randn(1,10,32,32):cuda()
modele.gapbranch:backward(n, loss_grad)
end
return j, gradTheta
end
print('***************************')
optim.sgd(feval, theta, optimState)
end
If anyone could help, I would be very grateful.
I have successfully pre-trained my model on a machine translation dataset in Lua. Now, when I move on to training the model, I get an error in a Lua file at the call rembuff:floor():
Error: Attempt to call method 'floor' (a nil value)
This is the specific function:
function MarginBatchBeamSearcher:nextSearchStep(t, batch_pred_inp, batch_ctx, beam_dec, beam_scorer,gold_scores, target, target_w, gold_rnn_state_dec, delts, losses, global_noise)
local K = self.K
local resval, resind, rembuff = self.resval, self.resind, self.rembuff
local finalval, finalind = self.finalval, self.finalind
self:synchDropout(t, global_noise)
-- pred_inp should be what was predicted at the last step
local outs = beam_dec:forward({batch_pred_inp, batch_ctx, unpack(self.prev_state)})
local all_scores = beam_scorer:forward(outs[#outs]) -- should be (batch_l*K) x V matrix
local V = all_scores:size(2)
local mistaken_preds = {}
for n = 1, self.batch_size do
delts[n] = 0
losses[n] = 0
if t <= target_w[n]-1 then -- only do things if t <= length (incl end token) - 2
local beam_size = #self.pred_pfxs[n]
local nstart = (n-1)*K+1
local nend = n*K
local scores = all_scores:sub(nstart, nstart+beam_size-1):view(-1) -- scores for this example
-- take top K
torch.topk(resval, resind, scores, K, 1, true)
-- see if we violated margin
torch.min(finalval, finalind, resval, 1) -- resind[finalind[1]] is idx of K'th highest predicted word
-- checking that true score at least 1 higher than K'th
losses[n] = math.max(0, 1 - gold_scores[n][target[t+1][n]] + finalval[1])
-- losses[n] = math.max(0, - gold_scores[n][target[t+1][n]] + finalval[1])
if losses[n] > 0 then
local parent_idx = math.ceil(resind[finalind[1]]/V)
local pred_word = ((resind[finalind[1]]-1)%V) + 1
mistaken_preds[n] = {prev = self.pred_pfxs[n][parent_idx], val = pred_word}
delts[n] = 1 -- can change.....
else
-- put predicted next words in pred_inp
rembuff:add(resind, -1) -- set rembuff = resind - 1
rembuff:div(V)
--if rembuff.floor then
rembuff:floor()
I am unable to rectify this error. Please help!
require 'torch';
require 'nn';
require 'nnx';
mnist = require 'mnist';
fullset = mnist.traindataset()
testset = mnist.testdataset()
trainset = {
  size = 50000,
  data = fullset.data[{{1,50000}}]:double(),
  label = fullset.label[{{1,50000}}]
}
validationset = {
  size = 10000,
  data = fullset.data[{{50001, 60000}}]:double(),
  label = fullset.label[{{50001,60000}}]
}
-- MNIST Dataset has 28x28 images
model = nn.Sequential()
model:add(nn.SpatialConvolutionMM(1, 32, 5, 5)) -- 32x24x24
model:add(nn.ReLU())
model:add(nn.SpatialMaxPooling(3, 3, 3, 3)) -- 32x8x8
model:add(nn.SpatialConvolutionMM(32, 64, 5, 5)) -- 64x4x4
model:add(nn.Tanh())
model:add(nn.SpatialMaxPooling(2, 2, 2, 2)) -- 64x2x2
model:add(nn.Reshape(64*2*2))
model:add(nn.Linear(64*2*2, 200))
model:add(nn.Tanh())
model:add(nn.Linear(200, 10))
model:add(nn.LogSoftMax())
criterion = nn.ClassNLLCriterion()
x, dldx = model:getParameters() -- now x stores the trainable parameters and dldx stores the gradient wrt these params in the model above
sgd_params = {
learningRate = 1e-2,
learningRateDecay = 1e-4,
weightDecay = 1e-3,
momentum = 1e-4
}
step = function ( batchsize )
  -- setting up variables
  local count = 0
  local current_loss = 0
  local shuffle = torch.randperm(trainset.size)
  -- setting default batchsize as 200
  batchsize = batchsize or 200
  -- setting inputs and targets for minibatches
  for minibatch_number = 1, trainset.size, batchsize do
    local size = math.min( trainset.size - minibatch_number + 1, batchsize )
    local inputs = torch.Tensor(size, 28, 28)
    local targets = torch.Tensor(size)
    for index = 1, size do
      inputs[index] = trainset.data[ shuffle[ index + minibatch_number ]]
      targets[index] = trainset.label[ shuffle[ index + minibatch_number ] ]
    end
    -- defining feval function to return loss and gradients of loss w.r.t. params
    feval = function( x_new )
      --print ( "---------------------------------safe--------------------")
      if x ~= x_new then x:copy(x_new) end
      -- initializing gradParams to zero
      dldx:zero()
      -- calculating loss and param gradients
      local loss = criterion:forward( model.forward( inputs ), targets )
      model:backward( inputs, criterion:backward( model.output, targets ) )
      return loss, dldx
    end
    -- getting loss
    -- optim returns x*, {fx} where x* is new set of params and {fx} is { loss } => fs[ 1 ] carries loss from feval
    print(feval ~= nil and x ~= nil and sgd_params ~= nil)
    _, fs = optim.sgd(feval, x, sgd_params)
    count = count + 1
    current_loss = current_loss + fs[ 1 ]
  end
  -- returning avg loss over the minibatches
  return current_loss / count
end
max_iters = 30
for i = 1, max_iters do
  local loss = step()
  print(string.format('Epoch: %d Current loss: %4f', i, loss))
end
I am new to Torch and Lua, and I'm not able to find the error in the above code. Can anyone suggest a way to debug it?
The error:
/home/afroz/torch/install/bin/luajit: /home/afroz/test.lua:88: attempt to index global 'optim' (a nil value)
stack traceback:
/home/afroz/test.lua:88: in function 'step'
/home/afroz/test.lua:102: in main chunk
[C]: in function 'dofile'
...froz/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
optim is not defined in the scope of your script. You try to call optim.sgd, which of course results in the error you see.
Like nn, optim is an extension package for torch.
require 'torch';
require 'nn';
require 'nnx';
Remember those lines at the beginning of your script? They load the definitions of those packages.
Make sure optim is installed, then require it as well.
https://github.com/torch/optim
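A minimal sketch of the fix at the top of the script, assuming optim is installed (e.g. via luarocks install optim):
require 'torch';
require 'nn';
require 'nnx';
require 'optim';  -- makes optim.sgd available later in the script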
optim is not assigned anywhere in the script, so when the script references optim.sgd, its value is nil and you get the error you have shown. You need to double-check the script to make sure optim is assigned the correct value (the module table returned by require 'optim').
I'm trying to train a simple test network on the XOR function in Torch. It works when I use MSECriterion, but when I try CrossEntropyCriterion it fails with the following error message:
/home/a/torch/install/bin/luajit: /home/a/torch/install/share/lua/5.1/nn/THNN.lua:699: Assertion `cur_target >= 0 && cur_target < n_classes' failed. at /tmp/luarocks_nn-scm-1-6937/nn/lib/THNN/generic/ClassNLLCriterion.c:31
stack traceback:
[C]: in function 'v'
/home/a/torch/install/share/lua/5.1/nn/THNN.lua:699: in function 'ClassNLLCriterion_updateOutput'
...e/a/torch/install/share/lua/5.1/nn/ClassNLLCriterion.lua:41: in function 'updateOutput'
...torch/install/share/lua/5.1/nn/CrossEntropyCriterion.lua:13: in function 'forward'
.../a/torch/install/share/lua/5.1/nn/StochasticGradient.lua:35: in function 'train'
a.lua:34: in main chunk
[C]: in function 'dofile'
/home/a/torch/install/lib/luarocks/rocks/trepl/scm-1/bin/th:145: in main chunk
[C]: at 0x00406670
I get the same error message when decomposing it into LogSoftMax and ClassNLLCriterion. Code is:
dataset = {};
function dataset:size() return 100 end -- 100 examples
for i = 1, dataset:size() do
  local input = torch.randn(2);  -- normally distributed example in 2d
  local output = torch.Tensor(2);
  if input[1] < 0 then
    input[1] = -1
  else
    input[1] = 1
  end
  if input[2] < 0 then
    input[2] = -1
  else
    input[2] = 1
  end
  if input[1]*input[2] > 0 then -- calculate label for XOR function
    output[2] = 1;
  else
    output[1] = 1
  end
  dataset[i] = {input, output}
end
require "nn"
mlp = nn.Sequential(); -- make a multi-layer perceptron
inputs = 2; outputs = 2; HUs = 20; -- parameters
mlp:add(nn.Linear(inputs, HUs))
mlp:add(nn.Tanh())
mlp:add(nn.Linear(HUs, outputs))
criterion = nn.CrossEntropyCriterion()
trainer = nn.StochasticGradient(mlp, criterion)
trainer.learningRate = 0.01
trainer:train(dataset)
x = torch.Tensor(2)
x[1] = 1; x[2] = 1; print(mlp:forward(x))
x[1] = 1; x[2] = -1; print(mlp:forward(x))
x[1] = -1; x[2] = 1; print(mlp:forward(x))
x[1] = -1; x[2] = -1; print(mlp:forward(x))
The MSE criterion was designed for regression problems; when it is used for classification tasks, the targets should be one-hot vectors. Cross-entropy / negative log-likelihood criteria, on the other hand, are used exclusively for classification, so there is no need to represent the target class as a vector: in Torch the target for such criteria is simply the index of the assigned class (an integer from 1 to the number of classes).
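Applied to the XOR example above, a minimal sketch of the dataset construction with class-index targets (a single integer, 1 or 2, instead of a one-hot vector); the network and CrossEntropyCriterion can stay as they are:
dataset = {}
function dataset:size() return 100 end
for i = 1, dataset:size() do
  local input = torch.randn(2)
  input[1] = input[1] < 0 and -1 or 1
  input[2] = input[2] < 0 and -1 or 1
  local output = torch.Tensor(1)
  if input[1] * input[2] > 0 then
    output[1] = 2 -- class 2
  else
    output[1] = 1 -- class 1
  end
  dataset[i] = {input, output}
end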