Classification with a Torch model exported from DIGITS (Lua 5.1)

I'm very new to deep learning and I'm trying to run a classification with Lua.
I've installed DIGITS with Torch and Lua 5.1 and trained a model. After that, I ran a classification on the DIGITS server to check the result on a test example.
I've exported the model, and now I'm trying to run the same classification with the following Lua code:
local image_url = '/home/delpech/mnist/test/5/04131.png'
local network_url = '/home/delpech/models/snapshot_30_Model.t7'
local network_name = paths.basename(network_url)
print '==> Loading network'
local net = torch.load(network_name)
--local net = torch.load(network_name):unpack():float()
net:evaluate()
print(net)
print '==> Loading synsets'
print 'Loads mapping from net outputs to human readable labels'
local synset_words = {}
--for line in io.lines'/home/delpech/models/labels.txt' do table.insert(synset_words, line:sub(11)) end
for line in io.lines'/home/delpech/models/labels.txt' do table.insert(synset_words, line) end
print 'synset words'
for line in io.lines'/home/delpech/models/labels.txt' do print(line) end
print '==> Loading image and imagenet mean'
local im = image.load(image_url)
print '==> Preprocessing'
local I = image.scale(im,28,28,'bilinear'):float()
print 'Propagate through the network, sort outputs in decreasing order and show 10 best classes'
local _,classes = net:forward(I):view(-1):sort(true)
for i=1,10 do
print('predicted class '..tostring(i)..': ', synset_words[classes[i]])
end
But here is the output:
delpech#delpech-K55VD:~/models$ lua classify.lua
==> Downloading image and network
==> Loading network
nn.Sequential {
[input -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> output]
(1): nn.MulConstant
(2): nn.SpatialConvolution(1 -> 20, 5x5)
(3): nn.SpatialMaxPooling(2x2, 2,2)
(4): nn.SpatialConvolution(20 -> 50, 5x5)
(5): nn.SpatialMaxPooling(2x2, 2,2)
(6): nn.View(-1)
(7): nn.Linear(800 -> 500)
(8): nn.ReLU
(9): nn.Linear(500 -> 10)
(10): nn.LogSoftMax
}
==> Loading synsets
Loads mapping from net outputs to human readable labels
synset words
0
1
2
3
4
5
6
7
8
9
==> Loading image and imagenet mean
==> Preprocessing
Propagate through the network, sort outputs in decreasing order and show 5 best classes
predicted class 1: 4
predicted class 2: 8
predicted class 3: 0
predicted class 4: 1
predicted class 5: 9
predicted class 6: 6
predicted class 7: 7
predicted class 8: 2
predicted class 9: 5
predicted class 10: 3
And this is actually not the classification provided by DIGITS...

OK, after searching the DIGITS source code, it looks like I had missed two things:
you have to get the mean image from the job folder and apply the following preprocessing:
print '==> Preprocessing'
for i=1,im_mean:size(1) do
im[i]:csub(im_mean[i])
end
and I had to load my images this way and multiply every pixel value by 255:
local im = image.load(image_url):type('torch.FloatTensor'):contiguous();
im:mul(255)
Here is the complete answer:
require 'image'
require 'nn'
require 'torch'
require 'paths'
local function main()
print '==> Downloading image and network'
local image_url = '/home/delpech/mnist/test/7/03079.png'
local network_url = '/home/delpech/models/snapshot_30_Model.t7'
local mean_url = '/home/delpech/models/mean.jpg'
print '==> Loading network'
local net = torch.load(network_url)
net:evaluate();
print '==> Loading synsets'
print 'Loads mapping from net outputs to human readable labels'
local synset_words = {}
for line in io.lines'/home/delpech/models/labels.txt' do table.insert(synset_words, line) end
print '==> Loading image and imagenet mean'
local im = image.load(image_url):type('torch.FloatTensor'):contiguous()
im:mul(255)
im = image.scale(im,28,28,'bilinear'):float()
local im_mean = image.load(mean_url):type('torch.FloatTensor'):contiguous()
im_mean:mul(255)
im_mean = image.scale(im_mean,28,28,'bilinear'):float()
print '==> Preprocessing'
for i=1,im_mean:size(1) do
im[i]:csub(im_mean[i])
end
local _,classes = net:forward(im):sort(true);
for i=1,10 do
print('predicted class '..tostring(i)..': ', synset_words[classes[i]])
end
end
main()
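Since the network ends in nn.LogSoftMax, the sorted scores are log-probabilities. As an optional check, here is a small sketch of what could go inside main(), after the preprocessing, to print probabilities next to the labels:
local scores, classes = net:forward(im):view(-1):sort(true)
for i = 1, 5 do
-- exp() turns the LogSoftMax output back into a probability
print(string.format('%s: %.4f', synset_words[classes[i]], math.exp(scores[i])))
end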

Related

How does one create a distributed data loader with PyTorch's TorchMeta for meta-learning?

I was trying to create a PyTorch distributed data loader with TorchMeta, but it failed with a deadlock:
python ~/ultimate-utils/tutorials_for_myself/my_torchmeta/torchmeta_ddp.py
test_basic_ddp_example
ABOUT TO SPAWN WORKERS (via mp.spawn)
-> started ps with rank=0
-> rank=0
-> mp.current_process()=<SpawnProcess name='SpawnProcess-1' parent=54167 started>
-> os.getpid()=54171
device=device(type='cpu')
----> setting up rank=0 (with world_size=4)
---> MASTER_ADDR='127.0.0.1'
---> 57813
---> backend='gloo'
-> started ps with rank=2
-> rank=2
-> mp.current_process()=<SpawnProcess name='SpawnProcess-3' parent=54167 started>
-> os.getpid()=54173
device=device(type='cpu')
----> setting up rank=2 (with world_size=4)
---> MASTER_ADDR='127.0.0.1'
---> 57813
---> backend='gloo'
-> started ps with rank=1
-> rank=1
-> mp.current_process()=<SpawnProcess name='SpawnProcess-2' parent=54167 started>
-> os.getpid()=54172
device=device(type='cpu')
----> setting up rank=1 (with world_size=4)
---> MASTER_ADDR='127.0.0.1'
---> 57813
---> backend='gloo'
-> started ps with rank=3
-> rank=3
-> mp.current_process()=<SpawnProcess name='SpawnProcess-4' parent=54167 started>
-> os.getpid()=54174
device=device(type='cpu')
----> setting up rank=3 (with world_size=4)
---> MASTER_ADDR='127.0.0.1'
---> 57813
---> backend='gloo'
[W ProcessGroupGloo.cpp:684] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W ProcessGroupGloo.cpp:684] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W ProcessGroupGloo.cpp:684] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
[W ProcessGroupGloo.cpp:684] Warning: Unable to resolve hostname to a (local) address. Using the loopback address as fallback. Manually set the network interface to bind to with GLOO_SOCKET_IFNAME. (function operator())
----> done setting up rank=0
----> done setting up rank=2
----> done setting up rank=3
----> done setting up rank=1
about to create model
about to create model
about to create model
about to create model
done creating ddp model
about to create torch meta data loader
about to get datasets
here
done creating ddp model
about to create torch meta data loader
about to get datasets
done creating ddp model
about to create torch meta data loader
done creating ddp model
about to create torch meta data loader
about to get datasets
about to get datasets
here
here
here
Why does this happen?
Whole code: https://github.com/brando90/ultimate-utils/blob/master/tutorials_for_myself/my_torchmeta/torchmeta_ddp.py
ref: https://github.com/tristandeleu/pytorch-meta/issues/116
#%%
"""
test a basic DDP example
"""
from argparse import Namespace
import torch
from torch import nn
import torch.multiprocessing as mp
from torch.utils.data import DataLoader
# from meta_learning.base_models.learner_from_opt_as_few_shot_paper import get_learner_from_args
from uutils.torch_uu.models.learner_from_opt_as_few_shot_paper import get_learner_from_args
from uutils.torch_uu import process_meta_batch
from uutils.torch_uu.dataloaders import get_distributed_dataloader_miniimagenet_torchmeta, get_args_for_mini_imagenet
from uutils.torch_uu.distributed import print_process_info, print_gpu_info, setup_process, move_model_to_ddp, \
cleanup, find_free_port
def get_dist_dataloader_torch_meta_mini_imagenet(args) -> dict[str, DataLoader]:
dataloaders: dict[str, DataLoader] = get_distributed_dataloader_miniimagenet_torchmeta(args)
return dataloaders
def run_parallel_training_loop(rank: int, args: Namespace):
"""
Run torchmeta examples with a distributed dataloader.
This should distribute the following loop:
for batch_idx, batch in enumerate(dataloader['train']):
print(f'{batch_idx=}')
spt_x, spt_y, qry_x, qry_y = process_meta_batch(args, batch)
print(f'Train inputs shape: {spt_x.size()}') # (2, 25, 3, 28, 28)
print(f'Train targets shape: {spt_y.size()}') # (2, 25)
print(f'Test inputs shape: {qry_x.size()}') # (2, 75, 3, 28, 28)
print(f'Test targets shape: {qry_y.size()}') # (2, 75)
break
Note:
usual loop for ddp looks as follows:
for i, batch in enumerate(train_loader):
# Forward pass
outputs = model(images)
loss = criterion(outputs, labels)
if rank == 0:
print(f'{loss=}')
# Backward and optimize
optimizer.zero_grad()
loss.backward() # When the backward() returns, param.grad already contains the synchronized gradient tensor.
optimizer.step()
"""
print(f'-> started ps with {rank=}')
args.rank = rank
print_process_info(args.rank)
print_gpu_info()
args.gpu = rank
setup_process(args, rank, master_port=args.master_port, world_size=args.world_size)
# get ddp model
print('about to create model')
# args.Din, args.Dout = 10, 10
# model = nn.Linear(args.Din, args.Dout)
model = get_learner_from_args(args)
model = move_model_to_ddp(rank, args, model)
criterion = nn.CrossEntropyLoss().to(args.gpu)
print('done creating ddp model')
# create distributed dataloader
print('about to create torch meta data loader')
dataloaders: dict[str, DataLoader] = get_distributed_dataloader_miniimagenet_torchmeta(args)
print('done created distributed data loaders')
optimizer = torch.optim.SGD(model.parameters(), 1e-4)
# do training
print('about to train')
for batch_idx, batch in enumerate(dataloaders['train']):
print(f'{batch_idx=}')
spt_x, spt_y, qry_x, qry_y = process_meta_batch(args, batch)
outputs = model(spt_x)
loss = criterion(outputs, spt_y)
if rank == 0:
print(f'{loss=}')
# Backward and optimize
optimizer.zero_grad()
loss.backward() # When the backward() returns, param.grad already contains the synchronized gradient tensor.
optimizer.step()
# Destroy a given process group, and deinitialize the distributed package
cleanup(rank)
def hello(rank: int, args):
print(f'hello {rank=}')
def ddp_example_torchmeta_dataloader_test():
"""
Useful links:
- https://github.com/yangkky/distributed_tutorial/blob/master/src/mnist-distributed.py
- https://pytorch.org/tutorials/intermediate/ddp_tutorial.html
"""
print('test_basic_ddp_example')
# args = Namespace(epochs=3, batch_size=8)
args = get_args_for_mini_imagenet()
if torch.cuda.is_available():
args.world_size = torch.cuda.device_count()
else:
args.world_size = 4
args.master_port = find_free_port()
print('\nABOUT TO SPAWN WORKERS (via mp.spawn)')
# mp.spawn(hello, args=(args,), nprocs=args.world_size)
mp.spawn(run_parallel_training_loop, args=(args,), nprocs=args.world_size)
print('mp.spawn finished\a')
if __name__ == '__main__':
print('')
ddp_example_torchmeta_dataloader_test()
print('Done\a')

TfidfVectorizer returns empty elements

I am currently doing a sentiment analysis project. I fit the vectorizer with my train data in dataframe format. Then I transform the test data with the same vectorizer, but it returns nothing. I checked TfidfVectorizer.get_feature_names() and the word I want to transform already exists among the features. What is wrong with my vectorizer?
Code:
vectorizer = TfidfVectorizer(analyzer=lambda x: x)
x = vectorizer.fit_transform(data['clean_text'])
print(vectorizer.get_feature_names()[9427])
# output sad
print(vectorizer.transform(["sad"]))
# empty result
print(vectorizer.transform(["sad"]).toarray())
# return a whole 0 array
Sample data format (dataframe)
sentiment clean_text
0 0 [respond, go]
1 1 [sooo, sad]
2 1 [bulli]
3 1 [leav, alon]
4 1 [cry]

How do I share weights across Parallel-streams?

Is there a way to share weights across parallel streams of a Torch model?
For example, I have the following model.
mlp = nn.Sequential();
c = nn.Parallel(1,2) -- Parallel container will associate a module to each slice of dimension 1
-- (row space), and concatenate the outputs over the 2nd dimension.
for i=1,10 do -- Add 10 Linear+Reshape modules in parallel (input = 3, output = 2x1)
local t=nn.Sequential()
t:add(nn.Linear(3,2)) -- Linear module (input = 3, output = 2)
t:add(nn.Reshape(2,1)) -- Reshape 1D Tensor of size 2 to 2D Tensor of size 2x1
c:add(t)
end
mlp:add(c)
And now I want to share the weights (everything: weights, biases, gradients) of the nn.Linear layers above across the different values of i (so, e.g., the nn.Linear(3,2) in stream 1 with the one in stream 9). What options do I have to share those?
Or is it rather recommended to use a different container / module approach?
You can create the module that will be repeated:
t = nn.Sequential()
t:add(nn.Linear(3,2))
t:add(nn.Reshape(2,1))
Then you can use the clone function of torch with additional parameters to share the weights (https://github.com/torch/nn/blob/master/doc/module.md#clonemlp)
mlp = nn.Sequential()
c = nn.Parallel(1,2)
for i = 1, 10 do
c:add(t:clone('weight', 'bias'))
end
mlp:add(c)
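If you also want the gradient buffers shared (the question asks for weights, bias and gradients), the same clone call can take the gradient tensor names as well; a minimal variant of the loop above:
for i = 1, 10 do
-- shares the parameters and their gradient accumulators across all ten clones
c:add(t:clone('weight', 'bias', 'gradWeight', 'gradBias'))
end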

Torch / Lua, which neural network structure for mini-batch training?

I'm still working on implementing mini-batch gradient updates for my siamese neural network. Previously I had an implementation problem, which was correctly solved here.
Now I've realized that there was also a mistake in the architecture of my neural network, related to my incomplete understanding of the correct implementation.
So far I've always used a non-mini-batch gradient descent approach, in which I passed the training elements one by one to the gradient update. Now I want to implement a gradient update through mini-batches, starting with mini-batches of, say, N=2 elements.
My question is: how should I change the architecture of my siamese neural network to make it able to handle a mini-batch of N=2 elements instead of a single element?
This is the (simplified) architecture of my siamese neural network:
nn.Sequential {
[input -> (1) -> (2) -> output]
(1): nn.ParallelTable {
input
|`-> (1): nn.Sequential {
| [input -> (1) -> (2) -> output]
| (1): nn.Linear(6 -> 3)
| (2): nn.Linear(3 -> 2)
| }
|`-> (2): nn.Sequential {
| [input -> (1) -> (2) -> output]
| (1): nn.Linear(6 -> 3)
| (2): nn.Linear(3 -> 2)
| }
... -> output
}
(2): nn.CosineDistance
}
I have:
2 identical siamese neural networks (upper and lower)
6 input units
3 hidden units
2 output units
cosine distance function that compares the output of the two parallel neural networks
Here's my code:
perceptronUpper= nn.Sequential()
perceptronUpper:add(nn.Linear(input_number, hiddenUnits))
perceptronUpper:add(nn.Linear(hiddenUnits,output_number))
perceptronLower = perceptronUpper:clone('weight', 'bias', 'gradWeight', 'gradBias')
parallel_table = nn.ParallelTable()
parallel_table:add(perceptronUpper)
parallel_table:add(perceptronLower)
perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())
This architecture works very well if I have a gradient update function that takes one element; how should I modify it so that it can handle a mini-batch?
EDIT: I probably should use the nn.Sequencer() class, modifying the last two lines of my code to:
perceptron:add(nn.Sequencer(parallel_table))
perceptron:add(nn.Sequencer(nn.CosineDistance()))
What do you guys think?
Every nn module can work with minibatches. Some work only with minibatches, e.g. (Spatial)BatchNormalization. A module knows how many dimensions its input must contain (let's say D) and if the module receives a D+1 dimensional tensor, it assumes the first dimension to be the batch dimension. For example, take a look at nn.Linear module documentation:
The input tensor given in forward(input) must be either a vector (1D
tensor) or matrix (2D tensor). If the input is a matrix, then each row
is assumed to be an input sample of given batch.
function table_of_tensors_to_batch(tbl)
local batch = torch.Tensor(#tbl, unpack(tbl[1]:size():totable()))
for i = 1, #tbl do
batch[i] = tbl[i]
end
return batch
end
inputs = {
torch.Tensor(5):fill(1),
torch.Tensor(5):fill(2),
torch.Tensor(5):fill(3),
}
input_batch = table_of_tensors_to_batch(inputs)
linear = nn.Linear(5, 2)
output_batch = linear:forward(input_batch)
print(input_batch)
1 1 1 1 1
2 2 2 2 2
3 3 3 3 3
[torch.DoubleTensor of size 3x5]
print(output_batch)
0,3128 -1,1384
0,7382 -2,1815
1,1637 -3,2247
[torch.DoubleTensor of size 3x2]
Ok, but what about containers (nn.Sequential, nn.Parallel, nn.ParallelTable and others)? A container itself does not deal with the input; it just sends the input (or the corresponding part of it) to the module it contains. ParallelTable, for example, simply applies the i-th member module to the i-th input table element. Thus, if you want it to handle a batch, each input[i] (input is a table) must be a tensor with the batch dimension as described above.
input_number = 5
output_number = 2
inputs1 = {
torch.Tensor(5):fill(1),
torch.Tensor(5):fill(2),
torch.Tensor(5):fill(3),
}
inputs2 = {
torch.Tensor(5):fill(4),
torch.Tensor(5):fill(5),
torch.Tensor(5):fill(6),
}
input1_batch = table_of_tensors_to_batch(inputs1)
input2_batch = table_of_tensors_to_batch(inputs2)
input_batch = {input1_batch, input2_batch}
output_batch = perceptron:forward(input_batch)
print(input_batch)
{
1 : DoubleTensor - size: 3x5
2 : DoubleTensor - size: 3x5
}
print(output_batch)
0,6490
0,9757
0,9947
[torch.DoubleTensor of size 3]
target_batch = torch.Tensor({1, 0, 1})
criterion = nn.MSECriterion()
err = criterion:forward(output_batch, target_batch)
gradCriterion = criterion:backward(output_batch, target_batch)
perceptron:zeroGradParameters()
perceptron:backward(input_batch, gradCriterion)
Why is there nn.Sequencer then? Can one use it instead? Yes, but it is strongly discouraged. Sequencer takes a sequence table and applies the module to each element of the table independently, providing no speedup. Besides, it has to make copies of that module, so such a "batch mode" is considerably less efficient than online (non-batch) training. Sequencer was designed to be a part of recurrent nets; there is no point in using it in your case.
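For comparison, a minimal sketch of what nn.Sequencer does (it comes from the separate rnn package, so this assumes that package is installed): it just maps the wrapped module over a table of inputs element by element, while the batched call above pushes the same data through in a single forward:
require 'rnn' -- nn.Sequencer is provided by the rnn package, not core nn
local lin = nn.Linear(5, 2)
local seq = nn.Sequencer(lin)
-- a table of three separate 1D inputs, processed one element at a time
local outs = seq:forward({torch.Tensor(5):fill(1), torch.Tensor(5):fill(2), torch.Tensor(5):fill(3)})
-- the equivalent batched call: a single 3x5 tensor through the bare module
-- (inputs and table_of_tensors_to_batch are from the first example above)
local batch_out = lin:forward(table_of_tensors_to_batch(inputs))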

Expecting a contiguous tensor error with nn.Sum

I have a 2x16x3x10x10 tensor that I feed into my network. My network has two parts that work in parallel. The first part takes the 16x3x10x10 matrix and computes the sum over the last two dimensions, returning a 16x3 tensor.
The second part is a convolutional neural network that produces a 16x160 tensor.
Whenever I try to run this model, I get the following error:
...903/nTorch/Torch7/install/share/lua/5.1/torch/Tensor.lua:457: expecting a contiguous tensor
stack traceback:
[C]: in function 'assert'
...903/nTorch/Torch7/install/share/lua/5.1/torch/Tensor.lua:457: in function 'view'
...8/osu7903/nTorch/Torch7/install/share/lua/5.1/nn/Sum.lua:26: in function 'updateGradInput'
...03/nTorch/Torch7/install/share/lua/5.1/nn/Sequential.lua:40: in function 'updateGradInput'
...7903/nTorch/Torch7/install/share/lua/5.1/nn/Parallel.lua:52: in function 'updateGradInput'
...su7903/nTorch/Torch7/install/share/lua/5.1/nn/Module.lua:30: in function 'backward'
...03/nTorch/Torch7/install/share/lua/5.1/nn/Sequential.lua:73: in function 'backward'
./train_v2_with_batch.lua:144: in function 'opfunc'
...su7903/nTorch/Torch7/install/share/lua/5.1/optim/sgd.lua:43: in function 'sgd'
./train_v2_with_batch.lua:160: in function 'train'
run.lua:93: in main chunk
[C]: in function 'dofile'
...rch/Torch7/install/lib/luarocks/rocks/trepl/scm-1/bin/th:131: in main chunk
[C]: at 0x00405800
Here is the relevant part of the model:
local first_part = nn.Parallel(1,2)
local CNN = nn.Sequential()
local sums = nn.Sequential()
sums:add(nn.Sum(3))
sums:add(nn.Sum(3))
first_part:add(sums)
-- stage 1: conv+max
CNN:add(nn.SpatialConvolutionMM(nfeats, convDepth_L1,receptiveFieldWidth_L1,receptiveFieldHeight_L1))
-- Since the default stride of the receptive field is 1, then
-- (assuming receptiveFieldWidth_L1 = receptiveFieldHeight_L1 = 3) the number of receptive fields is (10-3+1)x(10-3+1) or 8x8
-- so the output volume is (convDepth_L1 X 8 X 8) or 10 x 8 x 8
--CNN:add(nn.Threshold())
CNN:add(nn.ReLU())
CNN:add(nn.SpatialMaxPooling(poolsize,poolsize,poolsize,poolsize))
-- if poolsize=2, then the output of this is 10x4x4
CNN:add(nn.Reshape(convDepth_L1*outputWdith_L2*outputWdith_L2,true))
first_part:add(CNN)
The code works when the input tensor is 2x1x3x10x10, but not when the tensor is 2x16x3x10x10.
Edit: I only just realized that this happens when I do model:backward and not model:forward. Here is the relevant code:
local y = model:forward(x)
local E = loss:forward(y,yt)
-- estimate df/dW
local dE_dy = loss:backward(y,yt)
print(dE_dy)
model:backward(x,dE_dy)
x is a 2x16x3x10x10 tensor and dE_dy is 16x2.
This is a flaw in the torch.nn library. To perform a backward step, nn.Parallel splits the gradOutput it receives from the higher module into pieces and sends them to its parallel submodules. The splitting is done efficiently, without copying memory, and thus those pieces are non-contiguous (unless you split along the 1st dimension).
local first_part = nn.Parallel(1,2)
-- ^
-- Merging on the 2nd dimension;
-- Chunks of the split gradOutput will not be contiguous
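A minimal sketch of why such chunks are non-contiguous (plain tensor ops, not nn.Parallel itself; the 16x2 shape mirrors dE_dy from the question):
local g = torch.rand(16, 2)
local piece = g:narrow(2, 1, 1) -- a slice along the 2nd dimension, like the one handed to a submodule
print(piece:isContiguous()) -- false: the slice is strided, its elements are not adjacent in memory
-- piece:view(16, 1) -- would fail with 'expecting a contiguous tensor'
print(piece:contiguous():isContiguous()) -- true: contiguous() makes a compact copy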
The problem is that nn.Sum cannot work with non-contiguous gradOutput. I don't have a better idea than to make changes to it:
Sum_nc, _ = torch.class('nn.Sum_nc', 'nn.Sum')
function Sum_nc:updateGradInput(input, gradOutput)
local size = input:size()
size[self.dimension] = 1
-- modified code:
if gradOutput:isContiguous() then
gradOutput = gradOutput:view(size) -- doesn't work with non-contiguous tensors
else
gradOutput = gradOutput:resize(size) -- slower because of memory reallocation and changes gradOutput
-- gradOutput = gradOutput:clone():resize(size) -- doesn't change gradOutput; safer and even slower
end
--
self.gradInput:resizeAs(input)
self.gradInput:copy(gradOutput:expandAs(input))
return self.gradInput
end
[...]
sums = nn.Sequential()
sums:add(nn.Sum_nc(3)) -- <- will use torch.view
sums:add(nn.Sum_nc(3)) -- <- will use torch.resize
