I am trying to rebuild the 3D U-Net in this paper:
https://arxiv.org/pdf/1606.06650.pdf
And unfortunately, when I get to the first merge, I get the following error from Keras:
ValueError: "concat" mode can only merge layers with matching output shapes except for the concat axis. Layer shapes: [(None, 512, 14, 8, 10), (None, 256, 15, 8, 10)]
I understand that based on this thread:
https://github.com/fchollet/keras/issues/633
The following is true:
the concat axis is the axis along which to concatenate the two tensors.
Let's say you have two three-dimensional tensors of shape (2,3,5) and (2,3,7). Then you can only concatenate them along the third (zero-based index: 2) axis, because then, figuratively, the two "faces" of the "cuboid" that you "glue together" are each 2-by-3 and only those fit. So you need to set concat_axis = 2 (or -1, since it is the last one), resulting in a new tensor (2,3,12).
Typically in a NN you would merge along the axis of the features, which depends on the type of layers you use and the implementation in Keras. If you are not sure, you can try out a few; most likely only one will work for the reason given above. If the figurative "faces don't fit" you will get an error message like the one in my opening post.
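To make the quoted rule concrete, here is a minimal sketch using NumPy rather than Keras (NumPy is an assumption here, chosen only because its concatenate follows the same shape rule). It uses the (2,3,5) and (2,3,7) shapes from the quote:

```python
import numpy as np

a = np.zeros((2, 3, 5))
b = np.zeros((2, 3, 7))

# Axes 0 and 1 match (2 and 3), so axis 2 is the only valid concat axis.
merged = np.concatenate([a, b], axis=2)
print(merged.shape)  # (2, 3, 12)

# Any other axis fails, because the remaining dimensions (5 vs 7) disagree.
try:
    np.concatenate([a, b], axis=0)
    other_axis_ok = True
except ValueError:
    other_axis_ok = False
print(other_axis_ok)  # False
```

The same check applies to the shapes in the error message: every axis other than the concat axis has to match exactly.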
Which means I should be merging on the 14 and 15, which are axis=0, correct?
Can someone help explain what I am missing in this setup?
Thanks!
I have trained a torch model for NLP tasks and would like to perform some inference using a multi GPU machine (in this case with two GPUs).
Inside the processing code, I use this
dataset = TensorDataset(encoded_dict['input_ids'], encoded_dict['attention_mask'])
sampler = DistributedSampler(
    dataset,
    num_replicas=args.nodes * args.gpus,
    rank=args.node_rank * args.gpus + gpu_number,
    shuffle=False,
)
dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
For those familiar with NLP, encoded_dict is the output from the tokenizer.batch_encode_plus function where the tokenizer is an instance of transformers.BertTokenizer.
The issue I’m having is that when I call the code through the torch.multiprocessing.spawn function, each GPU is doing predictions (i.e. inference) on a subset of the full dataset, and saving the predictions separately; for example, if I have a dataset with 1000 samples to predict, each GPU is predicting 500 of them. As a result, I have no way of knowing which samples out of the 1000 were predicted by which GPU, as their order is not preserved, therefore the model predictions are meaningless as I cannot trace each of them back to their input sample.
I have tried saving the dataloader instance (as a pickle) together with the predictions and then extracting the input_ids by using dataloader.dataset.tensors. However, this requires a tokenizer decoding step which I would rather avoid, as the tokenizer will have slightly changed the text (for example, double whitespaces would be removed, words with dashes will have been split, and so on).
What is the cleanest way to save the input text samples together with their predictions when doing inference in distributed mode, or alternatively to keep track of which prediction refers to which sample?
As I understand it, basically your dataset returns for an index idx [data,label] during training and [data] during inference. The issue with this is that the idx is not preserved by the dataloader object, so there is no way to obtain the idx values for the minibatch after the fact.
One way to handle this issue is to define a very simple custom dataset object that returns [data,id] instead of only data during inference. Probably the easiest way to do this is to make the dataset return a dictionary object with keys id and data. The dictionary return type is convenient because PyTorch collates this type automatically (that is, it converts the per-sample structures into batches); otherwise you'd have to define a custom collate_fn and pass it to the dataloader object, which is not very hard but is an extra step.
In any case, here is how I would define a new dataset object, which should be almost a one-to-one substitute for your current dataset (I believe):
import torch

class TensorDictDataset(torch.utils.data.Dataset):
    def __init__(self, ids, attention_mask):
        self.ids = ids
        self.mask = attention_mask

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        # Return a dict so that the default collate function batches each key.
        datum = {
            "mask": self.mask[idx],
            "id": self.ids[idx],
        }
        return datum
The only change you'll then have to make is that, rather than returning mask, your dataset will now return a dict of the form {"mask": mask, "id": id}, so you'll have to unpack that appropriately.
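As a rough sketch of the same idea taken one step further, the dataset can also carry the sample index and the raw text, so predictions can be matched to inputs without decoding. The class name IndexedTextDataset, the "idx" and "text" keys, and the dummy tensors are all hypothetical; the sketch relies only on the default collate behaviour, which batches integers into a tensor and strings into a list:

```python
import torch
from torch.utils.data import DataLoader, Dataset


class IndexedTextDataset(Dataset):
    """Hypothetical dataset that returns each sample's index and raw text."""

    def __init__(self, texts, input_ids, attention_mask):
        self.texts = texts
        self.input_ids = input_ids
        self.attention_mask = attention_mask

    def __len__(self):
        return len(self.texts)

    def __getitem__(self, idx):
        return {
            "idx": idx,                    # position in the original dataset
            "text": self.texts[idx],       # untouched input string
            "input_ids": self.input_ids[idx],
            "attention_mask": self.attention_mask[idx],
        }


# Dummy data standing in for the tokenizer output.
texts = ["first  sample", "second-sample", "third sample"]
input_ids = torch.arange(12).reshape(3, 4)
attention_mask = torch.ones(3, 4, dtype=torch.long)

loader = DataLoader(IndexedTextDataset(texts, input_ids, attention_mask), batch_size=2)
batches = list(loader)
print(batches[0]["idx"], batches[0]["text"])
```

A sampler such as the DistributedSampler from the question should be passable to this DataLoader in the same way as before; each process can then save batch["idx"] or batch["text"] next to its predictions.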
Thanks for your answer. I have done further debugging, found another solution, and wanted to post it.
Your solution is quite elegant (there was one minor misunderstanding: the predictions contain only the predicted labels and not the data, contrary to what you understood, but this doesn't affect your answer anyway). "Mask" in NLP also means something else, and instead of having the mask tokens together with the predictions I would like to have the untokenized text string. This is not so easy to achieve because the splitting of the data across GPUs happens AFTER the tokenisation; however, I believe that with a slight adaptation to your answer it could work.
However, I’ve done some further debugging and I’ve noticed that the data are not actually randomly split across GPUs as I thought. If I set shuffle=False in the DistributedSampler then this happens:
in the case of two GPUs, GPU 0 and GPU 1, all the samples with even index (starting from 0) will be passed to GPU 0, and all those with odd index will be passed to GPU 1.
So, for example, if you have 10 samples whose indices are [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], then samples 0, 2, 4, 6, 8 will go to GPU 0 and samples 1, 3, 5, 7, 9 will go to GPU 1. This allows me to map the predictions back to the original text samples just by using this ordering. I'm not sure this is the best solution, as keeping the original text string next to its prediction would be ideal, but at least it works.
N.B. Special case: the two GPUs must be passed the SAME number of inputs. If the number of inputs is odd, for example 9 samples with indices [0, 1, 2, 3, 4, 5, 6, 7, 8], then GPU 0 will be passed samples 0, 2, 4, 6, 8 and GPU 1 will be passed samples 1, 3, 5, 7, 0 (in this exact order). In other words, the first sample (index 0) is repeated at the very end of the dataset so that each GPU has the same number of samples. In that case we can write some code that drops the last prediction from GPU 1, as it is redundant.
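The ordering described above can be written down as a small pure-Python sketch. It mirrors the observed behaviour (pad by wrapping around to the start, then give every rank each num_replicas-th index); it is not the sampler's actual code, and it assumes shuffle=False and at least as many samples as ranks. The helper names sampler_indices and merge_predictions are made up for illustration:

```python
import math


def sampler_indices(num_samples, num_replicas, rank):
    """Indices one rank receives under the ordering described above."""
    per_rank = math.ceil(num_samples / num_replicas)
    total = per_rank * num_replicas
    indices = list(range(num_samples))
    # Pad by repeating indices from the start so every rank gets the same count.
    indices += indices[: total - num_samples]
    return indices[rank:total:num_replicas]


def merge_predictions(per_rank_preds, num_samples):
    """Put per-rank prediction lists back into the original sample order."""
    num_replicas = len(per_rank_preds)
    merged = [None] * num_samples
    filled = set()
    for rank, preds in enumerate(per_rank_preds):
        for idx, pred in zip(sampler_indices(num_samples, num_replicas, rank), preds):
            if idx not in filled:  # skip predictions for padded duplicates
                merged[idx] = pred
                filled.add(idx)
    return merged


print(sampler_indices(9, 2, 0))  # [0, 2, 4, 6, 8]
print(sampler_indices(9, 2, 1))  # [1, 3, 5, 7, 0]
```

With the predictions saved per GPU, merge_predictions([preds_gpu0, preds_gpu1], 9) would return them in the order of the original 9 samples, dropping the duplicate for sample 0.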
I have seen multiple posts on reshaping numpy arrays as inputs to CNNs; however, I haven't been able to successfully reshape my array as an input to my CNN!
I have a CNN that merges with another model further downstream. The input shape of the CNN is (4,4,1). The real input is bigger, but I have purposefully made it smaller to establish the pipeline and get it running before I put in the proper size.
The format will be the same, however: it's a 1-channel n x n np.array. I am getting errors when reshaping, which I will mention after the code. The input dimensions are put into the model as follows:
cnn_branch_input = tf.keras.layers.Input(shape=(4,4,1))
cnn_branch_two = tf.keras.layers.Conv2D(etc....)(cnn_branch_input)
The characteristics and reshaping of the np array (which is originally a pandas dataframe) are as follows:
np.array(array).shape
(4,4)
input = np.array(array).reshape(-1,1,4,4)
input.shape
(1,1,4,4)
The input to my merged model is as follows:
model.fit([cnn_input, gnn_input, gnn_node_feat], y,
          # sample_weight=train_mask,
          # validation_data=validation_data,
          batch_size=4,
          shuffle=False)
This causes an error, which makes sense to me:
ValueError: Data cardinality is ambiguous:
x sizes: 1, 4, 4 -- Please provide data which shares the same first dimension.
So now when reshaping to intentionally have a 4x4 plus 1 channel shape as follows:
input = np.array(array).reshape(-1,4,4,1)
input.shape
(1,4,4,1)
Two things: the array appears to reshape into 4 sets of 1x1 arrays, so it seems the structure of the original array is lost, and I get the same error!
Notice that in both reshape methods the shape is either (1,4,4,1) or (1,1,4,4). The -1 entry simply becomes a 1, making the CNN think the first dimension has size 1. I thought the -1 would allow me to add the sample dimension as "any number of samples".
If I simply pass in the original (4,4) array, I receive an error saying that the CNN received a 2-dimensional array while a 4-dimensional array is required.
I'm really confused as to how to correctly reshape this array! I would appreciate any help!
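A small NumPy check, offered only as a sketch, of what the -1 does here. The arange data and the three-sample list are made-up stand-ins. In reshape, -1 is inferred from the total number of elements, so a single (4,4) array always yields a leading dimension of 1, and the 4x4 layout is still intact even though the nested printout looks different:

```python
import numpy as np

array = np.arange(16).reshape(4, 4)      # stand-in for the (4, 4) sample
single = array.reshape(-1, 4, 4, 1)      # -1 is inferred as 1: one sample

# The original 4x4 layout can be read back unchanged.
layout_preserved = np.array_equal(single[0, :, :, 0], array)

# To get N in the first dimension, N samples have to be supplied.
samples = [array, array + 100, array + 200]   # hypothetical list of 3 samples
batch = np.stack(samples)[..., np.newaxis]

print(single.shape, layout_preserved, batch.shape)  # (1, 4, 4, 1) True (3, 4, 4, 1)
```

Going by the error text ("x sizes: 1, 4, 4"), the CNN input holds 1 sample while the other two inputs hold 4, so the number of samples per input is probably what needs to agree, not the reshape itself.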
Say I have a 10x10x4 intermediate output of a convolution layer, which I need to split into 100 1x1x4 volumes and apply softmax on each to get 100 outputs from the network. Is there any way to accomplish this without using the Lambda layer? The issue with the Lambda layer in this case is that the simple task of splitting takes 100 passes through the Lambda layer during the forward pass, which makes the network too slow for my practical use. Please suggest a quicker way of doing this.
Edit: I had already tried the Softmax+Reshape approach before asking the question. With that approach, I would be getting a 10x10x4 matrix reshaped to a 100x4 Tensor with use of Reshape as the output. What I really need is a multi output network with 100 different outputs. In my application, it is not possible to jointly optimize over the 10x10 matrix, but I get good results by using a network with 100 different outputs with the Lambda layer.
Here are code snippets of my approach using the Keras functional API:
With Lambda layer (slow, gives 100 Tensors of shape (None, 4) as desired):
from keras.layers import Activation, Lambda
from keras.models import Model
from keras.optimizers import Adam

# Assume conv_output is the output of a convolutional layer with shape (None, 10, 10, 4)
preds = []
for i in range(10):
    for j in range(10):
        # Slice out the 4-vector at spatial position (i, j)
        y = Lambda(lambda x, i, j: x[:, i, j, :], arguments={'i': i, 'j': j})(conv_output)
        preds.append(Activation('softmax', name='predictions_' + str(i*10+j))(y))
model = Model(inputs=img, outputs=preds, name='model')
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
With Softmax+Reshape (fast, but gives a Tensor of shape (None, 100, 4)):
from keras.layers import Reshape, Softmax
from keras.models import Model
from keras.optimizers import Adam

# Assume conv_output is the output of a convolutional layer with shape (None, 10, 10, 4)
y = Softmax(name='softmax', axis=-1)(conv_output)
preds = Reshape([100, 4])(y)
model = Model(inputs=img, outputs=preds, name='model')
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(),
              metrics=['accuracy'])
I don't think it is possible in the second case to optimize over each of the 100 outputs individually (one can probably think of it as learning the joint distribution, whereas I need to learn the marginals as in the first case). Please let me know if there is a faster way to accomplish what I am doing with the Lambda layer in the first code snippet.
You can use the Softmax layer and set the axis argument to the last axis (i.e. -1) to apply softmax over that axis:
from keras.layers import Softmax
soft_out = Softmax(axis=-1)(conv_out)
Note that the axis argument by default is set to -1, so you may not even need to pass that.
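To illustrate what a softmax over the last axis computes, here is a minimal NumPy sketch (NumPy stands in for Keras, and the random conv_out is made-up data). It checks that the result at each spatial position equals a separate softmax over that position's 4-vector, and shows one way to view the result as 100 arrays of shape (N, 4):

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


rng = np.random.default_rng(0)
conv_out = rng.normal(size=(2, 10, 10, 4))    # stand-in for the conv output

soft_out = softmax(conv_out, axis=-1)

# Position (3, 7): identical to a softmax applied to that 1x1x4 slice alone.
same = np.allclose(soft_out[:, 3, 7, :], softmax(conv_out[:, 3, 7, :]))
sums_to_one = np.allclose(soft_out.sum(axis=-1), 1.0)

# View the result as 100 separate (N, 4) arrays, indexed by i*10 + j.
flat = soft_out.reshape(-1, 100, 4)
outputs = [flat[:, k, :] for k in range(100)]
print(same, sums_to_one, len(outputs), outputs[37].shape)
```

This only shows that the per-position values match; whether training on a single (None, 100, 4) output behaves the same as training on 100 separate outputs is the question raised in the edit above and is not settled by this sketch.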
Occasionally I see some models are using SpatialDropout1D instead of Dropout. For example, in the Part of speech tagging neural network, they use:
model = Sequential()
model.add(Embedding(s_vocabsize, EMBED_SIZE,
input_length=MAX_SEQLEN))
model.add(SpatialDropout1D(0.2)) ##This
model.add(GRU(HIDDEN_SIZE, dropout=0.2, recurrent_dropout=0.2))
model.add(RepeatVector(MAX_SEQLEN))
model.add(GRU(HIDDEN_SIZE, return_sequences=True))
model.add(TimeDistributed(Dense(t_vocabsize)))
model.add(Activation("softmax"))
The Keras documentation says:
This version performs the same function as Dropout, however it drops
entire 1D feature maps instead of individual elements.
However, I am unable to understand the meaning of "entire 1D feature maps". More specifically, I am unable to visualize SpatialDropout1D in the same model explained on Quora.
Can someone explain this concept using the same model as on Quora?
Also, in what situations would we use SpatialDropout1D instead of Dropout?
To keep it simple, I would first note that the so-called feature maps (1D, 2D, etc.) are just our regular channels. Let's look at some examples:
Dropout(): Let's define a 2D input: [[1, 1, 1], [2, 2, 2]]. Dropout will consider every element independently, and may result in something like [[1, 0, 1], [0, 2, 2]].
SpatialDropout1D(): In this case the result will look like [[1, 0, 1], [2, 0, 2]]. Notice that the 2nd channel was zeroed at every position.
The noise shape
In order to understand SpatialDropout1D, you should get used to the notion of the noise shape. In plain vanilla dropout, each element is kept or dropped independently. For example, if the tensor has shape [2, 2, 2], each of the 8 elements can be zeroed out depending on a random coin flip (with a certain "heads" probability); in total, there will be 8 independent coin flips and any number of values may become zero, from 0 to 8.
Sometimes there is a need to do more than that. For example, one may need to drop the whole slice along axis 0. The noise_shape in this case is [1, 2, 2] and the dropout involves only 4 independent random coin flips. Elements that differ only in their first index will either be kept together or be dropped together. The number of zeroed elements can be 0, 2, 4, 6 or 8. It cannot be 1 or 5.
Another way to view this is to imagine that input tensor is in fact [2, 2], but each value is double-precision (or multi-precision). Instead of dropping the bytes in the middle, the layer drops the full multi-byte value.
Why is it useful?
The example above is just for illustration and isn't common in real applications. A more realistic example is this: shape(x) = [k, l, m, n] and noise_shape = [k, 1, 1, n]. In this case, each batch and channel component will be kept independently, but each row and column will be kept or dropped together. In other words, the whole [l, m] feature map will be either kept or dropped.
You may want to do this to account for the correlation between adjacent pixels, especially in the early convolutional layers. Effectively, you want to prevent co-adaptation of pixels with their neighbors across the feature maps, and make them learn as if no other feature maps exist. This is exactly what SpatialDropout2D is doing: it promotes independence between feature maps.
The SpatialDropout1D is very similar: given shape(x) = [k, l, m] it uses noise_shape = [k, 1, m] and drops entire 1-D feature maps.
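The noise-shape idea can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the Keras implementation: the dropout helper is made up, and it uses the usual inverted-dropout scaling by 1/(1 - rate). The mask is drawn with the noise shape and then broadcast over the input:

```python
import numpy as np


def dropout(x, rate, noise_shape, rng):
    """Illustrative dropout: one coin flip per entry of noise_shape."""
    keep = rng.random(noise_shape) >= rate
    # Broadcasting stretches the mask over the axes where noise_shape is 1.
    return x * keep / (1.0 - rate)


rng = np.random.default_rng(0)
x = np.ones((2, 5, 3))                    # [k, l, m] = (batch, steps, channels)

plain = dropout(x, 0.5, x.shape, rng)     # every element decided independently
spatial = dropout(x, 0.5, (2, 1, 3), rng) # noise_shape = [k, 1, m]

# In the spatial case each (sample, channel) column is kept or dropped as a whole.
col_constant = np.all(spatial == spatial[:, :1, :])
print(col_constant)  # True
print(spatial[0])
```

Printing spatial[0] shows whole columns (channels) of zeros, whereas plain will typically have zeros scattered across individual elements.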
Reference: Efficient Object Localization Using Convolutional Networks
by Jonathan Tompson et al.
I am building a machine learning model that is using data of tumors to classify other tumors. However, there seems to be a problem when I declare the cost.
I don't get why this is a problem, because I ran this same exact code with the MNIST data set provided by TensorFlow, and it worked fine. In that case, I had set my n_classes to 10, batch_size to 100, and x = tf.placeholder('float', [None, 784]).
You've specified the number of classes as 2:
n_classes = 2
So your output layer is shape [10, 2], when using a batch of 10 as you've specified. But you're passing 11 labels per sample, giving you a label shape of [10, 11]. You're probably passing your data in as your labels in your sess.run([...], feed_dict={...}). You didn't specify the shape of your labels:
y = tf.placeholder('float')
That line should be:
y = tf.placeholder('float', shape=[None, n_classes])
If you do that, I expect your error will move to your sess.run call and it will point out that you're passing in the wrong data for your labels.
Also, as a side note, for a binary predictor you may get slightly better results if you use a single neuron on the output. Although it works to use 2 neurons for a binary classifier, in my experience it usually performs slightly worse than a single [0,1] output.