Incompatible shapes error while training my UNET model in Tensorflow

I have written a UNET model in TensorFlow to run on some CT scan images, which have already been preprocessed and turned into .npz files. The model builds and runs, but training fails with the error: Shapes (2, 512, 512, 2, 17) and (2, 512, 512, 2) are incompatible. My model has 17 features (Tuberculosis, Ground glass, etc.), and I assume (correctly?) that the features are the same as the labels. I am attaching the Colab link for the model, with shapes printed wherever necessary. The error occurs after the last block of code; the model runs perfectly until I execute that block. Where exactly is my model failing, and how do I match the shapes so that the model trains successfully? Here is the colab link.
P.S.: I am really sorry to use a Colab link, but the relevant code is too long to post here.

Related

Tensorflow: tf.trainable_variables() does not show my model weights

I am trying to extract my model weights so I can run my first pre-trained model. However, I am unable to extract the weights, since executing tf.trainable_variables() gives me the following output:
[<tf.Variable 'VGGNet/B1C1/kernel:0' shape=(3, 1, 3, 32) dtype=float32_ref>, <tf.Variable 'VGGNet/B1C1/bias:0' shape=(32,) dtype=float32_ref>, <tf.Variable 'VGGNet/B1C2/kernel:0' shape=(1, 3, 32, 32)....
It shows the shape, but not the numpy array that I am expecting. What am I missing?
You should not be extracting weights by hand to use your pre-trained model; instead, use the model.save_weights() and model.load_weights() utility methods provided by the API. Here's a link to the documentation where you can learn more about this: Save and load models.
As for your question of why you are not seeing the weights: tf.trainable_variables() is supposed to give you the variables themselves, not their values (aka the weights).
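For completeness, a minimal sketch of both approaches (hedged: sess stands for the session that holds your trained graph and model for the Keras model you built, neither of which is shown in the question):
import tensorflow as tf

# tf.trainable_variables() returns Variable objects; their values live in the session.
variables = tf.trainable_variables()
weight_values = sess.run(variables)           # list of numpy arrays, one per variable
print(variables[0].name, weight_values[0].shape)

# Preferred for pre-trained models: let Keras persist and restore the weights.
model.save_weights('my_model_weights.h5')     # write weights to disk
model.load_weights('my_model_weights.h5')     # restore into an identical architecture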

How do I match samples with their predictions when doing inference with PyTorch's DistributedSampler?

I have trained a torch model for NLP tasks and would like to perform some inference using a multi GPU machine (in this case with two GPUs).
Inside the processing code, I use this
dataset = TensorDataset(encoded_dict['input_ids'], encoded_dict['attention_mask'])
sampler = DistributedSampler(
    dataset, num_replicas=args.nodes * args.gpus, rank=args.node_rank * args.gpus + gpu_number, shuffle=False
)
dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
For those familiar with NLP, encoded_dict is the output from the tokenizer.batch_encode_plus function where the tokenizer is an instance of transformers.BertTokenizer.
The issue I'm having is that when I call the code through the torch.multiprocessing.spawn function, each GPU does predictions (i.e. inference) on a subset of the full dataset and saves its predictions separately; for example, if I have a dataset with 1000 samples to predict, each GPU predicts 500 of them. As a result, I have no way of knowing which of the 1000 samples were predicted by which GPU, as their order is not preserved, so the model predictions are meaningless because I cannot trace each of them back to its input sample.
I have tried to save the dataloader instance (as a pickle) together with the predictions and then extract the input_ids by using dataloader.dataset.tensors; however, this requires a tokenizer decoding step which I would rather avoid, as the tokenizer will have slightly changed the text (for example double whitespaces are removed, words with dashes are split, and so on).
What is the cleanest way to save the input text samples together with their predictions when doing inference in distributed mode, or alternatively to keep track of which prediction refers to which sample?
As I understand it, your dataset basically returns [data, label] for an index idx during training and [data] during inference. The issue with this is that idx is not preserved by the dataloader object, so there is no way to obtain the idx values for the minibatch after the fact.
One way to handle this issue is to define a very simple custom dataset object that also returns [data,id] instead of only data during inference. Probably the easiest way to do this is to make the dataset return a dictionary object with keys id and data. The dictionary return type is convenient because Pytorch collates (converts data structures to batches) this type automatically, otherwise you'd have to define a custom collate_fn and pass it to the dataloader object, which is itself not very hard but is an extra step.
In any case, here's how I would define a new dataset object, which should be an almost one-to-one substitute for your current dataset (I believe):
import torch

class TensorDictDataset(torch.utils.data.Dataset):
    def __init__(self, ids, attention_mask):
        self.ids = ids
        self.mask = attention_mask

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, idx):
        # Return a dict so the default collate_fn batches the ids alongside the data
        datum = {
            "mask": self.mask[idx],
            "id": self.ids[idx],
        }
        return datum
The only change you'll then have to make is that rather than returning mask, your dataset will now return {"mask": mask, "id": id}, so you'll have to parse that appropriately.
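On the inference side, each batch then arrives as a dictionary, so the ids can be collected next to the predictions. A rough sketch (model, device and dataloader are stand-ins for your own objects, and the forward call should be adapted to your model's signature):
import torch

all_ids, all_preds = [], []
model.eval()
with torch.no_grad():
    for batch in dataloader:                      # default collate batches each dict key
        inputs = batch["id"].to(device)           # the tensors stored under "id"
        masks = batch["mask"].to(device)
        logits = model(inputs, attention_mask=masks)[0]   # adapt to your model
        all_preds.append(logits.argmax(dim=-1).cpu())
        all_ids.append(batch["id"].cpu())         # ids travel with their predictions

all_ids = torch.cat(all_ids)
all_preds = torch.cat(all_preds)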
Thanks for your answer. I have done further debugging, found another solution, and wanted to post it.
Your solution is quite elegant (there was one minor misunderstanding: the predictions contain only the predicted labels, not the data, contrary to what you understood, but this doesn't affect your answer). A mask in NLP is also something else, and instead of having the mask tokens together with the predictions I would like to have the untokenized text string. This is not so easy to achieve because the splitting of the data across GPUs happens AFTER tokenization; however, I believe that with a slight adaptation of your answer it could work.
However, I’ve done some further debugging and I’ve noticed that the data are not actually randomly split across GPUs as I thought. If I set shuffle=False in the DistributedSampler then this happens:
in the case of two GPUs, GPU 0 and GPU 1, all the samples with even index (starting from 0) will be passed to GPU 0, and all those with odd index will be passed to GPU 1.
So for example, if you have 10 samples, whose indices are [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], then samples 0, 2, 4, 6, 8 will go to GPU 0 and samples 1, 3, 5, 7, 9 will go to GPU 1. This allows me to map the predictions back to the original text string samples just by using this ordering. Not sure if this is the best solution, as keeping the original text string next to its prediction would be ideal, but at least it works.
N.B. Special case: as the two GPUs must be passed the SAME number of inputs, if the number of inputs is odd, for example 9 samples with indices [0, 1, 2, 3, 4, 5, 6, 7, 8], then GPU 0 will be passed samples 0, 2, 4, 6, 8 and GPU 1 will be passed samples 1, 3, 5, 7, 0 (in this exact order). In other words, the first sample (index 0) is repeated at the very end of the dataset to make sure each GPU has the same number of samples, in which case we can write some code that drops the last prediction from GPU 1 as it is redundant.
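For example, the interleaving (and the padded duplicate) can be undone with a short helper like the one below, assuming preds_gpu0 and preds_gpu1 are the per-GPU prediction lists saved in batch order:
def merge_round_robin(preds_gpu0, preds_gpu1, num_samples):
    """Put per-GPU predictions back into the original dataset order.

    With shuffle=False, DistributedSampler hands even indices (0, 2, 4, ...) to
    rank 0 and odd indices (1, 3, 5, ...) to rank 1, padding the last rank with
    the first sample when num_samples is odd.
    """
    merged = [None] * num_samples
    merged[0::2] = preds_gpu0[:len(merged[0::2])]   # trim any padding
    merged[1::2] = preds_gpu1[:len(merged[1::2])]   # trim any padding
    return merged

# 9 samples -> GPU 0 saw indices [0, 2, 4, 6, 8], GPU 1 saw [1, 3, 5, 7, 0];
# merge_round_robin(preds_gpu0, preds_gpu1, 9) drops the redundant last prediction
# from GPU 1 and restores the original ordering.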

Why does training a tensorflow model with a tf.data pipeline yield radically different results than directly feeding with EagerTensors?

I am trying to set up a pipeline to train a model. To get started, I am using 'training wheels'.
I preprocess all of my data into 5 EagerTensors -- 3 for features, 2 for targets.
For the sake of argument, let's call the feature tensors "in_a, in_b, in_c" and the target tensors "tgt_1, tgt_2".
The shape of the input tensors are as follows:
in_a.shape (67473, 132, 5)
in_b.shape (67473, 132)
in_c.shape (67473, 132)
Target tensors are:
tgt_1.shape (67473, 132)
tgt_2.shape (67473, 132)
If I feed these tensors into my model using the .fit method in the following way:
training_model.fit(x=[in_a, in_b, in_c],y=[tgt_1, tgt_2],batch_size = 32, shuffle = True, epochs = 20)
I get wonderful results 100% of the time that I run the fit (input data is identical in all cases)
HOWEVER, I have more data than I can fit in memory, so I am trying to figure out the tf.data.Dataset flow, and this is where I have problems.
I take the exact same tensors and create a zipped dataset in the following way:
feature_ds = tf.data.Dataset.from_tensor_slices((in_a, in_b, in_c))
target_ds = tf.data.Dataset.from_tensor_slices((tgt_1, tgt_2))
full_dataset = tf.data.Dataset.zip((feature_ds,target_ds)).shuffle(buffer_size=320).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
This yields the following element_spec:
((TensorSpec(shape=(None, 132, 5), dtype=tf.float64, name=None), TensorSpec(shape=(None, 132), dtype=tf.float32, name=None), TensorSpec(shape=(None, 132), dtype=tf.float32, name=None)), (TensorSpec(shape=(None, 132), dtype=tf.float32, name=None), TensorSpec(shape=(None, 132), dtype=tf.float32, name=None)))
Now, when I feed the dataset into the exact same model, I get radically different results every time I train the model.
training_model.fit(full_dataset, epochs = 20)
One fit of 20 epochs gives good results; another run, mediocre; another, awful.
What could I be doing wrong? Any ideas how to troubleshoot this? I mean, the data source doesn't change between the two ways of feeding the model, just the method used to get it there.
Many thanks in advance!
Reefmo
Sorted it out... it turns out that model.fit(shuffle=True) shuffles the entire dataset at each epoch.
The way I zipped full_dataset above used .shuffle(buffer_size=320).
The problem here was that the dataset is some 67k records long, and the way shuffle works on datasets is sorta funky: only the buffer gets shuffled, and then as data is read out, the buffer is backfilled (so I think). AND the default behavior is to NOT shuffle at the end of every epoch.
By changing the line to
full_dataset = tf.data.Dataset.zip((feature_ds, target_ds)).shuffle(buffer_size=67000, reshuffle_each_iteration=True).batch(32).prefetch(tf.data.experimental.AUTOTUNE)
fixed my issue.
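To make the buffer behaviour concrete, here is a small illustrative snippet (not the original pipeline; just a 10-element dataset) showing how the buffer size changes how well the data is shuffled:
import tensorflow as tf

ds = tf.data.Dataset.range(10)

# Small buffer: only 3 elements are candidates at any time, so the output stays
# close to the original order -- a 67k-record dataset with buffer_size=320 is
# barely shuffled at all.
weak = ds.shuffle(buffer_size=3, reshuffle_each_iteration=True)

# Buffer at least as large as the dataset: a proper uniform shuffle, redrawn each epoch.
strong = ds.shuffle(buffer_size=10, reshuffle_each_iteration=True)

for epoch in range(2):
    print("weak:  ", list(weak.as_numpy_iterator()))
    print("strong:", list(strong.as_numpy_iterator()))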

TensorFlow 1.2.1 and InceptionV3 to classify an image

I'm trying to create an example using the Keras built into the latest version of TensorFlow from Google. This example should be able to classify a classic image of an elephant. The code looks like this:
# Import a few libraries for use later
from PIL import Image as IMG
from tensorflow.contrib.keras.python.keras.preprocessing import image
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import InceptionV3
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import preprocess_input, decode_predictions
# Get a copy of the Inception model
print('Loading Inception V3...\n')
model = InceptionV3(weights='imagenet', include_top=True)
print ('Inception V3 loaded\n')
# Read the elephant JPG
elephant_img = IMG.open('elephant.jpg')
# Convert the elephant to an array
elephant = image.img_to_array(elephant_img)
elephant = preprocess_input(elephant)
elephant_preds = model.predict(elephant)
print ('Predictions: ', decode_predictions(elephant_preds))
Unfortunately I'm getting an error when trying to evaluate the model with model.predict:
ValueError: Error when checking : expected input_1 to have 4 dimensions, but got array with shape (299, 299, 3)
This code is taken from and based on the excellent example coremltools-keras-inception, and will be expanded further once this is figured out.
The reason this error occurred is that the model always expects a batch of examples, not a single example. This diverges from the common understanding of models as mathematical functions of their inputs. The reasons why the model expects batches are:
Models are computationally designed to work faster on batches in order to speed up training.
There are algorithms which take into account the batch nature of the input (e.g. Batch Normalization or GAN training tricks).
So the four dimensions come from the first dimension, which is the sample/batch dimension, and the next 3 dimensions, which are the image dims.
Actually I found the answer. Even though the documentation states a per-image input shape, the model with the top layer included is still set up to take a batch of images. Thus we need to add this before the prediction line:
elephant = numpy.expand_dims(elephant, axis=0)
Then the tensor has the right shape and everything works correctly. I am still uncertain why the documentation states that the input vector should be (3x299x299) or (299x299x3) when it clearly wants 4 dimensions.
Be careful!
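Putting the pieces together, the corrected prediction step would look roughly like this (same imports as the question; the image is assumed to already be 299x299, as the error message suggests):
import numpy
from PIL import Image as IMG
from tensorflow.contrib.keras.python.keras.preprocessing import image
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import InceptionV3
from tensorflow.contrib.keras.python.keras.applications.inception_v3 import preprocess_input, decode_predictions

model = InceptionV3(weights='imagenet', include_top=True)

elephant_img = IMG.open('elephant.jpg')
elephant = image.img_to_array(elephant_img)       # shape (299, 299, 3)
elephant = numpy.expand_dims(elephant, axis=0)    # shape (1, 299, 299, 3): a batch of one
elephant = preprocess_input(elephant)

elephant_preds = model.predict(elephant)
print('Predictions:', decode_predictions(elephant_preds))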

Name of input and output tensors when loading Keras model to TensorFlow

I'm trying to use a Keras model in "pure" TensorFlow (I want to use it in an Android app). I've successfully exported the Keras model to protobuf and imported it into TensorFlow. However, running the TensorFlow model requires providing the input and output tensors' names, and I don't know how to find them. My model looks like this:
seq = Sequential()
seq.add(Convolution2D(32, 3, 3, input_shape=(3, 15, 15), name="Conv1"))
....
seq.add(Activation('softmax', name="Act4"))
seq.compile()
When I print the tensors in TensorFlow I can find:
Tensor("Conv1_W/initial_value:0", shape=(32, 3, 3, 3), dtype=float32)
Tensor("Conv1_W:0", dtype=float32_ref)
Tensor("Conv1_W/Assign:0", shape=(32, 3, 3, 3), dtype=float32_ref)
Tensor("Conv1_W/read:0", dtype=float32)
Tensor("Act4_sample_weights:0", dtype=float32)
Tensor("Act4_target:0", dtype=float32)
However, there are no tensors with shape (3, 15, 15).
I've seen here that I can add "my_input_tensor" as input; however, I don't know which type it is - I've tried TensorFlow's and Keras' placeholders and they gave me this error:
/XXXXXXXXX/lib/python2.7/site-packages/keras/engine/topology.pyc in __init__(self, input, output, name)
1599 # check that x is an input tensor
1600 layer, node_index, tensor_index = x._keras_history
-> 1601 if len(layer.inbound_nodes) > 1 or (layer.inbound_nodes and layer.inbound_nodes[0].inbound_layers):
1602 cls_name = self.__class__.__name__
1603 warnings.warn(cls_name + ' inputs must come from '
AttributeError: 'NoneType' object has no attribute 'inbound_nodes'
As of TensorFlow 2.0 (unfortunately they seem to change this often) you can export the model to the SavedModel format in Python using
model.save('MODEL-FOLDER')
and then inspect the model using the saved_model_cli tool (found inside the Python environment's bin folder, <yourenv>/bin/saved_model_cli, in Anaconda at least):
saved_model_cli show --dir /path/to/model/MODEL-FOLDER/ --tag_set serve --signature_def serving_default
The output will be something like:
The given SavedModel SignatureDef contains the following input(s):
  inputs['graph_input'] tensor_info:
      dtype: DT_DOUBLE
      shape: (-1, 28, 28)
      name: serving_default_graph_input:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['graph_output'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 10)
      name: StatefulPartitionedCall:0
Method name is: tensorflow/serving/predict
By inspecting the output, you can see that the names of the input and output tensors in this case are, respectively, serving_default_graph_input and StatefulPartitionedCall.
This is how you find the tensor names.
The right way to do this, though, is to define a graph path and its output and input tensors on the model using SignatureDefs. You'd then load those SignatureDefs instead of having to deal with the tensor names directly.
This is a nice reference that explains this better than the official docs, imo:
https://sthalles.github.io/serving_tensorflow_models/
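Once you know the signature, you can also load and call the SavedModel directly from Python; a hedged TF2 sketch, reusing the folder and the graph_input / graph_output names from the saved_model_cli output above:
import numpy as np
import tensorflow as tf

loaded = tf.saved_model.load('MODEL-FOLDER')
infer = loaded.signatures['serving_default']       # the SignatureDef inspected above

# The keyword argument matches the input key reported by saved_model_cli
batch = np.zeros((1, 28, 28), dtype=np.float64)
result = infer(graph_input=tf.constant(batch))

print(result)                                      # dict keyed by the output name, e.g. 'graph_output'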
Call model.summary() in Keras to see all the layers.
An input tensor will often be called input_1, input_2, etc. Check the summary for the correct name.
When you use input_shape=(3, 15, 15) in Keras, you're actually using tensors that have shape (None, 3, 15, 15), where None will be replaced by the batch size in training or prediction.
Often, for these unknown dimensions, you use -1, as in (-1, 3, 15, 15). But I cannot assure you that it will work like this. It works perfectly for reshaping tensors, but I've never tested it for creating them.
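As a quick check of both points, a small TF2-style sketch (the shapes are the ones from the question):
import tensorflow as tf

# Keras prepends the batch dimension: input_shape=(3, 15, 15) -> (None, 3, 15, 15)
inp = tf.keras.Input(shape=(3, 15, 15))
print(inp.shape)                                   # (None, 3, 15, 15)

# -1 lets TensorFlow infer the batch dimension when reshaping
x = tf.zeros((8, 3, 15, 15))
print(tf.reshape(x, (-1, 3, 15, 15)).shape)        # (8, 3, 15, 15)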
To get the input and output tensors of your Keras models, do the following:
input_tensor = seq.inputs[0]
output_tensor = seq.outputs[0]
print("Inputs: "+str(input_tensor))
print("Outputs: "+str(output_tensor))
The above assumes that there is only 1 input tensor and 1 output tensor. If you have more, then you would have to use the appropriate index to get those tensors.
Note that there is a difference between layer output shapes and tensor output shapes. The two are usually the same, but not always.
You can try calling summary() on the loaded model object, as suggested in one of the answers. But if you can't find the input and output names in the model summary, try calling input_names and output_names on the model object as below:
from tensorflow.keras.models import load_model
model = load_model("./model/00001")
print(model.input_names)
print(model.output_names)
Tried out on TensorFlow version 2.3.1.
