I trained a ResNet-18 model in PyTorch, and it works well there. But when I convert it to ONNX and run inference in cv2, the model only predicts 1~2 labels (it should predict labels 0~17).
This is my model export code:
model.eval()
x = torch.randn(1, 3, 512, 384, requires_grad=True)
# export model
torch.onnx.export(model, x, "model.onnx", export_params=True, opset_version=10,
                  do_constant_folding=True, input_names=['input'], output_names=['output'])
And this is my code for inference in cv2
self.transform = albumentations.Compose([
    albumentations.Resize(512, 384, cv2.INTER_LINEAR),
    albumentations.GaussianBlur(3, sigma_limit=(0.1, 2)),
    albumentations.Normalize(mean=(0.5), std=(0.2)),
    albumentations.ToFloat(max_value=255)
])
...
#image crop code: works fine in pytorch
image = frame[ymin:ymax, xmin:xmax] #type(frame)==numpy.array, RGB form
augmented = self.transform(image=image)
image = augmented["image"]
...
#inference code: does not work well
net=cv2.dnn.readNet("Model.onnx")
blob = cv2.dnn.blobFromImage(image, swapRB=False, crop=False)
net.setInput(blob)
label = np.array(net.forward())
text = 'Label: '+str(np.argmax(label[0]))
All of the transform settings work well in PyTorch.
What could be the problem in this code?
The problem with your code probably has to do with preprocessing the images differently: self.transform rescales the image, but when you build the blob you do not. To verify this, you can read the same image through both pipelines and check that the resulting tensors are equal (e.g. using torch.allclose), with the random augmentations disabled.
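A minimal sketch of that check (untested); pytorch_preprocess is a hypothetical stand-in for whatever preprocessing you run before calling the PyTorch model, and the test image path is illustrative:
import albumentations
import cv2
import torch

# deterministic version of the transform from the question (random blur removed)
transform = albumentations.Compose([
    albumentations.Resize(512, 384, cv2.INTER_LINEAR),
    albumentations.Normalize(mean=(0.5), std=(0.2)),
    albumentations.ToFloat(max_value=255)
])

image = cv2.cvtColor(cv2.imread("test_frame.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical test image

# tensor the PyTorch model sees, shape (1, 3, H, W)
torch_input = pytorch_preprocess(image).unsqueeze(0)

# blob the cv2 model sees: blobFromImage only reorders HWC -> NCHW here,
# all the actual preprocessing happens in `transform`
cv_image = transform(image=image)["image"]
blob = cv2.dnn.blobFromImage(cv_image, swapRB=False, crop=False)

# if both pipelines really do the same thing, these should match
print(torch.allclose(torch_input, torch.from_numpy(blob), atol=1e-5))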
I am working on a dataset of 300K images for multi-class image classification. So far I have taken a small subset of around 7k images, but the code either raises a memory error or my notebook just dies. The code below converts all images to a numpy array at once, which runs my machine out of memory when the last line gets executed. train.csv contains the image filenames and one-hot encoded labels.
The code is like this:
import numpy as np
import pandas as pd
from keras.preprocessing import image

data = pd.read_csv('train.csv')

img_width = 400
img_height = 400

img_vectors = []
for i in range(data.shape[0]):
    path = 'Images/' + data['Id'][i]
    img = image.load_img(path, target_size=(img_width, img_height, 3))
    img = image.img_to_array(img)
    img = img / 255.0
    img_vectors.append(img)

img_vectors = np.array(img_vectors)
Error Message:
MemoryError Traceback (most recent call last)
<ipython-input-13-dd2302ae54e1> in <module>
----> 1 img_vectors = np.array(img_vectors)
MemoryError: Unable to allocate array with shape (7344, 400, 400, 3) and data type float32
I guess I need to load the images in smaller batches to handle the memory issue, so that I never hold one array with all of the image data at the same time.
On an earlier project I did (single-label) image classification on around 225k images. That code doesn't convert all of the image data into one giant array; instead it feeds the image data in smaller batches:
# image preparation
from keras import backend as K
from keras.preprocessing.image import ImageDataGenerator
from keras.models import Sequential
from keras.layers import Conv2D, Activation, MaxPooling2D, Dense, BatchNormalization

if K.image_data_format() == "channels_first":
    input_shape = (3, img_width, img_height)
else:
    input_shape = (img_width, img_height, 3)

train_datagen = ImageDataGenerator(rescale=1./255, horizontal_flip=True)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

validation_generator = test_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(img_width, img_height),
    batch_size=batch_size,
    class_mode='categorical')

model = Sequential()
model.add(Conv2D(32, (3, 3), input_shape=input_shape))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
...
model.add(Dense(17))
model.add(BatchNormalization(axis=1, momentum=0.6))
model.add(Activation('softmax'))
model.summary()

model.compile(loss='categorical_crossentropy', optimizer='rmsprop', metrics=['accuracy'])

model.fit_generator(
    train_generator,
    steps_per_epoch=nb_train_samples // batch_size,
    epochs=epochs,
    validation_data=validation_generator,
    validation_steps=nb_validation_samples // batch_size,
    class_weight=class_weight)
So what I actually need is an approach for handling big image datasets for multi-label image classification without running into memory trouble.
Ideally I would work with a CSV file containing the image filenames and one-hot encoded labels, in combination with batches of arrays for learning.
Any help or guesses here would be greatly appreciated.
The easiest way to solve the problem you are facing is to write a custom data generator; here is a tutorial that shows how to do this. The idea is that instead of using flow_from_directory, you create a custom data loader that reads each image from its source path and assigns the corresponding labels to y. Practically, since your data is stored in a .csv file where each row contains the path to an image and the labels present in that image, your data generator will have a __getitem__(self, index) method that reads the image for row number index and returns it along with the target, which is obtained by reading the labels in that row and one-hot encoding them (summing the one-hot vectors gives the multi-hot target). A sketch of such a generator is shown below.
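A minimal sketch of such a generator, assuming the Keras Sequence API and a train.csv whose 'Id' column holds the filename while the remaining columns hold the one-hot labels (the column name, directory and batch size are taken from the question or are illustrative):
import numpy as np
import pandas as pd
from keras.preprocessing import image
from keras.utils import Sequence

class CsvImageSequence(Sequence):
    """Loads images batch by batch from a CSV of filenames + one-hot labels."""

    def __init__(self, csv_path, image_dir, batch_size=32, target_size=(400, 400)):
        self.data = pd.read_csv(csv_path)
        self.image_dir = image_dir
        self.batch_size = batch_size
        self.target_size = target_size

    def __len__(self):
        # number of batches per epoch
        return int(np.ceil(len(self.data) / self.batch_size))

    def __getitem__(self, index):
        rows = self.data.iloc[index * self.batch_size:(index + 1) * self.batch_size]
        # only this batch of images is ever held in memory
        x = np.stack([
            image.img_to_array(
                image.load_img(self.image_dir + fname, target_size=self.target_size)
            ) / 255.0
            for fname in rows['Id']
        ])
        # every column except the filename column is a label column
        y = rows.drop(columns=['Id']).values.astype('float32')
        return x, y

# usage (assuming `model` is a compiled Keras model)
train_seq = CsvImageSequence('train.csv', 'Images/', batch_size=32)
model.fit_generator(train_seq, epochs=10)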
I am trying to train a Keras CNN model on plant images. I needed to preprocess those images before training because they contain extra information that I don't want the model to learn.
Solution: color-based segmentation with OpenCV, keeping just the green pixels:
def segmented(image):
    # convert BGR (OpenCV default) to RGB, then to HSV for colour thresholding
    foto = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    hsv_foto = cv2.cvtColor(foto, cv2.COLOR_RGB2HSV)

    # keep only pixels whose hue/saturation/value fall in the green range
    colormin = (25, 50, 50)
    colormax = (86, 255, 255)
    mask = cv2.inRange(hsv_foto, colormin, colormax)
    result = cv2.bitwise_and(foto, foto, mask=mask)
    return result
Original and transformed image
Problem:
The function works fine when visualizing the segmented images, but I am struggling to pass it to the Keras model so that it trains on the transformed images rather than on the original ones from the directory.
My solution: what I am trying now is to include my segmented() function in the Keras ImageDataGenerator:
train_datagen = ImageDataGenerator(
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2)

train_generator = train_datagen.flow_from_directory(
    segmented('../input/v2-plant-seedlings-dataset/nonsegmentedv2/'),
    target_size=(64, 64),
    batch_size=32,
    class_mode='categorical',
    subset='training')
or in training
training = model.fit_generator(
    segmented(train_generator),
    steps_per_epoch=100,
    epochs=20,
    validation_data=segmented(validation_generator),
    validation_steps=30,
    callbacks=[earlystopper1, checkpointer1])
But I get this error, which is probably related to how the image is read and opened:
TypeError Traceback (most recent call last)
<ipython-input-47-3fc9e5fcdc32> in <module>
1 training=model.fit_generator(
----> 2 segmented1(train_generator),
3 steps_per_epoch=100,
4 epochs=20,
5 validation_data = validation_generator,
<ipython-input-46-24182f9d357f> in segmented1(np_image)
1 def segmented1(np_image):
2
----> 3 foto = cv2.cvtColor(np_image, cv2.COLOR_BGR2RGB)
4 hsv_foto = cv2.cvtColor(foto, cv2.COLOR_RGB2HSV)
5
TypeError: Expected Ptr<cv::UMat> for argument '%s'
train_datagen = ImageDataGenerator(
    preprocessing_function=segmented,
    rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2)
I didn't try your function, but here is the explanation of the input and output of preprocessing_function from https://keras.io/preprocessing/image/:
preprocessing_function: function that will be applied on each input.
The function will run after the image is resized and augmented. The
function should take one argument: one image (Numpy tensor with rank
3), and should output a Numpy tensor with the same shape.
You should probably modify the segmented function a little bit.
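For example, a minimal sketch of such a modification (untested), assuming the generator hands the function a rank-3 float array still in the 0-255 range, as the docs above describe, and that the loaded images are already RGB:
import cv2
import numpy as np

def segmented(np_image):
    # the generator passes a rank-3 float tensor, but cv2.inRange wants 8-bit pixels
    foto = np_image.astype(np.uint8)

    # images loaded by ImageDataGenerator are already RGB, so no BGR->RGB swap
    hsv_foto = cv2.cvtColor(foto, cv2.COLOR_RGB2HSV)

    # keep only the green pixels, as before
    mask = cv2.inRange(hsv_foto, (25, 50, 50), (86, 255, 255))
    result = cv2.bitwise_and(foto, foto, mask=mask)

    # return a tensor with the same shape and a float dtype, as required
    return result.astype(np.float32)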
I'm trying to use the ResNet-50 model from the ONNX model zoo and load and train it in CNTK for an image classification task. The first thing that confuses me is that the batch axis (not sure what the official name for it is, the dynamic axis?) is set to 1 in this model:
Why is that? Couldn't it simply be [3x224x224]? In this model, for example, the input looks like this:
To load the model and use my own Dense layer, I use the following code:
def create_model(num_classes, input_features, freeze=False):
    base_model = load_model("restnet-50.onnx", format=ModelFormat.ONNX)
    feature_node = find_by_name(base_model, "gpu_0/data_0")
    last_node = find_by_uid(base_model, "Reshape2959")
    substitutions = {
        feature_node: placeholder(name='new_input')
    }
    cloned_layers = last_node.clone(CloneMethod.clone, substitutions)
    cloned_out = cloned_layers(input_features)
    z = Dense(num_classes, activation=softmax, name="prediction")(cloned_out)
    return z
For training I use (shortened):
# datasets = list of classes
feature = input_variable(shape=(1, 3, 224, 224))
label = input_variable(shape=(1,3))
model = create_model(len(datasets), feature)
loss = cross_entropy_with_softmax(model, label)
# some definitions for learner, epochs, ProgressPrinters missing
for epoch in range(epochs):
    loss.train((X_current, y_current), parameter_learners=[learner], callbacks=[progress_printer])
X_current is a single image and y_current is the corresponding class label, both encoded as numpy arrays with the following shapes:
X_current.shape
(1, 3, 224, 224)
y_current.shape
(1, 3)
When I try to train the model, I get
"ValueError: ToBatchAxis7504 ToBatchAxisNode operation can only operate on tensor without minibatch data (no layout)"
What's wrong here?
I'm trying to build a one-class classifier for image recognition. I found this article, but because I don't have the full source code I don't exactly understand what I am doing.
from keras.applications.resnet50 import ResNet50
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(x, y, random_state=42)
# X_train (2250, 200, 200, 3)

resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
features_array = resnet_model.predict(X_train)
# features_array (2250, 7, 7, 2048)

pca = PCA(svd_solver='randomized', n_components=450, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
model = make_pipeline(pca, svc)

param_grid = {'svc__C': [1, 5, 10, 50], 'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(model, param_grid)
grid.fit(X_train, y_train)
I have 2250 images (food and not food) of size 200x200 px, and I send this data to the predict method of the ResNet50 model. The result is a (2250, 7, 7, 2048) tensor; does anyone know what this dimensionality means?
When I try to run the grid.fit method I get an error:
ValueError: Found array with dim 4. Estimator expected <= 2.
These are the findings I could make.
You are getting the output tensor from just above the global average pooling layer. (See resnet_model.summary() to see how the input dimensions are transformed into the output dimensions.)
For a simple fix, add an AveragePooling2D layer on top of resnet_model, so that the output shape becomes (2250, 1, 1, 2048):
from keras.layers import AveragePooling2D
from keras.models import Model

resnet_model = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
resnet_op = AveragePooling2D((7, 7), name='avg_pool_app')(resnet_model.output)
resnet_model = Model(resnet_model.input, resnet_op, name="ResNet")
Such a pooling layer is generally present in the source code of ResNet50 itself. Basically, we are appending an AveragePooling2D layer to the ResNet50 model; the last line combines that layer (second line) and the base model into a new Model object.
Now the output (features_array) will have shape (2250, 1, 1, 2048) because of the added average pooling layer.
To avoid the ValueError you then need to reshape features_array to (2250, 2048):
features_array = np.reshape(features_array, (-1, 2048))
In the last line of the program in the question,
grid.fit(X_train, y_train)
you fit on X_train (which holds the raw images in this case). The correct variable here is features_array (which can be thought of as a summary of each image). Using this line instead will fix the error:
grid.fit(features_array, y_train)
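Putting the pieces together, a rough, untested sketch of the corrected pipeline, reusing X_train and y_train from the question's train_test_split and the same hyperparameters:
import numpy as np
from keras.applications.resnet50 import ResNet50
from keras.layers import AveragePooling2D
from keras.models import Model
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# backbone plus an extra pooling layer so each image maps to a 2048-d vector
base = ResNet50(input_shape=(200, 200, 3), weights='imagenet', include_top=False)
pooled = AveragePooling2D((7, 7), name='avg_pool_app')(base.output)
feature_extractor = Model(base.input, pooled, name="ResNet")

features_array = feature_extractor.predict(X_train)      # (2250, 1, 1, 2048)
features_array = np.reshape(features_array, (-1, 2048))  # (2250, 2048)

pca = PCA(svd_solver='randomized', n_components=450, whiten=True, random_state=42)
svc = SVC(kernel='rbf', class_weight='balanced')
param_grid = {'svc__C': [1, 5, 10, 50], 'svc__gamma': [0.0001, 0.0005, 0.001, 0.005]}
grid = GridSearchCV(make_pipeline(pca, svc), param_grid)

# fit on the extracted feature vectors, not on the raw images
grid.fit(features_array, y_train)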
For more fine-tuning in this fashion with extracted feature vectors, do look here (it trains with neural nets instead of using PCA and SVM).
Hope this helps!!
I'm fitting my Keras model on a sample of images and their corresponding binary masks for object detection. Basically, I'm following the example at the end of this page:
from keras.preprocessing.image import ImageDataGenerator

# we create two instances with the same arguments
data_gen_args = dict(
    rotation_range=4.,
    width_shift_range=0.05,
    height_shift_range=0.05,
    shear_range=0.05,
    zoom_range=0.05,
    horizontal_flip=True,
    fill_mode='nearest')

image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)
seed = 2019
Now create generators for images and masks:
target_size = (180, 320)
small_target_size = (11, 20)
batch_size = 8

image_generator_trn = image_datagen.flow_from_directory(
    path + 'train',
    class_mode=None,
    target_size=target_size,
    batch_size=batch_size,
    shuffle=False,
    seed=seed)

mask_generator_trn = mask_datagen.flow_from_directory(
    path + 'mask/train',
    class_mode=None,
    target_size=small_target_size,
    batch_size=batch_size,
    shuffle=False,
    seed=seed)
Output:
Found 3327 images belonging to 2 classes.
Found 3327 images belonging to 2 classes.
Finally we create a generator to be used in model.fit_generator:
train_generator = zip(image_generator_trn, mask_generator_trn)
My problem is with the last line (the zipping): I either get a memory exception or it never finishes executing. I suspect it is trying to zip two infinite loops; I also tried zipping lazily inside model.fit_generator, but I get the same issue.
What can I do differently?
The problem lies in the fact that zip tries to exhaust both generators, while they are designed to produce outputs infinitely; this is the reason behind this behaviour. To overcome the issue, use the itertools.izip function (on Python 2; on Python 3 the built-in zip is already lazy, so you can instead wrap the two generators in a small generator that yields pairs, as sketched below). Moreover, please note that if you don't set the same seed for both generators, different augmentations will be applied to your x and y images. You need to either turn off the random augmentation or set the same seed for both.
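A minimal sketch of that lazy pairing, assuming the two flow_from_directory generators from the question and a compiled model (the epoch count is illustrative):
def combine_generators(image_gen, mask_gen):
    # lazily yield (image_batch, mask_batch) pairs without ever
    # trying to exhaust the two infinite generators
    while True:
        yield next(image_gen), next(mask_gen)

train_generator = combine_generators(image_generator_trn, mask_generator_trn)

model.fit_generator(
    train_generator,
    steps_per_epoch=len(image_generator_trn),
    epochs=10)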