I have implemented a lambda function to resize an image from 28x28x1 to 224x224x3. I need to subtract the VGG mean from all the channels. When i try this, i get an error
TypeError: 'Tensor' object does not support item assignment
def try_reshape_to_vgg(x):
x = K.repeat_elements(x, 3, axis=3)
x = K.resize_images(x, 8, 8, data_format="channels_last")
x[:, :, :, 0] = x[:, :, :, 0] - 103.939
x[:, :, :, 1] = x[:, :, :, 1] - 116.779
x[:, :, :, 2] = x[:, :, :, 2] - 123.68
return x[:, :, :, ::-1]
What's the recommended solution to do element wise subtraction of tensors?
You can use keras.applications.imagenet_utils.preprocess_input on tensors after Keras 2.1.2. It will subtract the VGG mean from x under the default mode 'caffe'.
from keras.applications.imagenet_utils import preprocess_input
def try_reshape_to_vgg(x):
x = K.repeat_elements(x, 3, axis=3)
x = K.resize_images(x, 8, 8, data_format="channels_last")
x = preprocess_input(x)
return x
If you would like to stay in an older version of Keras, maybe you can check how it is implemented in Keras 2.1.2, and extract useful lines into try_reshape_to_vgg.
def _preprocess_symbolic_input(x, data_format, mode):
global _IMAGENET_MEAN
if mode == 'tf':
x /= 127.5
x -= 1.
return x
if data_format == 'channels_first':
# 'RGB'->'BGR'
if K.ndim(x) == 3:
x = x[::-1, ...]
else:
x = x[:, ::-1, ...]
else:
# 'RGB'->'BGR'
x = x[..., ::-1]
if _IMAGENET_MEAN is None:
_IMAGENET_MEAN = K.constant(-np.array([103.939, 116.779, 123.68]))
# Zero-center by mean pixel
if K.dtype(x) != K.dtype(_IMAGENET_MEAN):
x = K.bias_add(x, K.cast(_IMAGENET_MEAN, K.dtype(x)), data_format)
else:
x = K.bias_add(x, _IMAGENET_MEAN, data_format)
return x
Related
I am working on a CNN for color classification problem in pytorch. This is the architecture of my CNN :
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16, 5)
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 15)
def forward(self, x):
x = self.pool(F2.relu(self.conv1(x)))
x = self.pool(F2.relu(self.conv2(x)))
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F2.relu(self.fc1(x))
x = F2.relu(self.fc2(x))
x = self.fc3(x)
return x
When images are resized to 32*32, the code works fine, but when the images are changed to different size, other than this, let's say 36*36, by transforms.Resize((36, 36)), it throws the following error :
RuntimeError: mat1 and mat2 shapes cannot be multiplied (4x576 and 400x120)
My question is how to adjust the CNN architecture, the layers and all when input image size is changed. Please help.
One way to achieve that is to make sure the spatial dimension is always the same before you flatten the intermediate tensor regardless of the input resolution. For example, by using the nn.AdaptiveAvgPool2d or nn.AdaptiveMaxPool2d. A concrete example will be:
class Net(nn.Module):
def __init__(self):
super().__init__()
self.conv1 = nn.Conv2d(3, 6, 5)
self.pool1 = nn.MaxPool2d(2, 2)
self.conv2 = nn.Conv2d(6, 16 * 5 * 5, 5)
self.pool2 = nn.AdaptiveAvgPool2d((1, 1)) # (B, C, H, W) -> (B, C, 1, 1)
self.fc1 = nn.Linear(16 * 5 * 5, 120)
self.fc2 = nn.Linear(120, 84)
self.fc3 = nn.Linear(84, 15)
def forward(self, x):
x = self.pool1(F2.relu(self.conv1(x)))
x = self.pool2(F2.relu(self.conv2(x)))
x = torch.flatten(x, 1) # flatten all dimensions except batch
x = F2.relu(self.fc1(x))
x = F2.relu(self.fc2(x))
x = self.fc3(x)
return x
To compensate for the information loss caused by spatial resolution compression (i.e. pooling), we usually need to increase the channel size accordingly.
my code:
My Drawing Function:
def draw_hyper_plane(coef,intercept,y_max,y_min):
points=np.array([[((-coef*y_min - intercept)/coef), y_min],[((-coef*y_max - intercept)/coef), y_max]])
plt.plot(points[:,0], points[:,1])
Actual Output:
Desired Output:
Through my code i am not able to find the proper hyper plane which can correctly classify the point as in desired output plot. Could any body help me in this
One way is to use the decision_function from the classifier and plot some level line (level=0 correspond to your hyperplane). Here is some code.
def plot_2d_separator(classifier, X, fill=False, ax=None, eps=None):
if eps is None:
eps = X.std() / 2.
x_min, x_max = X[:, 0].min() - eps, X[:, 0].max() + eps
y_min, y_max = X[:, 1].min() - eps, X[:, 1].max() + eps
xx = np.linspace(x_min, x_max, 100)
yy = np.linspace(y_min, y_max, 100)
X1, X2 = np.meshgrid(xx, yy)
X_grid = np.c_[X1.ravel(), X2.ravel()]
try:
decision_values = classifier.decision_function(X_grid)
levels = [0]
fill_levels = [decision_values.min(), 0, decision_values.max()]
except AttributeError:
# no decision_function
decision_values = classifier.predict_proba(X_grid)[:, 1]
levels = [.5]
fill_levels = [0, .5, 1]
if ax is None:
ax = plt.gca()
if fill:
ax.contourf(X1, X2, decision_values.reshape(X1.shape),
levels=fill_levels, colors=['tab:blue', 'tab:orange'],
alpha=0.5)
else:
ax.contour(X1, X2, decision_values.reshape(X1.shape), levels=levels,
colors="black")
ax.set_xlim(x_min, x_max)
ax.set_ylim(y_min, y_max)
ax.set_xticks(())
ax.set_yticks(())
This code was developed there
According to the documentation of BasicRNNCell:
__call__(
inputs,
state,
scope=None)
Args:
inputs: 2-D tensor with shape [batch_size x input_size].
It seems that input_size can be different at different runs? As far as I know about RNN, the input_size determines the internal weight matrix W_x with shape (input_size, hidden_state_size), and it should be consistent. What if I run this cell with input_size=3 and input_size=4 alternately?
inputs is a 2-D tensor: [batch_size x input_size].
You are right, the input_size must correspond to the num_units of the RNN cell. But batch_size can vary and only has to correspond to the state, the other parameter of the call.
Try this code:
import tensorflow as tf
from tensorflow.contrib.rnn import BasicRNNCell
dim = 10
x = tf.placeholder(tf.float32, shape=[None, dim])
y = tf.placeholder(tf.float32, shape=[4, dim])
z = tf.placeholder(tf.float32, shape=[None, dim + 1])
print('x, y, z:', x.shape, y.shape, z.shape)
cell = BasicRNNCell(dim)
state1 = cell.zero_state(batch_size=4, dtype=tf.float32)
state2 = cell.zero_state(batch_size=8, dtype=tf.float32)
out1, out2 = cell(x, state1)
print(out1.shape, out2.shape)
out1, out2 = cell(x, state2)
print(out1.shape, out2.shape)
out1, out2 = cell(y, state1)
print(out1.shape, out2.shape)
Here's the output:
x, y, z: (?, 10) (4, 10) (?, 11)
(4, 10) (4, 10)
(8, 10) (8, 10)
(4, 10) (4, 10)
This cell accepts x with both states, y with state1 and doesn't accept z with any state. Both of the following calls result in error:
out1, out2 = cell(y, state2) # ERROR: dimensions mismatch
print(out1.shape, out2.shape)
out1, out2 = cell(z, state1) # ERROR: dimensions mismatch
print(out1.shape, out2.shape)
I understand how and why to use an ImageDataGenerator, but I am interested in casting an eyeball on how the ImageDataGenerator affects my images so I can decide whether I have chosen a good amount of latitude in augmenting my data. I see that I can iterate over the images coming from the generator. I am looking for a way to see whether it's an original image or a modified image, and if the latter what parameters were modified in that particular instance I'm looking at. How/can I see this?
Most of the transformations (except flipping) will always modify the input image. For example, if you've specified rotation_range, from the source code:
theta = np.pi / 180 * np.random.uniform(-self.rotation_range, self.rotation_range)
it's unlikely that the random number will be exactly 0.
There's no convenient way to print out the amount of transformations applied to each image. You have to modify the source code and add some printing functions inside ImageDataGenerator.random_transform().
If you don't want to touch the source code (for example, on a shared machine), you can extend ImageDataGenerator and override random_transform().
import numpy as np
from keras.preprocessing.image import *
class MyImageDataGenerator(ImageDataGenerator):
def random_transform(self, x, seed=None):
# these lines are just copied-and-pasted from the original random_transform()
img_row_axis = self.row_axis - 1
img_col_axis = self.col_axis - 1
img_channel_axis = self.channel_axis - 1
if seed is not None:
np.random.seed(seed)
if self.rotation_range:
theta = np.pi / 180 * np.random.uniform(-self.rotation_range, self.rotation_range)
else:
theta = 0
if self.height_shift_range:
tx = np.random.uniform(-self.height_shift_range, self.height_shift_range) * x.shape[img_row_axis]
else:
tx = 0
if self.width_shift_range:
ty = np.random.uniform(-self.width_shift_range, self.width_shift_range) * x.shape[img_col_axis]
else:
ty = 0
if self.shear_range:
shear = np.random.uniform(-self.shear_range, self.shear_range)
else:
shear = 0
if self.zoom_range[0] == 1 and self.zoom_range[1] == 1:
zx, zy = 1, 1
else:
zx, zy = np.random.uniform(self.zoom_range[0], self.zoom_range[1], 2)
transform_matrix = None
if theta != 0:
rotation_matrix = np.array([[np.cos(theta), -np.sin(theta), 0],
[np.sin(theta), np.cos(theta), 0],
[0, 0, 1]])
transform_matrix = rotation_matrix
if tx != 0 or ty != 0:
shift_matrix = np.array([[1, 0, tx],
[0, 1, ty],
[0, 0, 1]])
transform_matrix = shift_matrix if transform_matrix is None else np.dot(transform_matrix, shift_matrix)
if shear != 0:
shear_matrix = np.array([[1, -np.sin(shear), 0],
[0, np.cos(shear), 0],
[0, 0, 1]])
transform_matrix = shear_matrix if transform_matrix is None else np.dot(transform_matrix, shear_matrix)
if zx != 1 or zy != 1:
zoom_matrix = np.array([[zx, 0, 0],
[0, zy, 0],
[0, 0, 1]])
transform_matrix = zoom_matrix if transform_matrix is None else np.dot(transform_matrix, zoom_matrix)
if transform_matrix is not None:
h, w = x.shape[img_row_axis], x.shape[img_col_axis]
transform_matrix = transform_matrix_offset_center(transform_matrix, h, w)
x = apply_transform(x, transform_matrix, img_channel_axis,
fill_mode=self.fill_mode, cval=self.cval)
if self.channel_shift_range != 0:
x = random_channel_shift(x,
self.channel_shift_range,
img_channel_axis)
if self.horizontal_flip:
if np.random.random() < 0.5:
x = flip_axis(x, img_col_axis)
if self.vertical_flip:
if np.random.random() < 0.5:
x = flip_axis(x, img_row_axis)
# print out the trasformations applied to the image
print('Rotation:', theta / np.pi * 180)
print('Height shift:', tx / x.shape[img_row_axis])
print('Width shift:', ty / x.shape[img_col_axis])
print('Shear:', shear)
print('Zooming:', zx, zy)
return x
I just add 5 prints at the end of the function. Other lines are copied and pasted from the original source code.
Now you can use it with, e.g.,
gen = MyImageDataGenerator(rotation_range=15,
width_shift_range=0.1,
height_shift_range=0.1,
zoom_range=0.5)
flow = gen.flow_from_directory('data', batch_size=1)
img = next(flow)
and see information like this printed on your terminal:
Rotation: -9.185074669096467
Height shift: 0.03791625365979884
Width shift: -0.08398553078553198
Shear: 0
Zooming: 1.40950509832 1.12895574928
I'm trying to train simple neural network
that consists of:
Convolution layer filter (5x5) x 8, stride 2.
Max pooling 25x25 (the image has kinda low amount of details)
Flatting output into (2x2x8) vector
Classifier with logistic regression
Altogether network has < 1000 weights.
File: nn.py
#!/bin/python
import tensorflow as tf
import create_batch
# Prepare data
batch = create_batch.batch
x = tf.reshape(batch[0], [-1,100,100,3])
y_ = batch[1]
# CONVOLUTION NETWORK
# For initialization
def weight_variable(shape):
initial = tf.truncated_normal(shape, stddev=0.3)
return tf.Variable(initial)
def bias_variable(shape):
initial = tf.constant(0.2, shape=shape)
return tf.Variable(initial)
# Convolution with stride 1
def conv2d(x, W):
return tf.nn.conv2d(x, W, strides=[1, 2, 2, 1], padding='SAME')
def max_pool_25x25(x):
return tf.nn.max_pool(x, ksize=[1, 25, 25, 1],
strides=[1, 25, 25, 1], padding='SAME')
# First layer
W_conv1 = weight_variable([5, 5, 3, 8])
b_conv1 = bias_variable([8])
x_image = tf.reshape(x, [-1,100,100,3])
# First conv1
h_conv1 = tf.nn.relu(conv2d(x_image, W_conv1) + b_conv1)
h_pool1 = max_pool_25x25(h_conv1)
# Dense connection layer
# make data flat
W_fc1 = weight_variable([2 * 2 * 8, 2])
b_fc1 = bias_variable([2])
h_pool1_flat = tf.reshape(h_pool1, [-1, 2*2*8])
y_conv = tf.nn.softmax(tf.matmul(h_pool1_flat, W_fc1) + b_fc1)
#Learning
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y_conv), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.001).minimize(cross_entropy)
correct_prediction = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
# Session
sess = tf.Session()
sess.run(tf.initialize_all_variables())
# Start input enqueue threads.
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
for i in range(200):
if i%10 == 0:
train_accuracy = accuracy.eval(session=sess)
print("step %d, training accuracy %g"%(i, train_accuracy))
train_step.run(session=sess)
File: create_batch.py
#!/bin/python
import tensorflow as tf
PATH1 = "../dane/trening/NK/"
PATH2 = "../dane/trening/K/"
def create_labeled_image_list():
filenames = [(PATH1 + "nk_%d.png" % i) for i in range(300)]
labels = [[1,0] for i in range(300)]
filenames += [(PATH2 + "kulki_%d.png" % i) for i in range(300)]
labels += [[0,1] for i in range(300)]
return filenames, labels
def read_images_from_disk(input_queue):
label = input_queue[1]
file_contents = tf.read_file(input_queue[0])
example = tf.image.decode_png(file_contents, channels=3)
example.set_shape([100, 100, 3])
example = tf.to_float(example)
print ("READ, label:")
print(label)
return example, label
# Start
image_list, label_list = create_labeled_image_list()
# Create appropriate tensors for naming
images = tf.convert_to_tensor(image_list, dtype=tf.string)
labels = tf.convert_to_tensor(label_list, dtype=tf.float32)
input_queue = tf.train.slice_input_producer([images, labels],
shuffle=True)
image, label = read_images_from_disk(input_queue)
batch = tf.train.batch([image, label], batch_size=600)
I'm feeding 100x100 images i have two classess 300 images each.
Basically randomly initialzied network at step 0 has better accuracy than trained one.
Network stops learning after it reaches 0.5 accuracy (basically coin flip). Images contain blue blooby thing (class 1) or grass (class 2).
I'm traning network using whole imageset at once (600 images), the loss function is cross entropy.
What I'm doing wrong?
OK, I've find a fix there were two errors, now the network is learning.
Images were RGBA despite the fact I declared them as RGB in tf
I did not perform normalization of Images to [-1,1] float32.
In tensorflow it should be done with something like this:
# i use "im" for image
tf.image.convert_image_dtype(im, dtype=float32)
im = tf.sub(im, -0.5)
im = tf.mul(im, 2.0)
To all newbies to ML - prepare your data with caution!
Thanks.