Can anybody tell me how to use the mask argument in an LSTM cell?

I have a padded sequence array X_train of shape (1400, 17640) and a separate boolean mask array of the same shape (1400, 17640). What I want to create is:
layer1 = Input(shape=(X_train.shape[1], 1))
layer2 = Dense(64, name='layer2', mask=[need to provide mask here])(layer1)
output = Dense(10, activation='softmax', name='output')(layer2)
Can you tell me how to do that if I already have a precomputed padded sequence and a boolean mask array of the same size? There is not much documentation on the mask argument of the LSTM cell in the TensorFlow docs.
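For reference, Keras recurrent layers take the mask when the layer is *called*, not as a constructor argument (and `Dense` has no `mask` constructor argument at all): something like `LSTM(64)(x, mask=m)`, where `x` has shape `(batch, timesteps, features)` and `m` is a boolean tensor of shape `(batch, timesteps)`. For the arrays in the question, `X_train` would be reshaped to `(1400, 17640, 1)`, and in the functional API the boolean mask could come from a second `Input(shape=(17640,), dtype='bool')`; treat that wiring as an untested sketch. What the mask actually does is easier to see in plain NumPy. The sketch below is *not* the Keras implementation: it assumes a simple tanh RNN instead of a full LSTM, and the function name is hypothetical. It only illustrates the rule that a masked timestep leaves the hidden state unchanged.

```python
import numpy as np


def masked_rnn(x, mask, W, U, b):
    """Run a simple tanh RNN over x with shape (batch, timesteps, features).

    mask has shape (batch, timesteps). Where mask is False the step is
    skipped: the previous hidden state is carried forward unchanged.
    Returns the final hidden state, shape (batch, units).
    """
    batch, timesteps, _ = x.shape
    h = np.zeros((batch, U.shape[0]))
    for t in range(timesteps):
        h_new = np.tanh(x[:, t] @ W + h @ U + b)
        keep = mask[:, t][:, None]      # (batch, 1), broadcasts over units
        h = np.where(keep, h_new, h)    # masked steps keep the old state
    return h


rng = np.random.default_rng(0)
W, U, b = rng.normal(size=(1, 4)), rng.normal(size=(4, 4)), np.zeros(4)
seq = rng.normal(size=(1, 3, 1))                              # 3 real steps
padded = np.concatenate([seq, np.zeros((1, 2, 1))], axis=1)   # post-padded to 5
mask = np.array([[True, True, True, False, False]])

# With the mask, the padding has no effect on the final state.
print(np.allclose(masked_rnn(padded, mask, W, U, b),
                  masked_rnn(seq, np.ones((1, 3), dtype=bool), W, U, b)))  # True
```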

Related

How to save and reuse the mask of a panoptic segmentation (detectron2) using OpenCV drawContours?

In detectron2, prediction returns the output below.
panoptic_seg, segments_info = predictor(im)["panoptic_seg"]
It can be drawn with the draw_panoptic_seg_predictions function, but we want to save the mask details and redraw them using OpenCV. How can we do that?
For instance segmentation we can do it like this:
import cv2
import numpy as np
# "outputs" is the inference output in the format described here - https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
# Extract the contour of each predicted mask and save it in a list
contours = []
for pred_mask in outputs['instances'].pred_masks:
    # pred_mask is of type torch.Tensor, and the values are boolean (True, False)
    # Convert it to an 8-bit numpy array, which can then be used to find contours
    mask = pred_mask.to("cpu").numpy().astype('uint8')
    contour, _ = cv2.findContours(mask, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    contours.append(contour[0])  # contour is a tuple (OpenCV 4.5.2), so take the first element which is the array of contour points
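And then draw the mask contours with this code:
import matplotlib.pyplot as plt
# "im" is the original BGR image
image_with_overlaid_predictions = im.copy()
for contour in contours:
    cv2.drawContours(image_with_overlaid_predictions, [contour], -1, (0,255,0), 20)
plt.figure()
# matplotlib expects RGB, so convert from OpenCV's BGR order before showing
plt.imshow(cv2.cvtColor(image_with_overlaid_predictions, cv2.COLOR_BGR2RGB))
I want to achieve something similar with the panoptic mask.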
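One way to approach this, sketched under assumptions: in detectron2, `panoptic_seg` is an (H, W) map of segment ids, and each entry of `segments_info` is a dict with (at least) `"id"` and `"category_id"`; pixels whose id is not listed are void. Splitting the id map into one binary `uint8` mask per segment gives arrays in the same form as `mask` in the instance-segmentation code, so each can go through the same `cv2.findContours` / `cv2.drawContours` steps. To save and reuse the result, storing `panoptic_seg.to("cpu").numpy()` together with `segments_info` (for example with `np.save`) should be enough to rebuild the masks later. The helper name `panoptic_to_masks` is hypothetical.

```python
import numpy as np


def panoptic_to_masks(panoptic_seg, segments_info):
    """Split a panoptic id map (H, W) into (category_id, uint8 mask) pairs.

    Ids that do not appear in segments_info (e.g. 0) are treated as void
    and produce no mask.
    """
    masks = []
    for info in segments_info:
        mask = (panoptic_seg == info["id"]).astype(np.uint8)
        masks.append((info["category_id"], mask))
    return masks


# Toy example: a 2x3 id map with two segments and one void id (0).
seg = np.array([[0, 1, 1],
                [2, 2, 0]])
info = [{"id": 1, "category_id": 5}, {"id": 2, "category_id": 7}]
for category_id, mask in panoptic_to_masks(seg, info):
    print(category_id, mask.tolist())
# 5 [[0, 1, 1], [0, 0, 0]]
# 7 [[0, 0, 0], [1, 1, 0]]
```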

What should be the input shape for 2D-CNN in this scenario?

The shapes of my dataset are:
shape of original dataset: (343889, 80)
shape of training dataset: (257916, 79)
shape of training labels: (257916,)
shape of testing dataset: (85973, 79)
shape of testing labels: (85973,)
I was using the input shape (79, 1) for a 1D-CNN, but it isn't working with a 2D-CNN.
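A 2D convolution needs each sample shaped as (height, width, channels) (channels-last, as in Keras `Conv2D`), so a flat row of 79 features has to be rearranged first. The simplest option is `X.reshape(-1, 79, 1, 1)` with kernels of width 1, which is effectively a 1D convolution. Another, sketched below under the assumption that a grid layout of the features makes sense for your data, is to zero-pad the 79 features to 81 and reshape each row to 9x9x1, so the model's input shape would be (9, 9, 1). The helper name and the 9x9 layout are illustrative assumptions.

```python
import numpy as np


def to_conv2d_input(X, side=9):
    """Zero-pad each row of X (n_samples, n_features) to side*side values
    and reshape to (n_samples, side, side, 1) for a channels-last Conv2D."""
    n, f = X.shape
    if f > side * side:
        raise ValueError("side * side must be >= n_features")
    padded = np.pad(X, ((0, 0), (0, side * side - f)))   # zeros appended at the end
    return padded.reshape(n, side, side, 1)


X_train = np.random.default_rng(0).normal(size=(8, 79))  # stand-in for the real data
print(to_conv2d_input(X_train).shape)  # (8, 9, 9, 1)
```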

How to read characters in the detected objects rectangle?

I was successful with the TensorFlow Object Detection API, as in the image below.
But now I want to read the characters inside the green box. How can I do that?
First you need to crop out the bounding boxes of the plate(s), and then you can use Tesseract to get the text. Pseudocode for your problem may look like this:
import cv2
import pytesseract
original_img = cv2.imread("/path/to/your/img.png")
# Pseudocode: detected_plates, plate.confidence and plate.bounding_rect stand in for your detector's output
for plate in detected_plates:
    if plate.confidence > 0.98:
        b_box = plate.bounding_rect  # bounding box in [x, y, w, h]
        # NumPy images are indexed [row (y), column (x)]
        img_cropped = original_img[b_box[1]: b_box[1] + b_box[3], b_box[0]: b_box[0] + b_box[2]]
        print(pytesseract.image_to_string(img_cropped))
# Load image using OpenCV and
# expand image dimensions to have shape: [1, None, None, 3]
# i.e. a single-column array, where each item in the column has the pixel RGB value
image = cv2.imread(PATH_TO_IMAGE)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_expanded = np.expand_dims(image_rgb, axis=0)
# Perform the actual detection by running the model with the image as input
(boxes, scores, classes, num) = sess.run(
[detection_boxes, detection_scores, detection_classes, num_detections],
feed_dict={image_tensor: image_expanded})
# Draw the results of the detection (aka 'visualize the results')
vis_util.visualize_boxes_and_labels_on_image_array(
image,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=1,
min_score_thresh=0.60)
# All the results have been drawn on image. Now display the image.
cv2.imshow('Object detector', image)
cv2.waitKey(0)  # keep the window open until a key is pressed
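To connect the two snippets: the Object Detection API returns `boxes` as normalized `[ymin, xmin, ymax, xmax]`, not `[x, y, w, h]` as assumed in the pseudocode, so the boxes need converting to pixel indices before cropping. A minimal NumPy sketch, assuming `boxes` and `scores` are the squeezed arrays from `sess.run` above and reusing the 0.60 threshold; the function name is hypothetical. Each returned crop could then be passed to `pytesseract.image_to_string`.

```python
import numpy as np


def crop_detections(image, boxes, scores, min_score=0.60):
    """Crop regions from image (H, W, C) for every box with score >= min_score.

    boxes: (N, 4) normalized [ymin, xmin, ymax, xmax]; scores: (N,).
    """
    h, w = image.shape[:2]
    crops = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue
        top, bottom = int(round(ymin * h)), int(round(ymax * h))
        left, right = int(round(xmin * w)), int(round(xmax * w))
        crops.append(image[top:bottom, left:right])
    return crops


image = np.zeros((100, 200, 3), dtype=np.uint8)   # stand-in for the real frame
boxes = np.array([[0.1, 0.25, 0.5, 0.75], [0.0, 0.0, 1.0, 1.0]])
scores = np.array([0.9, 0.3])
print([c.shape for c in crop_detections(image, boxes, scores)])  # [(40, 100, 3)]
```

With the real model output this would be called as `crop_detections(image, np.squeeze(boxes), np.squeeze(scores))`.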

Intuitive understanding of 1D, 2D, and 3D convolutions in convolutional neural networks

Can anyone please clearly explain the difference between 1D, 2D, and 3D convolutions in convolutional neural networks (in deep learning) with the use of examples?
I want to explain this with pictures from C3D.
In a nutshell, the convolution direction and the output shape are what matter!
↑↑↑↑↑ 1D Convolutions - Basic ↑↑↑↑↑
Convolution is computed in just 1 direction (the time axis)
input = [W], filter = [k], output = [W]
ex) input = [1,1,1,1,1], filter = [0.25,0.5,0.25], output = [0.75,1,1,1,0.75] (with zero 'SAME' padding)
output shape is a 1D array
example) graph smoothing
tf.nn.conv1d code Toy Example
import tensorflow as tf
import numpy as np
sess = tf.Session()  # TensorFlow 1.x API
ones_1d = np.ones(5)
weight_1d = np.ones(3)
strides_1d = 1
in_1d = tf.constant(ones_1d, dtype=tf.float32)
filter_1d = tf.constant(weight_1d, dtype=tf.float32)
in_width = int(in_1d.shape[0])
filter_width = int(filter_1d.shape[0])
input_1d = tf.reshape(in_1d, [1, in_width, 1])
kernel_1d = tf.reshape(filter_1d, [filter_width, 1, 1])
output_1d = tf.squeeze(tf.nn.conv1d(input_1d, kernel_1d, strides_1d, padding='SAME'))
print(sess.run(output_1d))
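As a cross-check without a Session, here is the same computation in NumPy. Note that `tf.nn.conv1d` computes a cross-correlation (the kernel is not flipped), which makes no difference for symmetric kernels like these. The sketch assumes stride 1, an odd kernel length, and zero `SAME` padding; the helper name is hypothetical.

```python
import numpy as np


def conv1d_same(x, k):
    """Stride-1 cross-correlation of x with an odd-length kernel k,
    zero-padded so the output has the same length as x ('SAME')."""
    pad = len(k) // 2
    xp = np.pad(x, pad)
    return np.array([np.dot(xp[i:i + len(k)], k) for i in range(len(x))])


print(conv1d_same(np.ones(5), np.ones(3)).tolist())
# [2.0, 3.0, 3.0, 3.0, 2.0]   (the values the toy example above computes)
print(conv1d_same(np.ones(5), np.array([0.25, 0.5, 0.25])).tolist())
# [0.75, 1.0, 1.0, 1.0, 0.75]
```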
↑↑↑↑↑ 2D Convolutions - Basic ↑↑↑↑↑
Convolution is computed in 2 directions (x, y)
output shape is a 2D matrix
input = [W, H], filter = [k,k] output = [W,H]
example) Sobel edge filter
tf.nn.conv2d - Toy Example
ones_2d = np.ones((5,5))
weight_2d = np.ones((3,3))
strides_2d = [1, 1, 1, 1]
in_2d = tf.constant(ones_2d, dtype=tf.float32)
filter_2d = tf.constant(weight_2d, dtype=tf.float32)
in_width = int(in_2d.shape[0])
in_height = int(in_2d.shape[1])
filter_width = int(filter_2d.shape[0])
filter_height = int(filter_2d.shape[1])
input_2d = tf.reshape(in_2d, [1, in_height, in_width, 1])
kernel_2d = tf.reshape(filter_2d, [filter_height, filter_width, 1, 1])
output_2d = tf.squeeze(tf.nn.conv2d(input_2d, kernel_2d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
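The same 2D toy example in NumPy, as a sketch assuming stride 1, odd kernel sizes, and zero `SAME` padding (cross-correlation, like `tf.nn.conv2d`). The helper name is hypothetical, and `sliding_window_view` needs NumPy 1.20 or newer. The border values show the effect of zero padding: corners see 4 of the 9 ones, edges see 6.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_same(x, k):
    """Stride-1 2D cross-correlation with zero 'SAME' padding (odd kernel sizes)."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    windows = sliding_window_view(xp, k.shape)      # (H, W, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k)


for row in conv2d_same(np.ones((5, 5)), np.ones((3, 3))).astype(int).tolist():
    print(row)
# [4, 6, 6, 6, 4]
# [6, 9, 9, 9, 6]
# [6, 9, 9, 9, 6]
# [6, 9, 9, 9, 6]
# [4, 6, 6, 6, 4]
```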
↑↑↑↑↑ 3D Convolutions - Basic ↑↑↑↑↑
Convolution is computed in 3 directions (x, y, z)
output shape is a 3D volume
input = [W,H,L], filter = [k,k,d] output = [W,H,M]
d < L is important for producing a volume output!
example) C3D
tf.nn.conv3d - Toy Example
ones_3d = np.ones((5,5,5))
weight_3d = np.ones((3,3,3))
strides_3d = [1, 1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
in_depth = int(in_3d.shape[2])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
filter_depth = int(filter_3d.shape[2])
input_3d = tf.reshape(in_3d, [1, in_depth, in_height, in_width, 1])
kernel_3d = tf.reshape(filter_3d, [filter_depth, filter_height, filter_width, 1, 1])
output_3d = tf.squeeze(tf.nn.conv3d(input_3d, kernel_3d, strides=strides_3d, padding='SAME'))
print(sess.run(output_3d))
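The same idea extends to 3D: the kernel slides along all three axes, so the output stays a volume. A NumPy sketch under the same assumptions as before (stride 1, odd kernel sizes, zero `SAME` padding, NumPy 1.20+; hypothetical helper name):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(x, k):
    """Stride-1 3D cross-correlation with zero 'SAME' padding (odd kernel sizes)."""
    xp = np.pad(x, [(s // 2, s // 2) for s in k.shape])
    windows = sliding_window_view(xp, k.shape)      # (D, H, W, kd, kh, kw)
    return np.einsum("ijkabc,abc->ijk", windows, k)


out = conv3d_same(np.ones((5, 5, 5)), np.ones((3, 3, 3)))
print(out.shape)                                  # (5, 5, 5): still a volume
print(out[2, 2, 2], out[0, 2, 2], out[0, 0, 0])   # 27.0 18.0 8.0
```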
↑↑↑↑↑ 2D Convolutions with 3D input - LeNet, VGG, ..., ↑↑↑↑↑
Even though the input is 3D, e.g. 224x224x3 or 112x112x32,
the output shape is not a 3D volume but a 2D matrix,
because the filter depth L must match the number of input channels L.
Convolution is computed in 2 directions (x, y), not 3!
input = [W,H,L], filter = [k,k,L] output = [W,H]
output shape is a 2D matrix
If we train N filters (N is the number of filters),
then the output shape is (stacked 2D) 3D = 2D x N matrix.
conv2d - LeNet, VGG, ... for 1 filter
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape whose depth equals in_channels
weight_3d = np.ones((3,3,in_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_3d = tf.constant(weight_3d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_3d.shape[0])
filter_height = int(filter_3d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_3d = tf.reshape(filter_3d, [filter_height, filter_width, in_channels, 1])
output_2d = tf.squeeze(tf.nn.conv2d(input_3d, kernel_3d, strides=strides_2d, padding='SAME'))
print(sess.run(output_2d))
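A NumPy sketch of the same single-filter case, assuming channels-last data, stride 1, odd kernel sizes and zero `SAME` padding (NumPy 1.20+; hypothetical helper name). The kernel covers all input channels, slides only over height and width, and sums over the channel axis, so the result is 2D.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_one_filter(x, k):
    """x: (H, W, C_in), k: (kh, kw, C_in) -> (H, W).
    Stride 1, zero 'SAME' padding, odd kernel sizes."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))                  # no padding on channels
    windows = sliding_window_view(xp, k.shape[:2], axis=(0, 1))   # (H, W, C_in, kh, kw)
    return np.einsum("ijckl,klc->ij", windows, k)


in_channels = 32
out = conv2d_one_filter(np.ones((5, 5, in_channels)), np.ones((3, 3, in_channels)))
print(out.shape, out[2, 2], out[0, 0])   # (5, 5) 288.0 128.0
```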
conv2d - LeNet, VGG, ... for N filters
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((5,5,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape x number of filters = 4D
weight_4d = np.ones((3,3,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])
#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
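With N filters the kernel gains an out_channels axis, and each output channel is just the single-filter result for one slice of the kernel, stacked along the last axis. A NumPy sketch checking that stacking, with random data and the same assumptions as before (channels-last, stride 1, odd kernel sizes, zero `SAME` padding, NumPy 1.20+; hypothetical helper name):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_n_filters(x, k):
    """x: (H, W, C_in), k: (kh, kw, C_in, C_out) -> (H, W, C_out).
    Stride 1, zero 'SAME' padding, odd kernel sizes."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    windows = sliding_window_view(xp, k.shape[:2], axis=(0, 1))   # (H, W, C_in, kh, kw)
    return np.einsum("ijckl,klco->ijo", windows, k)


rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5, 32))
k = rng.normal(size=(3, 3, 32, 64))
out = conv2d_n_filters(x, k)
print(out.shape)   # (5, 5, 64)
# Output channel n is just the single-filter result for k[..., n]:
print(np.allclose(out[..., 7], conv2d_n_filters(x, k[..., 7:8])[..., 0]))  # True
```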
↑↑↑↑↑ Bonus 1x1 conv in CNN - GoogLeNet, ..., ↑↑↑↑↑
1x1 conv is confusing when you think of it as a 2D image filter like Sobel.
For 1x1 conv in a CNN, the input has a 3D shape, as in the picture above.
It filters across the depth (channel) axis at each pixel.
input = [W,H,L], filter = [1,1,L] output = [W,H]
With N filters, the output stacked shape is 3D = 2D x N matrix.
tf.nn.conv2d - special case 1x1 conv
in_channels = 32 # 3 for RGB, 32, 64, 128, ...
out_channels = 64 # 128, 256, ...
ones_3d = np.ones((1,1,in_channels)) # input is 3d, in_channels = 32
# filter must have a 3D shape x number of filters = 4D; for a 1x1 conv the spatial size is 1x1
weight_4d = np.ones((1,1,in_channels, out_channels))
strides_2d = [1, 1, 1, 1]
in_3d = tf.constant(ones_3d, dtype=tf.float32)
filter_4d = tf.constant(weight_4d, dtype=tf.float32)
in_width = int(in_3d.shape[0])
in_height = int(in_3d.shape[1])
filter_width = int(filter_4d.shape[0])
filter_height = int(filter_4d.shape[1])
input_3d = tf.reshape(in_3d, [1, in_height, in_width, in_channels])
kernel_4d = tf.reshape(filter_4d, [filter_height, filter_width, in_channels, out_channels])
#output stacked shape is 3D = 2D x N matrix
output_3d = tf.nn.conv2d(input_3d, kernel_4d, strides=strides_2d, padding='SAME')
print(sess.run(output_3d))
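A quick NumPy check of what a 1x1 convolution computes, assuming channels-last input and a (1, 1, C_in, C_out) kernel with random values: at each pixel it is a matrix multiply over the channel axis, i.e. a small fully connected layer shared across all positions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 32))        # (H, W, in_channels)
w = rng.normal(size=(1, 1, 32, 64))    # 1x1 kernel, 64 filters

# 1x1 convolution written out: weighted sum over channels at every pixel.
out_conv = np.einsum("hwc,co->hwo", x, w[0, 0])

# Same thing as one matrix multiply on the flattened pixels.
out_matmul = (x.reshape(-1, 32) @ w[0, 0]).reshape(4, 4, 64)

print(out_conv.shape, np.allclose(out_conv, out_matmul))  # (4, 4, 64) True
```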
Animation (2D Conv with 3D-inputs)
Original link: LINK
The author: Martin Görner
Twitter: @martin_gorner
Google +: plus.google.com/+MartinGorne
Bonus 1D Convolutions with 2D input
↑↑↑↑↑ 1D Convolutions with 1D input ↑↑↑↑↑
↑↑↑↑↑ 1D Convolutions with 2D input ↑↑↑↑↑
Even though the input is 2D, e.g. 20x14,
the output shape is not 2D but a 1D matrix,
because the filter height L must match the input height L.
Convolution is computed in 1 direction (x), not 2!
input = [W,L], filter = [k,L] output = [W]
output shape is a 1D matrix
If we train N filters (N is the number of filters),
then the output shape is (stacked 1D) 2D = 1D x N matrix.
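A NumPy sketch of this case, assuming the 20x14 input is stored as (W, L), the kernel as (k, L, N), stride 1, odd k and zero `SAME` padding along W (NumPy 1.20+; hypothetical helper name). Because the kernel spans the full height L, it can only slide along W, and N filters give a (W, N) output.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv1d_2d_input(x, k):
    """x: (W, L), k: (kw, L, N) -> (W, N). The kernel spans the full height L,
    so it can only slide along W. Stride 1, zero 'SAME' padding, odd kw."""
    p = k.shape[0] // 2
    xp = np.pad(x, ((p, p), (0, 0)))
    windows = sliding_window_view(xp, k.shape[0], axis=0)   # (W, L, kw)
    return np.einsum("wlj,jln->wn", windows, k)


out = conv1d_2d_input(np.ones((20, 14)), np.ones((3, 14, 5)))  # 20x14 input, 5 filters
print(out.shape, out[0, 0], out[10, 0])   # (20, 5) 28.0 42.0
```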
Bonus C3D
in_channels = 32 # 3, 32, 64, 128, ...
out_channels = 64 # 3, 32, 64, 128, ...
ones_4d = np.ones((5,5,5,in_channels))
weight_5d = np.ones((3,3,3,in_channels,out_channels))
strides_3d = [1, 1, 1, 1, 1]
in_4d = tf.constant(ones_4d, dtype=tf.float32)
filter_5d = tf.constant(weight_5d, dtype=tf.float32)
in_width = int(in_4d.shape[0])
in_height = int(in_4d.shape[1])
in_depth = int(in_4d.shape[2])
filter_width = int(filter_5d.shape[0])
filter_height = int(filter_5d.shape[1])
filter_depth = int(filter_5d.shape[2])
input_4d = tf.reshape(in_4d, [1, in_depth, in_height, in_width, in_channels])
kernel_5d = tf.reshape(filter_5d, [filter_depth, filter_height, filter_width, in_channels, out_channels])
output_4d = tf.nn.conv3d(input_4d, kernel_5d, strides=strides_3d, padding='SAME')
print(sess.run(output_4d))
sess.close()
Input & Output in Tensorflow
Summary
Following the answer from @runhani, I am adding a few more details to make the explanation clearer (and, of course, with examples in both TF1 and TF2).
The main additional bits I'm including are:
Emphasis on applications
Usage of tf.Variable
Clearer explanation of inputs/kernels/outputs for 1D/2D/3D convolution
The effects of stride/padding
1D Convolution
Here's how you might do 1D convolution using TF 1 and TF 2.
To be specific, my data has the following shapes:
1D vector - [batch size, width, in channels] (e.g. 1, 5, 1)
Kernel - [width, in channels, out channels] (e.g. 5, 1, 4)
Output - [batch size, width, out_channels] (e.g. 1, 5, 4)
TF1 example
import tensorflow as tf
import numpy as np
inp = tf.placeholder(shape=[None, 5, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(out, feed_dict={inp: np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]])}))
TF2 Example
import tensorflow as tf
import numpy as np
inp = np.array([[[0],[1],[2],[3],[4]],[[5],[4],[3],[2],[1]]]).astype(np.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5, 1, 4]), dtype=tf.float32)
out = tf.nn.conv1d(inp, kernel, stride=1, padding='SAME')
print(out)
It's much less work with TF2, since TF2 does not need a Session or a variable initializer, for example.
What might this look like in real-life?
Let's understand what this is doing using a signal-smoothing example. On the left is the original signal, and on the right is the output of a 1D convolution with 3 output channels.
What do multiple channels mean?
Multiple channels are basically multiple feature representations of an input. In this example you have three representations obtained by three different filters. The first channel is the equally-weighted smoothing filter. The second is a filter that weights the middle of the filter more than the boundaries. The final filter does the opposite of the second. So you can see how these different filters bring about different effects.
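The three-channel smoothing idea can be sketched in NumPy. The exact filter weights in the figure are not given, so the three filters below are assumptions chosen to match the description (equal weights, middle weighted more, boundaries weighted more), each summing to 1 and stacked into a `tf.nn.conv1d`-style kernel of shape [width, in_channels, out_channels]. The noisy sine signal and helper name are also illustrative; `sliding_window_view` needs NumPy 1.20+.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

# Three hypothetical smoothing filters (each sums to 1), stacked into a
# kernel of shape [width, in_channels, out_channels] = (3, 1, 3).
kernel = np.stack([
    [1 / 3, 1 / 3, 1 / 3],   # equal weighting
    [0.25, 0.5, 0.25],       # middle weighted more
    [0.4, 0.2, 0.4],         # boundaries weighted more
], axis=-1)[:, None, :]


def conv1d_same(signal, k):
    """signal: (W, C_in), k: (kw, C_in, C_out) -> (W, C_out); stride 1, zero 'SAME' padding."""
    p = k.shape[0] // 2
    sp = np.pad(signal, ((p, p), (0, 0)))
    windows = sliding_window_view(sp, k.shape[0], axis=0)    # (W, C_in, kw)
    return np.einsum("wcj,jco->wo", windows, k)


t = np.linspace(0, 2 * np.pi, 50)
noisy = np.sin(t) + 0.3 * np.random.default_rng(0).normal(size=t.size)
smoothed = conv1d_same(noisy[:, None], kernel)
print(smoothed.shape)   # (50, 3): one smoothed version of the signal per channel
```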
Deep learning applications of 1D convolution
1D convolution has been successfully used for the sentence classification task.
2D Convolution
Off to 2D convolution. If you work in deep learning, the chance that you haven't come across 2D convolution is… well, about zero. It is used in CNNs for image classification, object detection, etc., as well as in NLP problems that involve images (e.g. image caption generation).
Let's try an example. Here is a convolution kernel with the following filters:
Edge detection kernel (3x3 window)
Blur kernel (3x3 window)
Sharpen kernel (3x3 window)
To be specific, my data has the following shapes:
Image (black and white) - [batch_size, height, width, 1] (e.g. 1, 340, 371, 1)
Kernel (aka filters) - [height, width, in channels, out channels] (e.g. 3, 3, 1, 3)
Output (aka feature maps) - [batch_size, height, width, out_channels] (e.g. 1, 340, 371, 3)
TF1 Example,
import tensorflow as tf
import numpy as np
from PIL import Image
im = np.array(Image.open(<some image>).convert('L'))#/255.0
kernel_init = np.array(
[
[[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
[[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
[[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
])
inp = tf.placeholder(shape=[None, im.shape[0], im.shape[1], 1], dtype=tf.float32)
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(inp, kernel, strides=[1,1,1,1], padding='SAME')
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    res = sess.run(out, feed_dict={inp: np.expand_dims(np.expand_dims(im,0),-1)})
TF2 Example
import tensorflow as tf
import numpy as np
from PIL import Image
im = np.array(Image.open(<some image>).convert('L'))#/255.0
x = np.expand_dims(np.expand_dims(im,0),-1).astype(np.float32)  # conv2d needs the same dtype as the float32 kernel
kernel_init = np.array(
[
[[[-1, 1.0/9, 0]],[[-1, 1.0/9, -1]],[[-1, 1.0/9, 0]]],
[[[-1, 1.0/9, -1]],[[8, 1.0/9,5]],[[-1, 1.0/9,-1]]],
[[[-1, 1.0/9,0]],[[-1, 1.0/9,-1]],[[-1, 1.0/9, 0]]]
])
kernel = tf.Variable(kernel_init, dtype=tf.float32)
out = tf.nn.conv2d(x, kernel, strides=[1,1,1,1], padding='SAME')
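To see which filter sits in which output channel, `kernel_init` can be sliced along its last axis: channel 0 is the edge-detection kernel, channel 1 the 1/9 box blur, and channel 2 the sharpen kernel. A small NumPy sketch (the flat 10-valued test image is an assumption; `sliding_window_view` needs NumPy 1.20+). On a flat image the response is 0 for the edge filter and unchanged for the other two, since their weights sum to 0, 1 and 1 respectively.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

kernel_init = np.array(
  [
    [[[-1, 1.0/9, 0]], [[-1, 1.0/9, -1]], [[-1, 1.0/9, 0]]],
    [[[-1, 1.0/9, -1]], [[8, 1.0/9, 5]], [[-1, 1.0/9, -1]]],
    [[[-1, 1.0/9, 0]], [[-1, 1.0/9, -1]], [[-1, 1.0/9, 0]]]
  ])
print(kernel_init.shape)   # (3, 3, 1, 3) = [height, width, in_channels, out_channels]

edge, blur, sharpen = (kernel_init[:, :, 0, c] for c in range(3))
print(sharpen.astype(int).tolist())   # [[0, -1, 0], [-1, 5, -1], [0, -1, 0]]

# Apply each filter to a flat 5x5 image without padding ('VALID').
flat = np.full((5, 5), 10.0)
windows = sliding_window_view(flat, (3, 3))           # (3, 3, 3, 3)
for name, k in [("edge", edge), ("blur", blur), ("sharpen", sharpen)]:
    print(name, np.round(np.einsum("ijkl,kl->ij", windows, k)[1, 1], 6))
```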
What might this look like in real life?
Here you can see the output produced by the code above. The first image is the original, and going clockwise you have the outputs of the 1st, 2nd, and 3rd filters.
What do multiple channels mean?
In the context of 2D convolution, it is much easier to understand what these multiple channels mean. Say you are doing face recognition. You can think of each filter as representing an eye, mouth, nose, etc. (this is a very unrealistic simplification, but it gets the point across), so that each feature map would be a binary representation of whether that feature is present in the image you provided. I don't think I need to stress that those are very valuable features for a face recognition model. More information in this article.
This is an illustration of what I'm trying to articulate.
Deep learning applications of 2D convolution
2D convolution is very prevalent in the realm of deep learning.
CNNs (Convolutional Neural Networks) use the 2D convolution operation for almost all computer vision tasks (e.g. image classification, object detection, video classification).
3D Convolution
Now it becomes increasingly difficult to illustrate what's going on as the number of dimensions increases. But with a good understanding of how 1D and 2D convolution work, it's very straightforward to generalize that understanding to 3D convolution. So here goes.
To be specific, my data has the following shapes:
3D data (LIDAR) - [batch size, height, width, depth, in channels] (e.g. 1, 200, 200, 200, 1)
Kernel - [height, width, depth, in channels, out channels] (e.g. 5, 5, 5, 1, 3)
Output - [batch size, height, width, depth, out_channels] (e.g. 1, 200, 200, 200, 3)
TF1 Example
import tensorflow as tf
import numpy as np
tf.reset_default_graph()
inp = tf.placeholder(shape=[None, 200, 200, 200, 1], dtype=tf.float32)
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(inp, kernel, strides=[1,1,1,1,1], padding='SAME')
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    res = sess.run(out, feed_dict={inp: np.random.normal(size=(1,200,200,200,1))})
TF2 Example
import tensorflow as tf
import numpy as np
x = np.random.normal(size=(1,200,200,200,1)).astype(np.float32)  # match the float32 kernel
kernel = tf.Variable(tf.initializers.glorot_uniform()([5,5,5,1,3]), dtype=tf.float32)
out = tf.nn.conv3d(x, kernel, strides=[1,1,1,1,1], padding='SAME')
Deep learning applications of 3D convolution
3D convolution has been used when developing machine learning applications involving LIDAR (Light Detection and Ranging) data which is 3 dimensional in nature.
What... more jargon?: Stride and padding
Alright, you're nearly there, so hold on. Let's see what stride and padding are. They are quite intuitive if you think about them.
If you stride across a corridor, you get there faster, in fewer steps. But it also means that you observe less of your surroundings than if you walked across the room. Let's now reinforce our understanding with a pretty picture too! Let's understand these via 2D convolution.
Understanding stride
When you use tf.nn.conv2d, for example, you need to set the stride as a vector of 4 elements. There's no reason to be intimidated by this; it just contains the strides in the following order:
2D Convolution - [batch stride, height stride, width stride, channel stride]. Here, batch stride and channel stride you just set to one (I've been implementing deep learning models for 5 years and never had to set them to anything except one). So that leaves you only with 2 strides to set.
3D Convolution - [batch stride, height stride, width stride, depth stride, channel stride]. Here you worry about height/width/depth strides only.
Understanding padding
Now, you notice that no matter how small your stride is (i.e. 1), there is an unavoidable dimension reduction happening during convolution (e.g. with a kernel 2 units wide, the width is 3 after convolving a 4-unit-wide image). This is undesirable, especially when building deep convolutional neural networks. This is where padding comes to the rescue. There are two commonly used padding types:
SAME and VALID
Below you can see the difference.
Final word: if you are very curious, you might be wondering why, right after getting rid of the automatic dimension reduction with padding, we are talking about strides, which reduce dimensions again. The best thing about stride is that you control when, where, and how the dimensions get reduced.
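The size arithmetic behind stride, SAME and VALID can be written down directly. The formulas below are the ones TensorFlow documents for its `padding` argument, to the best of my knowledge: VALID gives ceil((n - k + 1) / s) outputs and SAME gives ceil(n / s), with the SAME padding split as evenly as possible and any odd pixel added at the end. The helper names are hypothetical.

```python
import math


def conv_output_size(n, k, s=1, padding="SAME"):
    """Output length along one axis for input length n, kernel size k, stride s."""
    if padding == "VALID":
        return math.ceil((n - k + 1) / s)
    if padding == "SAME":
        return math.ceil(n / s)
    raise ValueError(f"unknown padding: {padding}")


def same_padding(n, k, s=1):
    """(pad_before, pad_after) that SAME padding adds along one axis."""
    out = conv_output_size(n, k, s, "SAME")
    total = max((out - 1) * s + k - n, 0)
    return total // 2, total - total // 2


print(conv_output_size(4, 2, 1, "VALID"))   # 3  (4-wide input, 2-wide kernel)
print(conv_output_size(4, 2, 1, "SAME"))    # 4
print(same_padding(4, 2, 1))                # (0, 1)
print(conv_output_size(7, 3, 2, "VALID"), conv_output_size(7, 3, 2, "SAME"))  # 3 4
```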
In summary, in a 1D CNN the kernel moves in 1 direction. The input and output data of a 1D CNN are 2-dimensional. Mostly used on time-series data.
In a 2D CNN, the kernel moves in 2 directions. The input and output data of a 2D CNN are 3-dimensional. Mostly used on image data.
In a 3D CNN, the kernel moves in 3 directions. The input and output data of a 3D CNN are 4-dimensional. Mostly used on 3D image data (MRI, CT scans).
You can find more details here: https://medium.com/@xzz201920/conv1d-conv2d-and-conv3d-8a59182c4d6
CNN 1D, 2D, or 3D refers to the convolution direction rather than the input or filter dimension.
For a 1-channel input, CNN2D is equivalent to CNN1D if the kernel length equals the input length (only 1 conv direction remains).
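The equivalence in the last sentence can be checked numerically. A NumPy sketch with random data (the sizes are arbitrary assumptions; NumPy 1.20+ for `sliding_window_view`): a 2D VALID cross-correlation of a one-channel L x W input with an L x k kernel can only slide along the width, and it gives exactly the same numbers as a 1D cross-correlation over the width that treats the L rows as input channels.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
L, W, k = 14, 20, 3
x = rng.normal(size=(L, W))      # one-channel 2D input: height L, width W
kern = rng.normal(size=(L, k))   # kernel height equals input height

# 2D VALID cross-correlation: only one valid vertical position remains.
out2d = np.einsum("ijab,ab->ij", sliding_window_view(x, kern.shape), kern)

# 1D cross-correlation along the width, with the L rows treated as channels.
out1d = np.einsum("wlj,lj->w", sliding_window_view(x.T, k, axis=0), kern)

print(out2d.shape, out1d.shape, np.allclose(out2d[0], out1d))  # (1, 18) (18,) True
```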

plot convolution in the time domain with octave

This plots, but the result of conv is a vector of a new length, so t can't be used in the plot: plot(t, z1) %doesn't work!
t = [-5:.1:10];
unit = @(t) 1.*(t>=0);
h1 = @(t) (3*t + 2).*exp(-3*t).*unit(t);
z1 = conv(unit(t), h1(t));
plot(z1);
I want a plot of the convolved signal as a function of time.
You need to add the shape argument. Here's the spec:
-- Function File: conv (a, b)
-- Function File: conv (a, b, shape)
Convolve two vectors a and b.
The output convolution is a vector with length equal to length (a) + length (b) - 1. When a and b are the coefficient vectors of two polynomials, the convolution represents the coefficient vector of the product polynomial.
The optional shape argument may be
shape = "full"   Return the full convolution. (default)
shape = "same"   Return the central part of the convolution with the same size as a.
So convolve like this:
z1 = conv(unit(t), h1(t), "same");
And you'll get a result with the same length as t, so plot(t, z1) works.
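For the time axis itself, here is a NumPy version of the same computation (Python rather than Octave, to match the other examples on this page; variable names mirror the question). Two details matter if the goal is to approximate the continuous-time convolution: the discrete result should be multiplied by the sample spacing dt, and sample n (counting from 0) of the full result sits at time t_a(1) + t_b(1) + n*dt, here -10 + n*dt. The full result has 301 samples, so "same" keeps the central 151, which start at -10 + 75*0.1 = -2.5 rather than at -5; for this kernel, plotting the "same" output against t therefore appears shifted by 2.5 time units. Building the full time axis and keeping the window that matches t avoids that.

```python
import numpy as np

t = np.linspace(-5, 10, 151)              # same grid as t = [-5:.1:10]
dt = (t[-1] - t[0]) / (t.size - 1)        # 0.1
unit = (t >= 0).astype(float)
h1 = (3 * t + 2) * np.exp(-3 * t) * unit

# Full discrete convolution, scaled by dt to approximate the integral.
z_full = np.convolve(unit, h1) * dt
# Time of each full-convolution sample: start times add, spacing stays dt.
t_full = 2 * t[0] + dt * np.arange(z_full.size)

# Keep only the samples that fall on the original time grid.
keep = (t_full > t[0] - dt / 2) & (t_full < t[-1] + dt / 2)
tz, z = t_full[keep], z_full[keep]
print(tz.size == t.size, np.allclose(tz, t))   # True True
```

With matplotlib, `plt.plot(tz, z)` then gives the convolved signal against time. For t >= 0 the continuous result is 1 - (1 + t)e^(-3t), so z should approach 1; with dt = 0.1 this rectangle-rule sum comes out roughly 0.1 high (about 1.10 at t = 10), and a smaller dt shrinks that error.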
