How to use permute on this Input and Target?

I am getting a shape error on my semantic segmentation masks with 5 classes + 1 background class.
How do I use permute to avoid this?
Target size (torch.Size([4, 1, 320, 480, 6])) must be the same as input size (torch.Size([4, 6, 320, 480]))

You can combine permute and unsqueeze:
import torch

x = torch.rand((4, 6, 320, 480))            # (N, C, H, W)
new_x = x.permute(0, 2, 3, 1).unsqueeze(1)  # move channels last, then add a singleton dim
# new_x.shape = torch.Size([4, 1, 320, 480, 6])
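Alternatively, depending on the loss you are using, it may be cleaner to reshape the target to match the input instead; a minimal sketch, assuming the target's last dimension holds the 6 class channels:
import torch

target = torch.rand((4, 1, 320, 480, 6))
# drop the singleton dim, then move the class channels to dim 1
new_target = target.squeeze(1).permute(0, 3, 1, 2)
# new_target.shape = torch.Size([4, 6, 320, 480]), matching the input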

Related

Create embeddings using one hot encoded and numeric values together

I have created video embeddings that represent video features. These features include:
Content of the video [video speech converted into text to get a text embedding of size 3]
Language [5 possible languages, one-hot encoded]
Title [text embedding using Doc2Vec, of size 3]
(these numbers are just an example)
My video embedding structure looks like this:
-> video embedding = [ [content embedding of size 3], one-hot encoded language, [title embedding of size 3] ]
-> video embedding = [ [0.004, 0.0032, 0.0064], 0, 0, 1, 0, 0, [0.03, 0.021, 0.001] ]
On flattening:
-> video embedding = [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]
However, when I apply a distance metric to measure similarity between these embeddings, the one-hot encoded features overpower the numeric ones.
E.g. cos([0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001], [0.004, 0.0032, 0.0064, 0, 1, 0, 0, 0, 0.03, 0.021, 0.001]) = 0.14 (after changing only the language)
cos([0.05, 0.005, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001], [0.004, 0.0032, 0.0064, 0, 0, 1, 0, 0, 0.03, 0.021, 0.001]) = 0.99 (after changing the content vector and keeping the language the same)
Is there any way to use one-hot encoded vectors and numeric vectors together? Or is there a better way to calculate similarity between such vectors?
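One possible remedy (a sketch, not a definitive recipe) is to L2-normalize each block separately and give it a weight before concatenating, so that no single block can dominate the cosine similarity; the weights below are purely illustrative:
import numpy as np

def block_embedding(blocks, weights):
    # L2-normalize each block, scale by its weight, then concatenate
    parts = []
    for block, w in zip(blocks, weights):
        v = np.asarray(block, dtype=float)
        norm = np.linalg.norm(v)
        if norm > 0:
            v = v / norm
        parts.append(w * v)
    return np.concatenate(parts)

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

content = [0.004, 0.0032, 0.0064]
language = [0, 0, 1, 0, 0]  # one-hot
title = [0.03, 0.021, 0.001]

# illustrative weights: shrink the language block so it cannot dominate
e1 = block_embedding([content, language, title], [1.0, 0.3, 1.0])
e2 = block_embedding([content, [0, 1, 0, 0, 0], title], [1.0, 0.3, 1.0])
print(cosine(e1, e2))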

How to find blurry areas in an image?

I am looking for a method to detect blurry areas in an image, so that I can select them; motion blur is the most interesting case.
For example, I took a photo of a moving coin and want to detect the blurred areas on its left and right. (A second example image followed here.)
I tried several methods, and gradient search turned out to be the best. Here is the result:
But this method is completely unsuitable for a non-uniform background, and I can't find the blurred areas in the photo with the car.
The code used:
import cv2
import numpy as np

def put_mask(image, mask):
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return cv2.filter2D(src=gray, ddepth=-1, kernel=mask)

# crop the region of interest around the car
width, height, x, y = 550, 400, 50, 100
img = cv2.imread("car.jpg")
image = img[y:y + height, x:x + width]
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Sobel kernels for horizontal and vertical gradients
mask_1 = np.array([[ 1,  0, -1],
                   [ 2,  0, -2],
                   [ 1,  0, -1]])
mask_2 = np.array([[ 1,  2,  1],
                   [ 0,  0,  0],
                   [-1, -2, -1]])

masked_1 = cv2.filter2D(gray, ddepth=-1, kernel=mask_1)
masked_2 = cv2.filter2D(gray, ddepth=-1, kernel=mask_2)
masked = cv2.bitwise_or(masked_1, masked_2)

cv2.imshow("edges", image)
cv2.imshow("grad", masked)
cv2.waitKey(0)
cv2.destroyAllWindows()
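One way to push the same gradient idea a bit further (a sketch only; the window size and threshold are illustrative and need tuning per image, and it shares the stated limitation on textureless backgrounds) is to average the gradient magnitude over a local window and mark weak-gradient regions as blurry:
import cv2
import numpy as np

img = cv2.imread("car.jpg")  # same test image as above
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# gradient magnitude via Sobel, averaged over a local window
gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)
magnitude = cv2.magnitude(gx, gy)
local_strength = cv2.boxFilter(magnitude, -1, (31, 31))

# regions with weak local gradients are candidate blur areas;
# the threshold is illustrative and needs tuning
blur_mask = (local_strength < 20).astype(np.uint8) * 255

cv2.imshow("blur mask", blur_mask)
cv2.waitKey(0)
cv2.destroyAllWindows()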

How to calculate the resolution of an undistorted image?

An undistorted image typically has a lower resolution than the original image, due to the non-uniform distribution of pixels and (usually) the cropping of the black edges (see the example below).
So given the camera calibration parameters, e.g. in ROS format
image_width: 1600
image_height: 1200
camera_name: camera1
camera_matrix:
  rows: 3
  cols: 3
  data: [1384.355466887268, 0, 849.4355708515795, 0, 1398.17734010913, 604.5570699746268, 0, 0, 1]
distortion_model: plumb_bob
distortion_coefficients:
  rows: 1
  cols: 5
  data: [0.0425049914802741, -0.1347528158561486, -0.0002287009852930437, 0.00641133892300999, 0]
rectification_matrix:
  rows: 3
  cols: 3
  data: [1, 0, 0, 0, 1, 0, 0, 0, 1]
projection_matrix:
  rows: 3
  cols: 4
  data: [1379.868041992188, 0, 860.3000889574832, 0, 0, 1405.926879882812, 604.3997819099422, 0, 0, 0, 1, 0]
How would one calculate the final resolution of the undistorted rectified image?
From Fruchtzwerg's comment, the following gives the effective ROI of the undistorted image:
import cv2
import numpy as np

mtx = np.array([[1384.355466887268, 0, 849.4355708515795],
                [0, 1398.17734010913, 604.5570699746268],
                [0, 0, 1]])
dist = np.array([0.0425049914802741, -0.1347528158561486, -0.0002287009852930437, 0.00641133892300999, 0])

# returns the new camera matrix and the valid-pixel ROI as (x, y, w, h)
new_mtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (1600, 1200), 1)
print(roi)
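The ROI's width and height give the effective resolution; a minimal sketch of undistorting and cropping to it (assuming a frame from this camera saved as the hypothetical "image.png"):
img = cv2.imread("image.png")  # hypothetical frame from this camera
undistorted = cv2.undistort(img, mtx, dist, None, new_mtx)
x, y, w, h = roi
cropped = undistorted[y:y + h, x:x + w]  # final resolution is (w, h)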

Counterpart of Python Imaging Library in Manipulating Videos

I am currently working on code to paint facial landmarks in real time using OpenCV and the face_recognition module. I saw source code on the internet for painting over an image using PIL and face_recognition, and I was wondering which module is the counterpart of PIL for manipulating videos. I want to find the landmarks of the face showing on the webcam and paint on those landmarks (for example, the eyebrows, lips, etc.).
This is my current code:
from PIL import Image, ImageDraw
import face_recognition
import cv2

video_capture = cv2.VideoCapture(0)
face_locations = []
process_this_frame = True
face_landmarks_list = []

while True:
    ret, frame = video_capture.read()
    small_frame = cv2.resize(frame, (0, 0), fx=0.25, fy=0.25)
    rgb_small_frame = small_frame[:, :, ::-1]  # BGR -> RGB for face_recognition
    if process_this_frame:
        face_locations = face_recognition.face_locations(rgb_small_frame)
        face_landmarks_list = face_recognition.face_landmarks(rgb_small_frame)
    for face_landmarks in face_landmarks_list:
        pil_image = Image.fromarray(rgb_small_frame)
        d = ImageDraw.Draw(pil_image, 'RGBA')
        # Make the eyebrows into a nightmare
        d.polygon(face_landmarks['left_eyebrow'], fill=(68, 54, 39, 128))
        d.polygon(face_landmarks['right_eyebrow'], fill=(68, 54, 39, 128))
        d.line(face_landmarks['left_eyebrow'], fill=(68, 54, 39, 150), width=5)
        d.line(face_landmarks['right_eyebrow'], fill=(68, 54, 39, 150), width=5)
        # Gloss the lips
        d.polygon(face_landmarks['top_lip'], fill=(150, 0, 0, 128))
        d.polygon(face_landmarks['bottom_lip'], fill=(150, 0, 0, 128))
        d.line(face_landmarks['top_lip'], fill=(150, 0, 0, 64), width=8)
        d.line(face_landmarks['bottom_lip'], fill=(150, 0, 0, 64), width=8)
        # Sparkle the eyes
        d.polygon(face_landmarks['left_eye'], fill=(255, 255, 255, 30))
        d.polygon(face_landmarks['right_eye'], fill=(255, 255, 255, 30))
        # Apply some eyeliner
        d.line(face_landmarks['left_eye'] + [face_landmarks['left_eye'][0]], fill=(0, 0, 0, 110), width=6)
        d.line(face_landmarks['right_eye'] + [face_landmarks['right_eye'][0]], fill=(0, 0, 0, 110), width=6)
    cv2.imshow('Video', frame)  # note: shows the raw frame, not pil_image
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

video_capture.release()
cv2.destroyAllWindows()
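To actually display the painted landmarks instead of the raw frame, one option (a sketch of the usual PIL-to-OpenCV round trip; note that pil_image here is the quarter-size frame) is to convert the PIL image back to a BGR array before showing it; alternatively, cv2.polylines and cv2.fillPoly can draw on the frame directly:
import numpy as np

# inside the while loop, after drawing on pil_image:
annotated = cv2.cvtColor(np.array(pil_image), cv2.COLOR_RGB2BGR)
cv2.imshow('Video', annotated)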

How to set HSV color range in OpenCV?

I have a phone and its HSV histogram (shown below), and I want to track the phone's movement. Based on its histogram, I set the image range like this:
greenLower = (300, 0, 50)
greenUpper = (50, 128, 250)
cv2.inRange(hsv, greenLower, greenUpper)
But nothing gets detected when waving the phone, and I am pretty sure the color range is wrong. Could you tell me how to set the color range correctly? In particular, when the hue values lie in [300~50], should I set the range to (50~300) or (300~50), given that hue is a circle?
Phone
HSV histogram:
You have set the upper and lower bounds the wrong way round; they must be:
greenLower = (50, 0, 50)      # previously (300, 0, 50)
greenUpper = (300, 128, 250)  # previously (50, 128, 250)
Also make sure that hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV), as OpenCV follows the BGR convention.
EDIT:
To segment colors in the two ranges 0~50 and 300~359, you can run cv2.inRange() twice, once per range:
greenLower1 = (0, 0, 20)
greenUpper1 = (50, 128, 100)
greenLower2 = (300, 0, 20)
greenUpper2 = (359, 128, 100)
mask1 = cv2.inRange(img_hsv, greenLower1, greenUpper1)
mask2 = cv2.inRange(img_hsv, greenLower2, greenUpper2)
mask = cv2.max(mask1, mask2)
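Note that for 8-bit images OpenCV stores hue as degrees divided by two, i.e. in [0, 180), so values like 300 or 359 will never match; a sketch of the same two-range split on OpenCV's hue scale ("phone.jpg" is a hypothetical test image):
import cv2

img = cv2.imread("phone.jpg")  # hypothetical test image
img_hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# hue wrap-around on OpenCV's 0..179 scale:
# 0..50 deg -> 0..25, 300..359 deg -> 150..179
mask1 = cv2.inRange(img_hsv, (0, 0, 20), (25, 128, 100))
mask2 = cv2.inRange(img_hsv, (150, 0, 20), (179, 128, 100))
mask = cv2.bitwise_or(mask1, mask2)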
