I have simulation data that contains the velocity values (x, y and z components), as well as density and energy values, for each voxel. Is it possible to obtain the optical flow from the velocities, to serve as ground truth?
I tried the following:
import numpy as np
import cv2
data = np.fromfile('Velocity/ns_1000_v.dat', dtype=np.float32)
data = np.reshape(data, (128, 128, 128, 3))  # 3D volume, 3 velocity components per voxel
slice_num = 5  # let's pick a slice along the third axis
vx = data[:, :, slice_num, 0]
vy = data[:, :, slice_num, 1]
hsv = np.zeros((128, 128, 3), dtype=np.uint8)
hsv[..., 1] = 255  # full saturation
mag, ang = cv2.cartToPolar(vx, vy)  # angle in radians
hsv[..., 0] = ang * 180 / np.pi / 2  # hue encodes direction (OpenCV 8-bit hue range is 0..179)
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)  # value encodes speed
bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
This doesn't show the (2D) flow I expected to see.
I have already calculated the optical flow from the density values (in an unsupervised setup) and now want to compare it with the ground truth. An example of the computed flow looks like this:
Computed unsupervised flow
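A minimal NumPy sketch of the ground-truth side, assuming that array axes 0, 1, 2 of the reshaped volume are x, y, z and that the last axis holds (vx, vy, vz). The 2D flow of a slice is then the two velocity components lying in that slice's plane; the out-of-plane component is dropped. The helper names `slice_flow` and `flow_polar` are hypothetical. One thing worth checking: if the file was written with x varying fastest, a C-order reshape actually yields axes (z, y, x), so `data[:, :, slice_num]` would be a constant-x plane rather than a constant-z one. Also, the velocity is in simulation units per unit time, so it only matches optical flow in pixels per frame after scaling by the time between frames and the voxel size.

```python
import numpy as np

def slice_flow(velocity, axis, index):
    """In-plane flow (u, v) of one slice of a velocity volume.

    velocity: array of shape (NX, NY, NZ, 3) whose last axis holds (vx, vy, vz);
    array axes 0, 1, 2 are ASSUMED to be x, y, z.
    axis: the axis normal to the slice (2 means a constant-z slice).
    """
    velocity = np.asarray(velocity)
    # np.take with a scalar index removes the sliced axis: shape (.., .., 3)
    sl = np.take(velocity, index, axis=axis)
    # keep only the two velocity components that lie in the slice plane
    in_plane = [c for c in range(3) if c != axis]
    return sl[..., in_plane[0]], sl[..., in_plane[1]]

def flow_polar(u, v):
    """Magnitude and angle in radians in [0, 2*pi), the same convention as cv2.cartToPolar."""
    mag = np.hypot(u, v)
    ang = np.mod(np.arctan2(v, u), 2 * np.pi)
    return mag, ang
```

For the slice in the question this would be `u, v = slice_flow(data, axis=2, index=slice_num)`, and the polar form can feed the same HSV encoding.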
I have an input image and a grid that I pass to torch.nn.functional.grid_sample(). Given a random pixel location (x, y) in the input image, how can I find its location in the output of grid_sample()?
To be precise, I am looking for the displacement (delta) of each pixel in terms of coordinates.
Would this be sufficient for finding the new location of a pixel:
ix = ((ix + 1) / 2) * (IW-1);
iy = ((iy + 1) / 2) * (IH-1);
as mentioned in https://github.com/pytorch/pytorch/blob/f064c5aa33483061a48994608d890b968ae53fb5/aten/src/THNN/generic/SpatialGridSamplerBilinear.c
How did you compute the grid? It must be based on some transform. Often the affine_grid function is used, and this function takes the transformation matrix as input.
Given this transformation matrix (and its inverse), you can go in both directions: from input image pixel location to output image pixel location, and the other way round.
Here is sample code showing how to compute the transforms for both the forward and backward directions. The last two lines show how to map a location in both directions.
import torch
import torch.nn.functional as F
# given a transform mapping from output to input, create the sample grid
input_tensor = torch.zeros([1, 1, 2, 2])  # batch x channels x height x width
transform = torch.tensor([[[0.5, 0, 0], [0, 1, 3]]]).float()
grid = F.affine_grid(transform, input_tensor.size(), align_corners=True)
# show the grid: the last dimension holds (x, y) in normalized [-1, 1] coordinates
print('x', grid[0, ..., 0])
print('y', grid[0, ..., 1])
# compute both transformation matrices (forward and backward) with shape 3x3
transform_full = torch.zeros([1, 3, 3])
transform_full[0, 2, 2] = 1
transform_full[0, :2, :3] = transform[0]
transform_inv_full = torch.inverse(transform_full)
# map the location x=2, y=3 in both directions (forward and backward).
# The matrices act on normalized coordinates, so convert pixel locations with the
# align_corners formula, e.g. ix = ((ix + 1) / 2) * (IW - 1), before/after mapping.
point = torch.tensor([[2, 3, 1]]).float().T
print(transform_full @ point)      # output location -> input location
print(transform_inv_full @ point)  # input location -> output location
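To answer the pixel-level question directly, here is a NumPy sketch of the same math with explicit pixel/normalized conversions, assuming align_corners=True (with align_corners=False, grid_sample instead unnormalizes as ((coord + 1) * size - 1) / 2). The helper names are hypothetical. `theta` is the 2x3 matrix you would pass to affine_grid; it maps output coordinates to input coordinates, so going from an input pixel to its output location needs the inverse. The per-pixel delta is the returned location minus (x_in, y_in).

```python
import numpy as np

def pixel_to_norm(p, size):
    # align_corners=True: pixel 0 -> -1 and pixel size-1 -> +1
    return 2.0 * p / (size - 1) - 1.0

def norm_to_pixel(n, size):
    # inverse of pixel_to_norm; same formula as ix = ((ix + 1) / 2) * (IW - 1)
    return (n + 1.0) / 2.0 * (size - 1)

def input_to_output_pixel(theta, x_in, y_in, in_w, in_h, out_w, out_h):
    """Map an input-image pixel to its location in the grid_sample output.

    theta: 2x3 affine_grid matrix (maps output -> input, normalized coords).
    """
    full = np.vstack([np.asarray(theta, dtype=float), [0.0, 0.0, 1.0]])
    inv = np.linalg.inv(full)  # maps input -> output, normalized coords
    src = np.array([pixel_to_norm(x_in, in_w), pixel_to_norm(y_in, in_h), 1.0])
    dst = inv @ src
    return norm_to_pixel(dst[0], out_w), norm_to_pixel(dst[1], out_h)

# theta samples the input at half the output x coordinate (a 2x horizontal zoom),
# so input pixel (3, 1) of a 5x5 image lands at output pixel (4, 1)
x_out, y_out = input_to_output_pixel([[0.5, 0, 0], [0, 1, 0]], 3, 1, 5, 5, 5, 5)
delta = (x_out - 3, y_out - 1)
```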
I'm currently working on a project where I need to measure the width of a car fuse wire. To do that, I need to detect and localize the fuse in the image.
fuse_image
My plan is to find the bounding rectangle around the fuse and then search for the wire contours at a fixed position within that region.
fuse_contours
I have already tried template matching based on ORB and BRISK features, but the results were not acceptable. Can anyone suggest other methods for this task?
We can start by applying the Canny edge detector to see the features of the image. The result is:
The aim is to calculate the width, so we only need the left and right outer edges of the fuse; we don't need the inner lines. To remove the inner features, we can smooth the image.
How do we accurately calculate the width? Which part of the features can we use as a reference? Suppose we consider the base. The base features are:
How do we find the base feature coordinates?
The blue point is the one with the highest y coordinate.
The red point is the one with the highest x coordinate.
Across all detected line coordinates, we need to find the highest y value together with its corresponding x value, and the highest x value together with its corresponding y value.
To detect the line coordinates, we can use the fast line detector (cv2.ximgproc.createFastLineDetector, from the opencv-contrib package). The result is:
We can then calculate the Euclidean distance, which is 146.49 pixels.
The idea is based on finding the base and then calculating the Euclidean distance.
The orientation of the fuse can be arbitrary.
First, we need to extract the fuse part of the image.
Second, we need to get the Canny features (or use any other filtering method).
At this point, we need to find the left (blue dot) and right (red dot) parts of the fuse:
If we connect them:
We get an approximate length of the fuse.
So how do we find the left and right parts of the fuse?
Finding the left part:
1. Take the (x1, x2) pair of the current line.
2. If min(x1, x2) < x_min,
3. set x_min = min(x1, x2).
Finding the right part:
1. Take the (x1, x2) pair of the current line.
2. If max(x1, x2) > x_max,
3. set x_max = max(x1, x2).
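The two scans above amount to taking the left-most and right-most segment endpoints. A vectorized NumPy sketch (the helper name `horizontal_extent` is hypothetical) assuming segments come as (x1, y1, x2, y2) rows, which is the layout FastLineDetector's detect() returns as an (N, 1, 4) array:

```python
import numpy as np

def horizontal_extent(lines):
    """Return the left-most and right-most endpoints of a set of segments.

    lines: array-like of (x1, y1, x2, y2); shapes (N, 4) and (N, 1, 4) both work.
    """
    segs = np.asarray(lines, dtype=float).reshape(-1, 4)
    # every segment contributes two (x, y) endpoints
    pts = np.vstack([segs[:, 0:2], segs[:, 2:4]])
    left = pts[np.argmin(pts[:, 0])]
    right = pts[np.argmax(pts[:, 0])]
    return (left[0], left[1]), (right[0], right[1])

lines = [[10, 5, 20, 6], [3, 7, 15, 8], [25, 4, 18, 9]]
(x_min, y_left), (x_max, y_right) = horizontal_extent(lines)
width = x_max - x_min  # horizontal extent in pixels
```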
This is my idea for approaching the problem. You can modify it for better results.
# Load libraries
import cv2
import numpy as np
# Load the image
img = cv2.imread("E8XlZ.jpg")
# Get the image dimensions
(h, w) = img.shape[:2]
# Convert to HSV
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)
# Get the binary mask
msk = cv2.inRange(hsv, np.array([0, 24, 161]), np.array([77, 255, 217]))
# Display the mask
cv2.imshow("msk", msk)
# Smooth the image to suppress the inner features
gauss = cv2.GaussianBlur(msk, (21, 21), 0)
# Canny features
cny = cv2.Canny(gauss, 50, 200)
# Display the Canny features
cv2.imshow("cny", cny)
# Detect line segments (cv2.ximgproc requires opencv-contrib-python)
lns = cv2.ximgproc.createFastLineDetector().detect(cny)
if lns is None:
    raise SystemExit("No lines detected")
# Initialize temporary variables
x_min, x_max, y_min, y_max = w, 0, 0, 0
# Find the left-most and right-most segment endpoints
for line in lns:
    # Get current coordinates
    x1 = int(line[0][0])
    y1 = int(line[0][1])
    x2 = int(line[0][2])
    y2 = int(line[0][3])
    # Update the right-most coordinates
    if max(x1, x2) > x_max:
        x_max = max(x1, x2)
        y_max = y1 if x_max == x1 else y2
    # Update the left-most coordinates
    if min(x1, x2) < x_min:
        x_min = min(x1, x2)
        y_min = y1 if x_min == x1 else y2
# Place both points at the mean y so that the measured distance is horizontal
y_mid = int((y_min + y_max) / 2)
# Draw the points
cv2.circle(img, (x_min, y_mid), 3, (255, 0, 0), 5)
cv2.circle(img, (x_max, y_mid), 3, (0, 0, 255), 5)
# Write coordinates to the console
print("Coordinates: ({}, {})->({}, {})".format(x_min, y_mid, x_max, y_mid))
# Draw the line between the minimum and maximum coordinates
cv2.line(img, (x_min, y_mid), (x_max, y_mid), (0, 255, 0), 5)
# Calculate the Euclidean distance
pt1 = np.array((x_min, y_mid))
pt2 = np.array((x_max, y_mid))
dist = np.linalg.norm(pt1 - pt2)
print("Result: %.2f pixel" % dist)
# Display the result and wait for a key press
cv2.imshow("img", img)
cv2.waitKey(0)
cv2.destroyAllWindows()
I have some Python code, largely taken from various sources linked at the bottom of this post, that takes an image of shape [height, width] and some bounding boxes in [x_min, y_min, x_max, y_max] format (both NumPy arrays) and rotates the image and its bounding boxes counterclockwise. Since after rotation a bounding box becomes more of a "diamond shape", i.e. not axis-aligned, I then perform some calculations to make it axis-aligned. The purpose of this code is data augmentation with rotated data when training an object detection neural network (flipping horizontally or vertically is common). Rotations by other angles seem common for image classification without bounding boxes, but when there are boxes, resources on how to rotate the boxes together with the images are relatively sparse/niche.
When I use an angle of 45 degrees, I get bounding boxes that are less than "tight": the four corners are not a very good annotation, whereas the original box was close to perfect.
The image shown below is the first training image in the MS COCO 2014 object detection dataset, with its first bounding box annotation. My code is as follows:
import math
import cv2
import numpy as np
# angle assumed to be in degrees
# bbs: a list of bounding boxes in x_min, y_min, x_max, y_max format
def rotateImageAndBoundingBoxes(im, bbs, angle):
    h, w = im.shape[0], im.shape[1]
    (cX, cY) = (w // 2, h // 2)  # original image center
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)  # 2 by 3 rotation matrix
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    # compute the dimensions of the rotated image
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    # adjust the rotation matrix to take into account translation of the new centre
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    rotated_im = cv2.warpAffine(im, M, (nW, nH))
    rotated_bbs = []
    for bb in bbs:
        # get the four rotated corners of the bounding box
        vec1 = np.matmul(M, np.array([bb[0], bb[1], 1], dtype=np.float64))  # top left corner transformed
        vec2 = np.matmul(M, np.array([bb[2], bb[1], 1], dtype=np.float64))  # top right corner transformed
        vec3 = np.matmul(M, np.array([bb[0], bb[3], 1], dtype=np.float64))  # bottom left corner transformed
        vec4 = np.matmul(M, np.array([bb[2], bb[3], 1], dtype=np.float64))  # bottom right corner transformed
        x_vals = [vec1[0], vec2[0], vec3[0], vec4[0]]
        y_vals = [vec1[1], vec2[1], vec3[1], vec4[1]]
        x_min = math.ceil(np.min(x_vals))
        x_max = math.floor(np.max(x_vals))
        y_min = math.ceil(np.min(y_vals))
        y_max = math.floor(np.max(y_vals))
        rotated_bbs.append([x_min, y_min, x_max, y_max])
    # my function to resize the image and bbs to the original image size (not shown)
    rotated_im, rotated_bbs = resizeImageAndBoxes(rotated_im, w, h, rotated_bbs)
    return rotated_im, rotated_bbs
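To check whether the looseness comes from the code or from the method, a small NumPy-only sketch (hypothetical helper names, no image involved) can rotate just the box corners with the 2x3 matrix that the OpenCV documentation gives for getRotationMatrix2D and then take the axis-aligned bounds, as the loop above does:

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """2x3 matrix using the formula documented for cv2.getRotationMatrix2D.

    Positive angles rotate counterclockwise in image coordinates (origin top-left).
    """
    a = scale * np.cos(np.deg2rad(angle_deg))
    b = scale * np.sin(np.deg2rad(angle_deg))
    cx, cy = center
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

def rotated_box_bounds(box, M):
    """Axis-aligned [x_min, y_min, x_max, y_max] enclosing the 4 rotated corners."""
    x_min, y_min, x_max, y_max = box
    corners = np.array([[x_min, y_min, 1.0],
                        [x_max, y_min, 1.0],
                        [x_min, y_max, 1.0],
                        [x_max, y_max, 1.0]])
    pts = corners @ M.T  # (4, 2) rotated corner coordinates
    return [pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max()]

box = [0, 0, 10, 10]
b45 = rotated_box_bounds(box, rotation_matrix_2d((5, 5), 45))
b90 = rotated_box_bounds(box, rotation_matrix_2d((5, 5), 90))
width45 = b45[2] - b45[0]  # about 14.14 = 10 * sqrt(2)
width90 = b90[2] - b90[0]  # 10
```

A w x h box rotated by an angle θ always comes out w|cos θ| + h|sin θ| wide, so a 10 x 10 box grows to about 14.14 at 45 degrees but stays 10 at 90 degrees. This suggests the looser box at 45 degrees is inherent to rotating box corners rather than a bug in the code above.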
The good bounding box looks like this:
The not-so-good bounding box looks like this:
I am trying to determine whether this is an error in my code or expected behavior. The problem seems less apparent at integer multiples of pi/2 radians (90 degrees), but I would like to achieve tight bounding boxes at any angle of rotation. Any insights are appreciated.
[OpenCV documentation] https://docs.opencv.org/3.4/da/d54/group__imgproc__transform.html#gafbbc470ce83812914a70abfb604f4326
[Data Augmentation Discussion]
[Mathematics of rotation around an arbitrary point in 2 dimension]
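For the most part this seems to be expected behavior, as per the comments: the axis-aligned box of the four rotated corners has to enclose the whole rotated rectangle, including its corners, which the object usually does not fill. The excess is largest around 45 degrees and vanishes at multiples of 90 degrees. I do have a somewhat hacky workaround: you can write a function like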
import numpy as np
# assuming box coords = [x_min, y_min, x_max, y_max]
def cropBoxByPercentage(box_coords, image_width, image_height, x_percentage=0.05, y_percentage=0.05):
    box_xmin = box_coords[0]
    box_ymin = box_coords[1]
    box_xmax = box_coords[2]
    box_ymax = box_coords[3]
    box_width = box_xmax - box_xmin + 1
    box_height = box_ymax - box_ymin + 1
    dx = int(x_percentage * box_width)
    dy = int(y_percentage * box_height)
    # shrink the box inwards by dx / dy on each side, staying inside the image
    box_xmin = max(0, box_xmin + dx)
    box_xmax = min(image_width - 1, box_xmax - dx)
    box_ymin = max(0, box_ymin + dy)
    box_ymax = min(image_height - 1, box_ymax - dy)
    # return in the same [x_min, y_min, x_max, y_max] order as the input
    return np.array([box_xmin, box_ymin, box_xmax, box_ymax])
Here x_percentage and y_percentage can be fixed values or computed with some heuristic.
The OpenCV DNN module does not produce correct detections for YOLOv3, whereas the Darknet detector detects correctly.
System information (version)
OpenCV => 4.2.1 and 4.4.x
Operating System / Platform => Ubuntu 18.04 64Bit
I tested with OpenCV compiled from source and also with the pre-built opencv-python package, but in both cases OpenCV DNN detects the wrong objects, whereas the Darknet detector detects correctly.
Correct detection with darknet detector:
Wrong detection with OpenCV DNN module:
YOLOv3 network and model weights are from https://github.com/AlexeyAB/darknet
modelWeights: yolov3.weights
modelConfiguration: yolov3.cfg
ClassesFile: coco.names
Detailed description
Please see the output images at the link below: the correct detections (Darknet detector) compared with the wrong detections (OpenCV DNN).
The output images are available in this Google Drive link.
The link above also includes the test images. Steps to test:
# The following code is partial, to demonstrate the steps
import cv2
import numpy as np
net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
layerNames = net.getLayerNames()
# flatten() copes with both the (N, 1) and the (N,) shapes returned by different OpenCV versions
layerNames = [layerNames[i - 1] for i in np.array(net.getUnconnectedOutLayers()).flatten()]
(H, W) = frame.shape[:2]
# construct a blob from the input frame and then perform a forward pass of the YOLO object detector,
# giving us our bounding boxes and associated probabilities
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
layerOutputs = net.forward(layerNames)
# initialize our lists of detected bounding boxes, confidences,
# and class IDs, respectively
boxes = []
confidences = []
classIDs = []
# loop over each of the layer outputs
for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability)
        # of the current object detection
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > args["confidence"]:
            # scale the bounding box coordinates back relative to
            # the size of the image, keeping in mind that YOLO
            # actually returns the center (x, y)-coordinates of
            # the bounding box followed by the boxes' width and
            # height
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")
            # use the center (x, y)-coordinates to derive the top
            # and left corner of the bounding box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))
            # update our list of bounding box coordinates,
            # confidences, and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(int(classID))
# apply non-maxima suppression to suppress weak, overlapping
# bounding boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"], args["threshold"])
dets = []
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in np.array(idxs).flatten():
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])
        dets.append([x, y, x + w, y + h, confidences[i]])
if len(boxes) > 0:
    i = int(0)
    for box in boxes:
        # extract the bounding box coordinates
        (x, y) = (int(box[0]), int(box[1]))
        (w, h) = (int(box[2]), int(box[3]))
        # draw a bounding box rectangle and label on the image
        # color = [int(c) for c in COLORS[classIDs[i]]]
        # cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        color = [int(c) for c in COLORS[indexIDs[i] % len(COLORS)]]
        cv2.rectangle(frame, (x, y), (w, h), color, 2)
        cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.75, color, 2)
        i += 1
cv2.imwrite("detection-output.jpg", frame)
I think your detection is correct: since all of your labels are "car", the problem is the text you pass in this line:
cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.75, color, 2)
You should put the class name in the text, but I can't find where text is defined. Your code should look like this:
cv2.putText(frame, classes[classIDs[i]], (x + 5, y + 20), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, color, 2)
where classes is the list of class names read from coco.names. That said, in my experience Darknet gives better detections than OpenCV DNN.
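Two other details in the question's drawing loop are worth fixing alongside the label: it iterates over all boxes rather than only the indices kept by NMSBoxes, and it passes (w, h) instead of (x + w, y + h) as the rectangle's second corner. Here is a NumPy-only sketch of the decoding step that keeps boxes, confidences and class IDs aligned. The function name `decode_yolo` and its `conf_threshold` argument are hypothetical, and it assumes each detection row is [cx, cy, w, h, objectness, class scores...] with the first four values relative to the image size, which is the layout the question's code already relies on.

```python
import numpy as np

def decode_yolo(layer_outputs, W, H, conf_threshold=0.5):
    """Turn YOLO output rows into (boxes, confidences, class_ids).

    Each row: [cx, cy, w, h, objectness, score_0, score_1, ...], with the first
    four values relative to the image size (0..1). Boxes are returned as
    [x, y, width, height] in pixels, the format cv2.dnn.NMSBoxes expects.
    """
    boxes, confidences, class_ids = [], [], []
    for output in layer_outputs:
        for detection in output:
            scores = detection[5:]
            class_id = int(np.argmax(scores))
            confidence = float(scores[class_id])
            if confidence > conf_threshold:
                cx, cy, bw, bh = detection[0:4] * np.array([W, H, W, H])
                x = int(cx - bw / 2)  # top-left corner from the center
                y = int(cy - bh / 2)
                boxes.append([x, y, int(bw), int(bh)])
                confidences.append(confidence)
                class_ids.append(class_id)
    return boxes, confidences, class_ids
```

The indices returned by cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold) can then be drawn with cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2) and labelled with classes[class_ids[i]].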
I would like to generate a polynomial 'fit' to the cluster of colored pixels in the image here
(The point being that I would like to measure how closely that cluster approximates a horizontal line.)
I thought of using grabit or something similar and then treating this as a cloud of points in a graph. But is there a quicker function to do so directly on the image file?
Here is a Python implementation. Basically, we find all (xi, yi) coordinates of the colored regions, then set up a regularized least-squares system in which we want to find the vector of weights (w0, ..., wd) such that yi = w0 + w1 xi + w2 xi^2 + ... + wd xi^d "as closely as possible" in the least-squares sense.
import numpy as np
import matplotlib.pyplot as plt
def rgb2gray(rgb):
    return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])
def feature(x, order=3):
    """Generate polynomial feature of the form
    [1, x, x^2, ..., x^order] where x is the column of x-coordinates
    and 1 is the column of ones for the intercept.
    """
    x = x.reshape(-1, 1)
    return np.power(x, np.arange(order + 1).reshape(1, -1))
I_orig = plt.imread("2Md7v.jpg")
# Convert to grayscale
I = rgb2gray(I_orig)
# Mask out region
mask = I > 20
# Get coordinates of pixels corresponding to marked region
X = np.argwhere(mask)
# Use the value as weights later
weights = I[mask] / float(I.max())
# Convert to diagonal matrix (N x N, so memory grows quadratically with the pixel count)
W = np.diag(weights)
# Column indices
x = X[:, 1].reshape(-1, 1)
# Row indices to predict. Note origin is at top left corner
y = X[:, 0]
We want to find the vector w that minimizes ||Aw - y||^2 (plus the l2 penalty alpha ||w||^2), so that we can use it to predict y = w · x.
Here are two versions: one is vanilla least squares with l2 regularization, and the other is weighted least squares with l2 regularization.
# Ridge regression, i.e., least squares with l2 regularization.
# Should probably use a more numerically stable implementation,
# e.g., that in Scikit-Learn
# alpha is regularization parameter. Larger alpha => less flexible curve
alpha = 0.01
# Construct data matrix, A
order = 3
A = feature(x, order)
# w = inv (A^T A + alpha * I) A^T y
w_unweighted = np.linalg.pinv( A.T.dot(A) + alpha * np.eye(A.shape[1])).dot(A.T).dot(y)
# w = inv (A^T W A + alpha * I) A^T W y
w_weighted = np.linalg.pinv(A.T.dot(W).dot(A) + alpha * np.eye(A.shape[1])).dot(A.T).dot(W).dot(y)
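Following the comment about numerical stability, one option (a sketch, not the answer's original code; the name `weighted_ridge` is hypothetical) is to solve the same weighted, l2-regularized problem with np.linalg.lstsq on an augmented system. This also avoids building the N x N diagonal matrix W. Minimizing ||sqrt(W)(Aw - y)||^2 + alpha ||w||^2 is an ordinary least-squares problem in the stacked matrix [sqrt(W) A; sqrt(alpha) I] with target [sqrt(W) y; 0].

```python
import numpy as np

def weighted_ridge(A, y, weights, alpha):
    """Minimize sum_i weights[i] * (A[i] @ w - y[i])**2 + alpha * ||w||^2."""
    A = np.asarray(A, dtype=float)
    y = np.asarray(y, dtype=float)
    sw = np.sqrt(np.asarray(weights, dtype=float))
    n_features = A.shape[1]
    # weighted data rows stacked on top of sqrt(alpha) * I (the ridge penalty rows)
    A_aug = np.vstack([A * sw[:, None], np.sqrt(alpha) * np.eye(n_features)])
    y_aug = np.concatenate([y * sw, np.zeros(n_features)])
    w, *_ = np.linalg.lstsq(A_aug, y_aug, rcond=None)
    return w
```

With the variables above this would be `w_weighted = weighted_ridge(A, y, weights, alpha)`; passing `np.ones_like(weights)` gives the unweighted ridge solution.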
The result
# Generate test points
n_samples = 50
x_test = np.linspace(0, I_orig.shape[1], n_samples)
X_test = feature(x_test, order)
# Predict y coordinates at test points
y_test_unweighted = X_test.dot(w_unweighted)
y_test_weighted = X_test.dot(w_weighted)
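# Display the fits on top of the image (imshow puts the origin at the top left, like the row indices)
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
ax.imshow(I_orig)
ax.plot(x_test, y_test_unweighted, color="green", marker='o', label="Unweighted")
ax.plot(x_test, y_test_weighted, color="blue", marker='x', label="Weighted")
ax.legend()
plt.show()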
For a simple straight-line fit, set the order argument of feature to 1. You can then use the slope of the line, w[1], to judge how close it is to horizontal (e.g., by checking its angle, arctan(w[1])).
It is also possible to use a polynomial of any degree. I find that degree 3 looks pretty good. In this case, 6 times the absolute value of the coefficient of x^3 (w_unweighted[3] or w_weighted[3]), i.e. the magnitude of the third derivative, is one rough measure of how much the line bends; the second derivative, 2 w[2] + 6 w[3] x, gives the local bending at each x.
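A small sketch of both measures, with hypothetical helper names. Here `w` is a coefficient vector in the same ascending order as `feature` (w[k] multiplies x^k), which is also the order numpy.polynomial.polynomial uses. It computes the standard curvature of a graph, |y''| / (1 + y'^2)^(3/2). Because the row axis points down in image coordinates, the sign of the slope is flipped relative to a usual plot, but the magnitude of the angle and the curvature are unaffected.

```python
import numpy as np
from numpy.polynomial import polynomial as P

def slope_angle_deg(w):
    """Angle between a straight-line fit y = w[0] + w[1] x and the x axis, in degrees."""
    return float(np.degrees(np.arctan(w[1])))

def curvature(w, x):
    """Curvature |y''| / (1 + y'^2)^(3/2) of y = sum_k w[k] x^k at the points x."""
    x = np.asarray(x, dtype=float)
    d1 = P.polyval(x, P.polyder(w, 1))  # first derivative y'
    d2 = P.polyval(x, P.polyder(w, 2))  # second derivative y''
    return np.abs(d2) / (1.0 + d1 ** 2) ** 1.5
```

For example, `curvature(w_weighted, x_test)` gives the bending along the fitted curve; values near zero everywhere mean the cluster is close to a straight line, and a slope angle near zero means that line is close to horizontal.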
See A measure for the curvature of a quadratic polynomial in Matlab for additional details.