How can I properly transition between yolov3 and yolov4 for opencv?

In my old project I trained a yolov3 model and then used cv2.dnn.readNetFromDarknet(...), cv2.dnn.blobFromImage(...), and net.forward(...) to run inference, and it worked fine. I then trained a yolov4 model and assumed I only had to swap in the new configuration and weights files for inference to work, but instead I get no detected bounding boxes. Is something supposed to change other than the weights and config? Thank you.
Edit:
net = cv2.dnn.readNetFromDarknet(PATHS["model_config"], PATHS["model_weights"])
layer_names = net.getLayerNames()
output_layer_names = [layer_names[index[0] - 1] for index in net.getUnconnectedOutLayers()]

# during inference
def method(frame):
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (MAGIC_NUMBER_1, MAGIC_NUMBER_1),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    layerOutputs = net.forward(output_layer_names)
    boxes = []
    confidences = []
    classIDs = []
    for output in layerOutputs:
        for detection in output:
            scores = detection[5:]
            classID = np.argmax(scores)
            confidence = scores[classID]
            if confidence > .5:  # NEVER PASSES, CONFIDENCE IS ALWAYS 0 WHEN PRINTED OUT
                box = detection[0:4] * np.array([W, H, W, H])
                (centerX, centerY, width, height) = box.astype("int")
                x = int(centerX - (width / 2))
                y = int(centerY - (height / 2))
                boxes.append([x, y, int(width), int(height)])
                confidences.append(float(confidence))
                classIDs.append(classID)
    return boxes, confidences, classIDs
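
In principle the readNetFromDarknet / blobFromImage / forward pipeline does not change between YOLOv3 and YOLOv4, and the per-row output layout (4 box values, objectness, then class scores) is the same, so the parsing loop above should still work. The most common culprit is the OpenCV version: YOLOv4 uses layers (notably the mish activation) that the DNN module only supports from OpenCV 4.4.0 onward, and older builds can load the cfg yet return empty or all-zero outputs. Below is a minimal sketch for checking the version and sanity-testing the model through the higher-level cv2.dnn_DetectionModel wrapper; the file names, test image, and 416x416 input size are assumptions to replace with your own.

import cv2

# YOLOv4 layers (e.g. mish) are supported from OpenCV 4.4.0 onward.
print(cv2.__version__)  # should be >= 4.4.0 for YOLOv4

# Hypothetical file names -- substitute your trained cfg/weights and a test image.
net = cv2.dnn.readNetFromDarknet("yolov4.cfg", "yolov4.weights")

# The DetectionModel wrapper handles blob creation, decoding and NMS internally,
# which makes it a quick way to confirm the model itself produces detections.
model = cv2.dnn_DetectionModel(net)
model.setInputParams(scale=1 / 255.0, size=(416, 416), swapRB=True)

frame = cv2.imread("test.jpg")
class_ids, confidences, boxes = model.detect(frame, confThreshold=0.5, nmsThreshold=0.4)
print(class_ids, confidences, boxes)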

Related

How to extract the area of interest in an image when the boundary is not obvious

Is there a way to extract just the area of interest (the bright square inside the red circle in the original image)? That means I need to find the coordinates of its boundary and then mask out everything outside it. I don't know how to do that. Could anyone help? Thanks!
import numpy as np
import matplotlib.pyplot as plt

# define horizontal and vertical Sobel kernels
Gx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
Gy = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

# define kernel convolution function
# with image X and filter F
def convolve(X, F):
    # height and width of the image
    X_height = X.shape[0]
    X_width = X.shape[1]
    # height and width of the filter
    F_height = F.shape[0]
    F_width = F.shape[1]
    H = (F_height - 1) // 2
    W = (F_width - 1) // 2
    # output numpy matrix with height and width
    out = np.zeros((X_height, X_width))
    # iterate over all the pixels of image X
    for i in np.arange(H, X_height - H):
        for j in np.arange(W, X_width - W):
            acc = 0
            # iterate over the filter
            for k in np.arange(-H, H + 1):
                for l in np.arange(-W, W + 1):
                    # get the corresponding value from image and filter
                    a = X[i + k, j + l]
                    w = F[H + k, W + l]
                    acc += (w * a)
            out[i, j] = acc
    # return convolution
    return out

# normalizing the vectors
sob_x = convolve(image, Gx) / 8.0
sob_y = convolve(image, Gy) / 8.0
# calculate the gradient magnitude of vectors
sob_out = np.sqrt(np.power(sob_x, 2) + np.power(sob_y, 2))
# mapping values from 0 to 255
sob_out = (sob_out / np.max(sob_out)) * 255
plt.imshow(sob_out, cmap='gray', interpolation='bicubic')
plt.show()
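
The Sobel pass above only produces an edge-strength image; to actually isolate the bright square you still need to turn the bright region into a mask. One common approach is thresholding followed by contour extraction, sketched below; the file name and the threshold value of 200 are assumptions (Otsu thresholding is an alternative), and the largest-contour heuristic assumes the bright square is the biggest bright blob in the frame.

import cv2
import numpy as np

# Hypothetical input path -- replace with your image.
gray = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)

# Keep only the bright pixels (threshold value is a guess; tune it or use Otsu).
_, binary = cv2.threshold(gray, 200, 255, cv2.THRESH_BINARY)

# Find external contours and keep the largest one.
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
largest = max(contours, key=cv2.contourArea)

# Build a filled mask from that contour and black out everything outside it.
mask = np.zeros_like(gray)
cv2.drawContours(mask, [largest], -1, 255, thickness=cv2.FILLED)
roi = cv2.bitwise_and(gray, gray, mask=mask)

# Optionally crop to the contour's bounding rectangle.
x, y, w, h = cv2.boundingRect(largest)
cv2.imwrite("roi.png", roi[y:y + h, x:x + w])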

Optical flow from velocity components

I have simulation data that contains the velocity values (x, y, and z components), as well as density and energy values, for each voxel. Is it possible to obtain the optical flow from the velocities, which would then serve as the ground truth?
I tried the following:
data = np.fromfile('Velocity/ns_1000_v.dat', dtype=np.float32)
data = np.reshape(data,(128, 128, 128, 3)) # 3D volumes
slice_num = 5 # let's pick up a slice
vx = data[:,:,slice_num,0]
vy = data[:,:,slice_num,1]
hsv = np.zeros((128,128,3), dtype=np.uint8)
hsv[..., 1] = 255
mag, ang = cv2.cartToPolar(vx, vy)
hsv[..., 0] = ang * 180 / np.pi / 2
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)
This doesn't show the (2D) flow, which I expected to see.
I already calculated the optical flow from the density values (in an unsupervised setup) and now wanted to compare with the ground truth. An example of the computed flow looks like this:
[Image: computed unsupervised flow]
Thank you!
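
The velocity field itself is essentially the ground-truth motion; what the HSV code above visualizes is the raw velocity, not a per-frame pixel displacement, so it will not match an image-based flow unless the in-plane components are scaled by the time step (and voxel size) between two rendered frames. Below is a rough sketch of how a comparison could look, estimating Farneback flow from two consecutive density slices and measuring the endpoint error against the scaled velocity; the density file names, dt, and pixel_size are hypothetical placeholders.

import cv2
import numpy as np

# Hypothetical consecutive density volumes (same 128^3 layout as the velocity file).
rho_t0 = np.fromfile("Density/ns_1000_d.dat", dtype=np.float32).reshape(128, 128, 128)
rho_t1 = np.fromfile("Density/ns_1001_d.dat", dtype=np.float32).reshape(128, 128, 128)

slice_num = 5
prev = cv2.normalize(rho_t0[:, :, slice_num], None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
curr = cv2.normalize(rho_t1[:, :, slice_num], None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)

# Dense flow estimated from the two density images.
flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)

# Ground truth: in-plane velocity (vx, vy from the snippet above) converted to
# pixel displacement per frame.
dt = 1.0          # simulation time between the two snapshots (assumed)
pixel_size = 1.0  # voxel size in the same units as the velocity (assumed)
gt_flow = np.stack([vx, vy], axis=-1) * dt / pixel_size

# Mean endpoint error between the estimated and ground-truth flow.
print("mean EPE:", np.linalg.norm(flow - gt_flow, axis=-1).mean())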

How to test OpenCV DNN module accuracy? It does not predict correct detections for YOLOv3, whereas the Darknet detector detects correctly

The OpenCV DNN module does not predict correct detections for YOLOv3, whereas the Darknet detector detects correctly.
System information (version)
OpenCV => 4.2.1 and 4.4.x
Operating System / Platform => Ubuntu 18.04 64Bit
I tested with OpenCV compiled from source and also with the pre-built opencv-python package, but OpenCV DNN detects the wrong objects, whereas the Darknet detector detects correctly.
Correct detection with darknet detector:
Wrong detection with OpenCV DNN module:
YOLOv3 network and model weights are from https://github.com/AlexeyAB/darknet
modelWeights: yolov3.weights
modelConfiguration: yolov3.cfg
ClassesFile: coco.names
Detailed description
Please see the output images at the link below: the correct detection with the Darknet detector compared with the wrong detection from OpenCV DNN.
Output images are available in this Google Drive link, which also includes the test images needed to reproduce the steps.
# The following code is partial, to demonstrate the steps
net = cv2.dnn.readNetFromDarknet(modelConfiguration, modelWeights)
layerNames = net.getLayerNames()
layerNames = [layerNames[i[0] - 1] for i in net.getUnconnectedOutLayers()]

# construct a blob from the input frame and then perform a forward pass of the
# YOLO object detector, giving us our bounding boxes and associated probabilities
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                             swapRB=True, crop=False)
net.setInput(blob)
layerOutputs = net.forward(layerNames)

# initialize our lists of detected bounding boxes, confidences,
# and class IDs, respectively
boxes = []
confidences = []
classIDs = []

# loop over each of the layer outputs
for output in layerOutputs:
    # loop over each of the detections
    for detection in output:
        # extract the class ID and confidence (i.e., probability)
        # of the current object detection
        scores = detection[5:]
        classID = np.argmax(scores)
        confidence = scores[classID]
        # filter out weak predictions by ensuring the detected
        # probability is greater than the minimum probability
        if confidence > args["confidence"]:
            # scale the bounding box coordinates back relative to
            # the size of the image, keeping in mind that YOLO
            # actually returns the center (x, y)-coordinates of
            # the bounding box followed by the boxes' width and
            # height
            box = detection[0:4] * np.array([W, H, W, H])
            (centerX, centerY, width, height) = box.astype("int")
            # use the center (x, y)-coordinates to derive the top
            # and left corner of the bounding box
            x = int(centerX - (width / 2))
            y = int(centerY - (height / 2))
            # update our list of bounding box coordinates,
            # confidences, and class IDs
            boxes.append([x, y, int(width), int(height)])
            confidences.append(float(confidence))
            classIDs.append(classID)

# apply non-maxima suppression to suppress weak, overlapping
# bounding boxes
idxs = cv2.dnn.NMSBoxes(boxes, confidences, args["confidence"], args["threshold"])
dets = []
if len(idxs) > 0:
    # loop over the indexes we are keeping
    for i in idxs.flatten():
        (x, y) = (boxes[i][0], boxes[i][1])
        (w, h) = (boxes[i][2], boxes[i][3])
        dets.append([x, y, x + w, y + h, confidences[i]])

if len(boxes) > 0:
    i = int(0)
    for box in boxes:
        # extract the bounding box coordinates
        (x, y) = (int(box[0]), int(box[1]))
        (w, h) = (int(box[2]), int(box[3]))
        # draw a bounding box rectangle and label on the image
        # color = [int(c) for c in COLORS[classIDs[i]]]
        # cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        color = [int(c) for c in COLORS[indexIDs[i] % len(COLORS)]]
        cv2.rectangle(frame, (x, y), (w, h), color, 2)
        cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.75, color, 2)  # 1.0 0.5, color, 2
        i += 1
cv2.imwrite("detection-output.jpg", frame)
I think your detection is correct, since all of your labels are "car"; the problem is the text you draw in this line:
cv2.putText(frame, text, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.75, color, 2)
You should put the class name in the text, but I can't find where text is defined. Your code should look like this:
cv2.putText(frame, classes[class_ids[index]], (x + 5, y + 20), cv2.FONT_HERSHEY_COMPLEX_SMALL, 1, colors,2)
But in my experience, Darknet has better detection than OpenCV DNN.
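
For reference, here is a minimal sketch of a drawing loop that works over the NMS-kept indices only and labels each box with its class name and confidence. It continues from the variables in the snippet above (idxs, boxes, confidences, classIDs, COLORS, frame) and assumes the class names come from coco.names; note that the second rectangle corner must be (x + w, y + h).

# continuing from the snippet above
with open("coco.names") as f:
    classes = [line.strip() for line in f]

if len(idxs) > 0:
    for i in idxs.flatten():
        (x, y, w, h) = boxes[i]
        color = [int(c) for c in COLORS[classIDs[i] % len(COLORS)]]
        label = "{}: {:.2f}".format(classes[classIDs[i]], confidences[i])
        # second corner is the bottom-right point, not the raw width/height
        cv2.rectangle(frame, (x, y), (x + w, y + h), color, 2)
        cv2.putText(frame, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.75, color, 2)

cv2.imwrite("detection-output.jpg", frame)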

Playing cards detection with a custom YOLO in OpenCV. How to know the inputs and outputs from the custom YOLO .cfg file

I want to detect playing cards and found a .cfg and .weights file for it. The classes file has 52 card names. The following code gives an index out of range error. I couldn't understand the outputs of YOLO and how to get the detected labels. I am new to this and have been trying to understand. Can someone please help?
import cv2
import numpy as np

# Load Yolo
net = cv2.dnn.readNet("yolocards_608.weights", "yolocards.cfg")
classes = []
with open("cards.names", "r") as f:
    classes = [line.strip() for line in f.readlines()]
layer_names = net.getLayerNames()
output_layers = [layer_names[i[0] - 1] for i in net.getUnconnectedOutLayers()]
colors = np.random.uniform(0, 255, size=(len(classes), 3))

# Loading image
img = cv2.imread("playing_cards_image.jpg")
img = cv2.resize(img, None, fx=0.4, fy=0.4)
height, width, channels = img.shape

# Detecting objects
blob = cv2.dnn.blobFromImage(img, 0.00392, (416, 416), (0, 0, 0), True, crop=False)
net.setInput(blob)
outs = net.forward(output_layers)

# Showing informations on the screen
class_ids = []
confidences = []
boxes = []
for out in outs:
    print(out.shape)
    for detection in out:
        scores = detection[:]
        class_id = np.argmax(scores)
        confidence = scores[class_id]
        if confidence > 0.5:
            # Object detected
            center_x = int(detection[0] * width)
            center_y = int(detection[1] * height)
            w = int(detection[2] * width)
            h = int(detection[3] * height)
            # Rectangle coordinates
            x = int(center_x - w / 2)
            y = int(center_y - h / 2)
            boxes.append([x, y, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
font = cv2.FONT_HERSHEY_PLAIN
for j in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        print(class_ids[i])
        label = str(classes[class_ids[i]])
        print(label)
        color = colors[i]
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label, (x, y + 30), font, 3, color, 3)
error:
0
Ah
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-46-adaf82305ab8> in <module>
6 label = str(classes[class_ids[i]])
7 print(label)
----> 8 color = colors[i]
9 cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
10 cv2.putText(img, label, (x, y + 30), font, 3, color, 3)
IndexError: index 52 is out of bounds for axis 0 with size 52
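
Each row of a Darknet YOLO output is laid out as [center_x, center_y, width, height, objectness, class_score_0, ..., class_score_51], so the class scores start at index 5. Taking scores = detection[:] makes np.argmax run over the box coordinates and objectness as well, which is what can push class_id past 51 and trigger the IndexError; the drawing loop also iterates with j but indexes with i. Below is a sketch of both loops with those two points fixed, reusing the variables from the code above (outs, width, height, img, classes, colors):

# parse the detections: class scores start at index 5 of each row
boxes, confidences, class_ids = [], [], []
for out in outs:
    for detection in out:
        scores = detection[5:]  # skip cx, cy, w, h, objectness
        class_id = int(np.argmax(scores))
        confidence = scores[class_id]
        if confidence > 0.5:
            center_x, center_y = int(detection[0] * width), int(detection[1] * height)
            w, h = int(detection[2] * width), int(detection[3] * height)
            boxes.append([center_x - w // 2, center_y - h // 2, w, h])
            confidences.append(float(confidence))
            class_ids.append(class_id)

# draw only the boxes kept by NMS, using one loop index consistently
indexes = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in range(len(boxes)):
    if i in indexes:
        x, y, w, h = boxes[i]
        label = str(classes[class_ids[i]])
        color = colors[class_ids[i]]
        cv2.rectangle(img, (x, y), (x + w, y + h), color, 2)
        cv2.putText(img, label, (x, y + 30), cv2.FONT_HERSHEY_PLAIN, 3, color, 3)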

'numpy.ndarray' object has no attribute 'split'

I have read AttributeError: 'numpy.ndarray' object has no attribute 'split' and Why is pytesseract causing AttributeError: 'NoneType' object has no attribute 'bands'?, but found no solution to my problem.
I am following this article: https://www.pyimagesearch.com/2018/09/17/opencv-ocr-and-text-recognition-with-tesseract/#comment-481113. It shows how to detect and extract text from images using Tesseract. I have installed everything on my system, but when I run the code I receive the following error:
Traceback (most recent call last):
File "text_recognition.py", line 157, in <module>
text = pytesseract.image_to_string(roi, config=config)
File "C:\Users\prince.bhatia\AppData\Local\Programs\Python\Python36\lib\site-p
ackages\pytesseract\pytesseract.py", line 104, in image_to_string
if len(image.split()) == 4:
AttributeError: 'numpy.ndarray' object has no attribute 'split'
Below is my complete code:
# USAGE
# python text_recognition.py --east frozen_east_text_detection.pb --image images/example_01.jpg
# python text_recognition.py --east frozen_east_text_detection.pb --image images/example_04.jpg --padding 0.05

# import the necessary packages
from imutils.object_detection import non_max_suppression
import numpy as np
import pytesseract
import argparse
import cv2

def decode_predictions(scores, geometry):
    # grab the number of rows and columns from the scores volume, then
    # initialize our set of bounding box rectangles and corresponding
    # confidence scores
    (numRows, numCols) = scores.shape[2:4]
    rects = []
    confidences = []
    # loop over the number of rows
    for y in range(0, numRows):
        # extract the scores (probabilities), followed by the
        # geometrical data used to derive potential bounding box
        # coordinates that surround text
        scoresData = scores[0, 0, y]
        xData0 = geometry[0, 0, y]
        xData1 = geometry[0, 1, y]
        xData2 = geometry[0, 2, y]
        xData3 = geometry[0, 3, y]
        anglesData = geometry[0, 4, y]
        # loop over the number of columns
        for x in range(0, numCols):
            # if our score does not have sufficient probability,
            # ignore it
            if scoresData[x] < args["min_confidence"]:
                continue
            # compute the offset factor as our resulting feature
            # maps will be 4x smaller than the input image
            (offsetX, offsetY) = (x * 4.0, y * 4.0)
            # extract the rotation angle for the prediction and
            # then compute the sin and cosine
            angle = anglesData[x]
            cos = np.cos(angle)
            sin = np.sin(angle)
            # use the geometry volume to derive the width and height
            # of the bounding box
            h = xData0[x] + xData2[x]
            w = xData1[x] + xData3[x]
            # compute both the starting and ending (x, y)-coordinates
            # for the text prediction bounding box
            endX = int(offsetX + (cos * xData1[x]) + (sin * xData2[x]))
            endY = int(offsetY - (sin * xData1[x]) + (cos * xData2[x]))
            startX = int(endX - w)
            startY = int(endY - h)
            # add the bounding box coordinates and probability score
            # to our respective lists
            rects.append((startX, startY, endX, endY))
            confidences.append(scoresData[x])
    # return a tuple of the bounding boxes and associated confidences
    return (rects, confidences)
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", type=str,
    help="path to input image")
ap.add_argument("-east", "--east", type=str,
    help="path to input EAST text detector")
ap.add_argument("-c", "--min-confidence", type=float, default=0.5,
    help="minimum probability required to inspect a region")
ap.add_argument("-w", "--width", type=int, default=320,
    help="nearest multiple of 32 for resized width")
ap.add_argument("-e", "--height", type=int, default=320,
    help="nearest multiple of 32 for resized height")
ap.add_argument("-p", "--padding", type=float, default=0.0,
    help="amount of padding to add to each border of ROI")
args = vars(ap.parse_args())

# load the input image and grab the image dimensions
image = cv2.imread(args["image"])
orig = image.copy()
(origH, origW) = image.shape[:2]

# set the new width and height and then determine the ratio in change
# for both the width and height
(newW, newH) = (args["width"], args["height"])
rW = origW / float(newW)
rH = origH / float(newH)

# resize the image and grab the new image dimensions
image = cv2.resize(image, (newW, newH))
(H, W) = image.shape[:2]

# define the two output layer names for the EAST detector model that
# we are interested in -- the first is the output probabilities and the
# second can be used to derive the bounding box coordinates of text
layerNames = [
    "feature_fusion/Conv_7/Sigmoid",
    "feature_fusion/concat_3"]

# load the pre-trained EAST text detector
print("[INFO] loading EAST text detector...")
net = cv2.dnn.readNet(args["east"])

# construct a blob from the image and then perform a forward pass of
# the model to obtain the two output layer sets
blob = cv2.dnn.blobFromImage(image, 1.0, (W, H),
    (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
(scores, geometry) = net.forward(layerNames)
# decode the predictions, then apply non-maxima suppression to
# suppress weak, overlapping bounding boxes
(rects, confidences) = decode_predictions(scores, geometry)
boxes = non_max_suppression(np.array(rects), probs=confidences)

# initialize the list of results
results = []

# loop over the bounding boxes
for (startX, startY, endX, endY) in boxes:
    # scale the bounding box coordinates based on the respective
    # ratios
    startX = int(startX * rW)
    startY = int(startY * rH)
    endX = int(endX * rW)
    endY = int(endY * rH)
    # in order to obtain a better OCR of the text we can potentially
    # apply a bit of padding surrounding the bounding box -- here we
    # are computing the deltas in both the x and y directions
    dX = int((endX - startX) * args["padding"])
    dY = int((endY - startY) * args["padding"])
    # apply padding to each side of the bounding box, respectively
    startX = max(0, startX - dX)
    startY = max(0, startY - dY)
    endX = min(origW, endX + (dX * 2))
    endY = min(origH, endY + (dY * 2))
    # extract the actual padded ROI
    roi = orig[startY:endY, startX:endX]
    print(roi)
    # in order to apply Tesseract v4 to OCR text we must supply
    # (1) a language, (2) an OEM flag of 1, indicating that we
    # wish to use the LSTM neural net model for OCR, and finally
    # (3) a PSM value of 7, which implies that we are treating
    # the ROI as a single line of text
    config = ("-l eng --oem 1 --psm 7")
    text = pytesseract.image_to_string(roi, config=config)  # here I am receiving the error
    # add the bounding box coordinates and OCR'd text to the list
    # of results
    results.append(((startX, startY, endX, endY), text))

# sort the results bounding box coordinates from top to bottom
results = sorted(results, key=lambda r: r[0][1])

# loop over the results
for ((startX, startY, endX, endY), text) in results:
    # display the text OCR'd by Tesseract
    print("OCR TEXT")
    print("========")
    print("{}\n".format(text))
    # strip out non-ASCII text so we can draw the text on the image
    # using OpenCV, then draw the text and a bounding box surrounding
    # the text region of the input image
    text = "".join([c if ord(c) < 128 else "" for c in text]).strip()
    output = orig.copy()
    cv2.rectangle(output, (startX, startY), (endX, endY),
        (0, 0, 255), 2)
    cv2.putText(output, text, (startX, startY - 20),
        cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 0, 255), 3)
    # show the output image
    cv2.imshow("Text Detection", output)
    cv2.waitKey(0)
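
The traceback points at older pytesseract versions, whose image_to_string expects a PIL Image and calls image.split() on it, which a plain NumPy array does not have. Upgrading pytesseract (recent releases accept NumPy arrays directly) or converting the ROI to a PIL Image first should resolve it; a minimal sketch of the conversion, assuming roi is the BGR crop from the loop above:

from PIL import Image

# roi is the BGR crop from the loop above; PIL expects RGB channel order.
pil_roi = Image.fromarray(cv2.cvtColor(roi, cv2.COLOR_BGR2RGB))
config = ("-l eng --oem 1 --psm 7")
text = pytesseract.image_to_string(pil_roi, config=config)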
