I am messing around with OpenCV (cv2) for neural style transfer... With cv2.imshow("Output", output) I am able to see my picture, but when I write the output to a file with cv2.imwrite("my_file.jpg", output), the saved image comes out black. Is it because my file extension is wrong? When I do cv2.imwrite("my_file.jpg", input), though, it does save my original input picture correctly. Any ideas? Thank you in advance.
# import the necessary packages
from __future__ import print_function
import argparse
import time
import cv2
import imutils
import numpy as np
from imutils.video import VideoStream
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required=True,
help="neural style transfer model")
ap.add_argument("-i", "--image", required=True,
help="input image to apply neural style transfer to")
args = vars(ap.parse_args())
# load the neural style transfer model from disk
print("[INFO] loading style transfer model")
net = cv2.dnn.readNetFromTorch(args["model"])
# load the input image, resize it to have a width of 600 pixels, and
# then grab the image dimensions
image = cv2.imread(args["image"])
image = imutils.resize(image, width=600)
(h, w) = image.shape[:2]
# construct a blob from the image, set the input, and then perform a
# forward pass of the network
blob = cv2.dnn.blobFromImage(image, 1.0, (w, h),
(103.939, 116.779, 123.680), swapRB=False, crop=False)
net.setInput(blob)
start = time.time()
output = net.forward()
end = time.time()
# reshape the output tensor, add back in the mean subtraction, and
# then swap the channel ordering
output = output.reshape((3, output.shape[2], output.shape[3]))
output[0] += 103.939
output[1] += 116.779
output[2] += 123.680
output /= 255.0
output = output.transpose(1, 2, 0)
# show information on how long inference took
print("[INFO] neural style transfer took {:.4f} seconds".format(
end - start))
# show the images
cv2.imshow("Input", image)
cv2.imshow("Output", output)
cv2.waitKey(0)
cv2.imwrite("dogey.jpg", output)
Only the last four lines of code deal with imshow and imwrite; all the lines before them modify the output picture.
The variable output represents a colored image that is composed of pixels. Each pixel is determined by three values (RGB). Depending on the representation of the image, each value is drawn either from the discrete range [0, 255] or from the continuous range [0, 1]. However, in the following line of code you are scaling the entries of output from the discrete range [0, 255] to the "continuous" range [0, 1].
output /= 255.0
While the function cv2.imshow(...) can handle images stored with float values in the range [0, 1], the cv2.imwrite(...) function cannot. You have to pass it an image composed of values in the range [0, 255]. In your case you are passing values that are all close to zero and "far" away from 255, so the image is treated as essentially colorless and therefore black. A quick fix might be:
cv2.imwrite("dogey.jpg", 255*output)
I have been trying to segment biological cells in an image using the watershed algorithm. I found an excellent article on pyimagesearch which clearly gives an overview of the algorithm and its implementation in Python. The code uses both OpenCV and scikit-image for processing the image.
My goal is to convert the whole code into pure OpenCV. But the issue is that there is a function called skimage.feature.peak_local_max in scikit-image which does the job of finding local peaks in an image very efficiently, and I couldn't find or devise such a function in OpenCV.
Original code (I have documented this snippet according to my understanding, please correct me if I am wrong):
# import the necessary packages
from skimage.feature import peak_local_max
from skimage.morphology import watershed
from scipy import ndimage
import numpy as np
import argparse
import imutils
import cv2
from matplotlib import pyplot as plt
# load the image and perform pyramid mean shift filtering
# to aid the thresholding step
image = cv2.imread("test2.png")
shifted = cv2.pyrMeanShiftFiltering(image, 21, 51)
# Apply grayscale
gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
# Convert to binary
thresh = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Watershed starts from here
# compute the exact Euclidean distance from every binary
# pixel to the nearest zero pixel, then find peaks in this
# distance map
D = ndimage.distance_transform_edt(thresh)
localMax = peak_local_max(D, indices=False, min_distance=10,labels=thresh)
# perform a connected component analysis on the local peaks,
# using 8-connectivity, then apply the Watershed algorithm
markers = ndimage.label(localMax, structure=np.ones((3, 3)))[0]
# Apply segmentation
labels = watershed(-D, markers, mask=thresh)
print("[INFO] {} unique segments found".format(len(np.unique(labels)) - 1))
cv2.imwrite("labels.png",labels)
# Contouring
for label in np.unique(labels):
# if the label is zero, we are examining the 'background'
# so simply ignore it
if label == 0:
continue
# otherwise, allocate memory for the label region and draw
# it on the mask
mask = np.zeros(gray.shape, dtype="uint8")
mask[labels == label] = 255
cnts = cv2.findContours(mask.copy(), cv2.RETR_EXTERNAL,cv2.CHAIN_APPROX_SIMPLE)
cnts = imutils.grab_contours(cnts)
c = max(cnts, key=cv2.contourArea)
# draw a circle enclosing the object
peri = cv2.arcLength(c, True)
approx = cv2.approxPolyDP(c, 0.018 * peri, True)
cv2.drawContours(image, [approx], -1, (0,0,255), 2)
cv2.imwrite("output.jpg",image)
Pure OpenCV Code till finding distance map:
# import the necessary packages
import numpy as np
import cv2
# load the image and perform pyramid mean shift filtering
# to aid the thresholding step
image = cv2.imread("1.png")
shifted = cv2.pyrMeanShiftFiltering(image, 21, 51)
# Apply grayscale
gray = cv2.cvtColor(shifted, cv2.COLOR_BGR2GRAY)
# Convert to binary
thresh = cv2.threshold(gray, 0, 255,cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# Watershed starts from here
# compute the exact Euclidean distance from every binary
# pixel to the nearest zero pixel, then find peaks in this
# distance map
D = cv2.distanceTransform(thresh,cv2.DIST_L2,0)
Up to the point where D is computed, the original code and the pure OpenCV code I have tried produce exactly the same output. The issue is that I don't have a clear idea of how to implement peak_local_max in OpenCV so that it gives results identical to scikit-image's function.
It would be really helpful if someone with relevant knowledge could explain how this function manages to find those peaks in such a fine-grained manner.
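For reference, with indices=False the call peak_local_max(D, min_distance=10, labels=thresh) returns a boolean mask of the pixels that equal the maximum of their (2*min_distance+1)-sized neighbourhood and lie inside the mask. A rough OpenCV approximation of that maximum-filter test (only a sketch; it does not reproduce scikit-image's exact handling of plateaus and peak spacing) would be:
# approximate peak_local_max(D, min_distance=10, labels=thresh)
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (21, 21))  # 2*min_distance + 1
dilated = cv2.dilate(D, kernel)  # grayscale dilation acts as a maximum filter
localMax = (D == dilated) & (D > 0) & (thresh > 0)  # peaks inside the foreground mask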
Input Image:
Peak local max output in scikit-image (BGR format image):
Required output:
I am trying to implement binary thresholding on my own using Python. First I converted the image to a grayscale image. Then I took the maximum value of the grayscale image and treated that value as the threshold. Then I used a nested for loop to carry out the condition checking. Below is my code. I have taken the example image used on the OpenCV website.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as im
img = im.imread('threshold.png')
plt.imshow(img)
plt.show()
R=img[:,:,0]
G=img[:,:,1]
B=img[:,:,2]
M,N=R.shape
gray_img= np.zeros((M,N))
intensity= np.zeros((M,N))
for i in range(M):
for j in range(N):
gray_img[i, j]=(R[i, j]*0.2989)+(G[i, j]*0.5870)+(B[i, j]*0.114);
t=gray_img.max()
for i in range(1, M-1):
for j in range(1,N-1):
intensity[i,j]=gray_img[i,j]
if(intensity[i,j]>t):
intensity[i,j]=1
else:
intensity[i,j]=0
plt.imshow(intensity)
plt.show()
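For reference, the grayscale conversion and the comparison can also be written without the explicit loops (only a sketch of the equivalent operation; note that because t is the maximum of gray_img, the condition gray_img > t is never true, so the result is an all-zero image):
# vectorized equivalent of the nested loops above (border pixels included)
gray_img = R * 0.2989 + G * 0.5870 + B * 0.114
t = gray_img.max()
intensity = (gray_img > t).astype(float)  # always False here, hence all zeros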
I have been trying out Tesseract OCR in combination with OpenCV (EMGU CV, C#) and I am trying to improve the reliability. On the whole it has been good, and by applying various filters one at a time and attempting OCR (Original, Bilateral, AdaptiveThreshold, Dilate) I have seen significant improvement.
However...
The following image is being stubborn: despite seeming quite clear to begin with, I get no results from Tesseract (original image before filters):
In this case I am simply after the 2.57
Instead of applying filters to the image, scaling the image helped with the OCR. Below is the code I tried. Sorry, I am using Linux, so I tested with Python instead of C#.
#!/usr/bin/env python3
import argparse
import cv2
import numpy as np
from PIL import Image
import pytesseract
import os
from PIL import Image, ImageDraw, ImageFilter
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True, help="Path to the image")
args = vars(ap.parse_args())
img = cv2.imread(args["image"])
#OCR
barroi = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
scale_percent = 8 # percent of original size
width = int(barroi.shape[1] * scale_percent / 100)
height = int(barroi.shape[0] * scale_percent / 100)
dim = (width, height)
barroi = cv2.resize(barroi, dim, interpolation = cv2.INTER_AREA)
text = pytesseract.image_to_string(barroi, lang='eng', config='--psm 10 --oem 3')
print(str(text))
imageName = "Result.tif"
cv2.imwrite(imageName, img)
I set myself the task of recognizing passports, but I can't recognize all the fields completely. Tell me, what could help? I have used various filters and the Canny algorithm, but something is still missing. The code cannot recognize the document's series and number, nor the small characters, and sometimes it fails to recognize the first or last name at all...
# import the necessary packages
from PIL import Image
import pytesseract
import argparse
import cv2
import os
import numpy as np
# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image" )
ap.add_argument("-p", "--preprocess", type=str, default="thresh")
args = vars(ap.parse_args())
# load the example image and convert it to grayscale
image = cv2.imread ("pt.jpg")
gray = cv2.cvtColor (image, cv2.COLOR_BGR2GRAY)
gray = cv2.Canny(image,300,300,apertureSize = 3)
# check to see if we should apply thresholding to preprocess the
# image
if args["preprocess"] == "thresh":
gray = cv2.threshold (gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]
# make a check to see if median blurring should be done to remove
# noise
elif args["preprocess"] == "blur":
gray = cv2.medianBlur (gray, 3)
# write the grayscale image to disk as a temporary file so we can
# apply OCR to it
filename = "{}.png".format (os.getpid ())
cv2.imwrite (filename, gray)
# load the image as a PIL/Pillow image, apply OCR, and then delete
# the temporary file
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'
text = pytesseract.image_to_string (image, lang = 'rus+eng')
os.remove (filename)
print (text)
os.system('python gon.py > test.txt') # doc output file
# show the output images
cv2.imshow ("Image", image)
cv2.imshow ("Output", gray)
cv2.waitKey (0)
It's easier (and faster) for Tesseract to recognize text when you provide it only with the regions that contain the text you want to interpret, in your case the big, black letters in the middle, for example:
I'm referring to running Tesseract only in the regions in green. Since the document's structure is predictable, you could easily find these regions as follows:
binarize and invert the image (black = empty)
use opencv's connectedComponentsWithStats() function to get a list of all connected components with their positions and size
you can hardcode thresholds to filter only the characters you want, or get a histogram of areas and use statistics to define thresholds dynamically
on the remaining connected components use morphological operations (e.g. dilation with a horizontal kernel) to connect letters together horizontally
get the bounding box of the final connected components
Optional: postprocessing will work much better in these isolated boxes
Feed each bounding box to Tesseract as a separate Mat; it will greatly simplify the problem (a rough sketch of these steps follows below).
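A rough sketch of those steps in Python/OpenCV (the equivalent calls should be available through EMGU CV in C#; the area thresholds and the kernel size are made-up placeholders you would have to tune, and "meter.png" just stands in for your input image):
import cv2
import numpy as np
img = cv2.imread("meter.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input file
# 1) binarize and invert the image so the text is white on black
binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
# 2) connected components with their positions and sizes
num, labels, stats, _ = cv2.connectedComponentsWithStats(binary, connectivity=8)
# 3) keep only components whose area looks like a character (placeholder thresholds)
chars = np.zeros_like(binary)
for i in range(1, num):  # label 0 is the background
    if 100 < stats[i, cv2.CC_STAT_AREA] < 5000:
        chars[labels == i] = 255
# 4) dilate horizontally so the letters of one line merge into a single blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25, 3))
merged = cv2.dilate(chars, kernel)
# 5) take the bounding box of each merged blob and OCR that crop on its own
num, labels, stats, _ = cv2.connectedComponentsWithStats(merged, connectivity=8)
for i in range(1, num):
    x, y = stats[i, cv2.CC_STAT_LEFT], stats[i, cv2.CC_STAT_TOP]
    w, h = stats[i, cv2.CC_STAT_WIDTH], stats[i, cv2.CC_STAT_HEIGHT]
    roi = img[y:y + h, x:x + w]
    # feed roi to Tesseract, e.g. pytesseract.image_to_string(roi, config='--psm 7')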
I made a project that basically uses Google's object detection API with TensorFlow.
All I am doing is inference with a pre-trained model: real-time object detection where the input is the video stream of a webcam or something similar, read with OpenCV.
Right now I get pretty decent performance results, but I want to further increase the FPS.
What I experience is that TensorFlow uses my whole memory during inference, but the GPU usage is not maxed out at all (around 40% with an NVIDIA GTX 1050 laptop, and 6% on an NVIDIA Jetson TX2).
So my idea was to increase the GPU usage by increasing the image batch size which is fed into each session run.
So my question is: how can I batch multiple frames of the input video stream together before I feed them to sess.run()?
Have a look at my code object_detetection.py on my github repo: (https://github.com/GustavZ/realtime_object_detection).
I would be very thankful if you came up with some hints or code implementations!
import numpy as np
import os
import six.moves.urllib as urllib
import tarfile
import tensorflow as tf
import cv2
# Protobuf Compilation (once necessary)
os.system('protoc object_detection/protos/*.proto --python_out=.')
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util
from stuff.helper import FPS2, WebcamVideoStream
# INPUT PARAMS
# Must be OpenCV readable
# 0 = Default Camera
video_input = 0
visualize = True
max_frames = 300 #only used if visualize==False
width = 640
height = 480
fps_interval = 3
bbox_thickness = 8
# Model preparation
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_2017_11_17'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = 'models/' + MODEL_NAME + '/frozen_inference_graph.pb'
# List of the strings that is used to add correct label for each box.
LABEL_MAP = 'mscoco_label_map.pbtxt'
PATH_TO_LABELS = 'object_detection/data/' + LABEL_MAP
NUM_CLASSES = 90
# Download Model
if not os.path.isfile(PATH_TO_CKPT):
print('Model not found. Downloading it now.')
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
file_name = os.path.basename(file.name)
if 'frozen_inference_graph.pb' in file_name:
tar_file.extract(file, os.getcwd())
os.remove('../' + MODEL_FILE)
else:
print('Model found. Proceed.')
# Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
od_graph_def = tf.GraphDef()
with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
serialized_graph = fid.read()
od_graph_def.ParseFromString(serialized_graph)
tf.import_graph_def(od_graph_def, name='')
# Loading label map
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)
# Start Video Stream
video_stream = WebcamVideoStream(video_input,width,height).start()
cur_frames = 0
# Detection
with detection_graph.as_default():
with tf.Session(graph=detection_graph) as sess:
# Define the input and output tensors for detection_graph
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Each box represents a part of the image where a particular object was detected.
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents the level of confidence for each of the objects.
# Score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
# fps calculation
fps = FPS2(fps_interval).start()
print ("Press 'q' to Exit")
while video_stream.isActive():
image_np = video_stream.read()
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
# Actual detection.
(boxes, scores, classes, num) = sess.run(
[detection_boxes, detection_scores, detection_classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
vis_util.visualize_boxes_and_labels_on_image_array(
image_np,
np.squeeze(boxes),
np.squeeze(classes).astype(np.int32),
np.squeeze(scores),
category_index,
use_normalized_coordinates=True,
line_thickness=bbox_thickness)
if visualize:
cv2.imshow('object_detection', image_np)
# Exit Option
if cv2.waitKey(1) & 0xFF == ord('q'):
break
else:
cur_frames += 1
if cur_frames >= max_frames:
break
# fps calculation
fps.update()
# End everything
fps.stop()
video_stream.stop()
cv2.destroyAllWindows()
print('[INFO] elapsed time (total): {:.2f}'.format(fps.elapsed()))
print('[INFO] approx. FPS: {:.2f}'.format(fps.fps()))
Well, I'd just collect batch_size frames and feed them:
batch_size = 5
while video_stream.isActive():
image_np_list = []
for _ in range(batch_size):
image_np_list.append(video_stream.read())
fps.update()
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.asarray(image_np_list)
# Actual detection.
(boxes, scores, classes, num) = sess.run(
[detection_boxes, detection_scores, detection_classes, num_detections],
feed_dict={image_tensor: image_np_expanded})
# Visualization of the results of a detection.
for i in range(batch_size):
vis_util.visualize_boxes_and_labels_on_image_array(
image_np_expanded[i],
boxes[i],
classes[i].astype(np.int32),
scores[i],
category_index,
use_normalized_coordinates=True,
line_thickness=bbox_thickness)
if visualize:
cv2.imshow('object_detection', image_np_expanded[i])
# Exit Option
if cv2.waitKey(1) & 0xFF == ord('q'):
break
Of course you'll have to make the relevant changes after that if you're reading the results from the detection, since they will now have batch_size rows.
Be careful though: before TensorFlow 1.4 (I think), the object detection API only supported a batch size of 1 for image_tensor, so this will not work unless you upgrade your TensorFlow.
Also note that your resulting FPS will be an average, and that frames in the same batch will actually be closer together in time than frames from different batches (since you'll still need to wait for sess.run() to finish). The average should still be significantly better than your current FPS, although the maximum time between two consecutive frames will increase.
If you want your frames to all have roughly the same interval between them, I guess you'll need more sophisticated tools like multithreading and queueing: one thread would read the images from the stream and store them in a queue, the other one would take them from the queue and call sess.run() on them asynchronously; it could also tell the 1st thread to hurry up or slow down depending on its own computing capacity. This is trickier to implement.
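For completeness, here is a minimal sketch of that producer/consumer idea using Python's threading and queue modules, reusing the names from the snippets above (video_stream, sess, the detection tensors and batch_size); the helper names read_frames and run_batches are made up, and the feedback that tells the reader thread to speed up or slow down is left out:
import threading
import queue
frame_queue = queue.Queue(maxsize=2 * batch_size)  # bounded queue gives simple back-pressure
def read_frames():
    # producer: read frames from the stream and push them into the queue
    while video_stream.isActive():
        frame_queue.put(video_stream.read())
def run_batches():
    # consumer: pull batch_size frames from the queue and run one sess.run() per batch
    while True:
        batch = np.asarray([frame_queue.get() for _ in range(batch_size)])
        (boxes, scores, classes, num) = sess.run(
            [detection_boxes, detection_scores, detection_classes, num_detections],
            feed_dict={image_tensor: batch})
threading.Thread(target=read_frames, daemon=True).start()
run_batches()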