I'm trying to stitch frames from a wide-angle (160.5 degree) video, but the result is not good.
I'm using OpenCV 4 and ffmpeg to extract the frames from the video.
ffmpeg command to get 15 frames per second:
ffmpeg -i first.mp4 -vf fps=15 preview%05d.jpg
OpenCV stitching code:
import cv2
import numpy as np

images = []
for i in range(70):
    name = 'preview%05d.jpg' % (i + 1)
    print(name)
    images.append(cv2.imread(name, cv2.IMREAD_COLOR))

print("start")
stitcher = cv2.Stitcher_create()
ret, pano = stitcher.stitch(images)

if ret == cv2.STITCHER_OK:
    cv2.imshow('panorama', pano)
    cv2.waitKey()
    cv2.destroyAllWindows()
else:
    print(cv2.STITCHER_ERR_NEED_MORE_IMGS)
    print(cv2.STITCHER_ERR_HOMOGRAPHY_EST_FAIL)
    print(cv2.STITCHER_ERR_CAMERA_PARAMS_ADJUST_FAIL)
    print(ret)
    print('Error during stitching')
Actual result:
Expected result:
Before the line stitcher = cv2.Stitcher_create() you have to add some extra processing that transforms your trapezoid image view into a rectangular image view via a homography.
use: cv2.findHomography(srcPoints, dstPoints[, method[, ransacReprojThreshold[, mask]]])
srcPoints – Coordinates of the points in the original plane, a matrix of the type CV_32FC2 or a vector<Point2f>.
dstPoints – Coordinates of the points in the target plane, a matrix of the type CV_32FC2 or a vector<Point2f>.
See also the OpenCV documentation for findHomography.
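A minimal sketch of how findHomography and warpPerspective could be combined (the point coordinates, output size and file name below are only placeholders, not values from your images):

import cv2
import numpy as np

img = cv2.imread('preview00001.jpg')                     # one of your frames
# Hypothetical correspondences: corners of the trapezoid in the source frame and
# where they should land in the rectified (rectangular) view.
src_pts = np.float32([[120, 300], [880, 290], [40, 700], [980, 690]])
dst_pts = np.float32([[0, 0], [960, 0], [0, 400], [960, 400]])

H, inlier_mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
rectified = cv2.warpPerspective(img, H, (960, 400))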
In particular, in your case the base (bottom side of the image) shows the most information, whereas the top side contains more irrelevant information. Here you should keep the aspect ratio of the top side the same and narrow the bottom. This should be done for every image. Once done, you can try stitching them again.
Example approach to transform trapezium-shaped image information into, e.g., a squared image:
(information ratio x)
----+++++++---- (1)
---+++++++++--- (1)
--+++++++++++-- (1)
-+++++++++++++- (1)
+++++++++++++++ (1)
into Squared image information:
(information ratio x)
----+++++++---- (1)
----+++++++---- (1.1)
----+++++++---- (1.2)
----+++++++---- (1.3)
----+++++++---- (1.4; most compressed information ratio)
Once this is done you can stitch it. Don't forget to post the result ;-)
Another approach is to treat the camera as a line inspector. With this method you take the information from each image for, let's say, lines y = 1060 to 1080 (e.g. image size 1920x1080 px) and then fill a new array with the information from those 20 lines in ascending order.
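A rough sketch of that line-inspector idea, assuming 1920x1080 frames and the file naming from the question (the row range just follows the example above):

import cv2
import numpy as np

strips = []
for i in range(70):
    frame = cv2.imread('preview%05d.jpg' % (i + 1), cv2.IMREAD_COLOR)
    strips.append(frame[1060:1080, :, :])    # 20 rows near the bottom of each frame

scan = np.vstack(strips)                     # stack the strips in ascending order
cv2.imwrite('line_scan.jpg', scan)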
Update Jan 2019:
As homography alone appears not to do the job 100% due to the steep 60 degree angle, you can try to correct the angle by performing a perspective transform first.
import cv2
import numpy as np

# You can add a for-loop + image counter here to perform the action on all images taken
# from the movie file; then it is easy to substitute the counter into the image name below.
src_array = np.float32([[56, 65], [368, 52], [28, 387], [389, 390]])    # 4 source corners (example values)
dest_array = np.float32([[0, 0], [300, 0], [0, 300], [300, 300]])       # 4 destination corners (example values)
Matrix1 = cv2.getPerspectiveTransform(src_array, dest_array)
dst = cv2.warpPerspective(image, Matrix1, (cols, rows))                  # image, cols, rows come from the loaded frame
label = 'CarImage1'   # use ('CarImage%s' % labelnr) here for automated annotation
# cv2.imshow(label, dst)              # check the image
# cv2.imwrite('%s.jpg' % label, dst)
See also the OpenCV docs on getPerspectiveTransform and warpPerspective.
Stitcher expects images that share similar parts (up to some perspective transformation). It performs pairwise image registration to find this perspective transform. In your case it won't be able to find it, because it simply does not exist.
An additional step that you must perform prior to stitching is to rectify each image to correct the wide-angle distortion. To find the rectification parameters you will need to do a camera calibration with calibration targets.
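A minimal calibration-and-undistortion sketch of that idea (the checkerboard size, file names and plain pinhole model are assumptions; for a lens this wide the cv2.fisheye module may be a better fit):

import cv2
import numpy as np
import glob

pattern = (9, 6)                                   # inner corners of an assumed checkerboard target
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for fname in glob.glob('calib_*.jpg'):             # calibration shots taken with the same lens
    gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(objp)
        img_points.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)

# Undistort each frame before handing it to the stitcher.
frame = cv2.imread('preview00001.jpg')
undistorted = cv2.undistort(frame, K, dist)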
Related
I have a color image and the corresponding point cloud captured by an OAK-D camera (see the image below), and I want to relate pixels in the color image to their corresponding values in the point cloud.
How can I get this information? For instance, if I have a pixel at (200, 250) in the color image, how do I find the corresponding point in the point cloud?
Any help would be appreciated.
It sounds like you want to project a 2D image to a 3D point cloud using the computed disparity map. To do this you will also need to know your camera intrinsics. Since you are using the OAK-D, you should be able to get everything you need with the following piece of code.
with dai.Device(pipeline) as device:
    calibData = device.readCalibration()

    # get right intrinsic matrix
    w, h = monoRight.getResolutionSize()
    K_right = calibData.getCameraIntrinsics(dai.CameraBoardSocket.RIGHT, dai.Size2f(w, h))

    # get left intrinsic matrix
    w, h = monoLeft.getResolutionSize()
    K_left = calibData.getCameraIntrinsics(dai.CameraBoardSocket.LEFT, dai.Size2f(w, h))

    R_left = calibData.getStereoLeftRectificationRotation()
    R_right = calibData.getStereoRightRectificationRotation()
    x_baseline = calibData.getBaselineDistance()
Once you have all your camera parameters, you should be able to use OpenCV to approach this.
First you will need to construct the Q matrix (the 4x4 disparity-to-depth reprojection matrix).
You will need to provide
The left and right intrinsic calibration matrices
The Translation vector from the coordinate system of the first camera to the second camera
The Rotation matrix from the coordinate system of the first camera to the second camera
Here's a coded example:
import numpy as np
import cv2

# stereoRectify returns (among other things) the 4x4 disparity-to-depth matrix Q
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(
    cameraMatrix1=np.array(K_left),        # left intrinsic matrix
    cameraMatrix2=np.array(K_right),       # right intrinsic matrix
    distCoeffs1=np.zeros(5),               # zeros, assuming already-undistorted images
    distCoeffs2=np.zeros(5),
    imageSize=imageSize,                   # pass in the image size
    R=np.array(R_left),                    # rotation matrix from camera 1 to camera 2
    T=np.array([-x_baseline, 0.0, 0.0]))   # translation from camera 1 to camera 2 (a 3-vector; sign depends on your convention)
Next you will need to reproject the image to 3D, using the known disparity map and the Q matrix. OpenCV makes this straightforward:
xyz = cv2.reprojectImageTo3D(disparity, Q)
This will give you an array of 3D points. The array has shape (rows, columns, 3), where the last dimension holds the (x, y, z) coordinates of the point cloud. Now you can use a pixel location to index into xyz and find its corresponding (x, y, z) point.
pix_row = 200
pix_col = 250
point_cloud_coordinate = xyz[pix_row, pix_col, :]
See the docs for more details
cv2.stereoRectify()
cv2.reprojectImageTo3D()
I've recently been working on a segmentation process for corneal endothelial cells, and I've found a pretty decent paper that describes how to perform it with nice results. I have been trying to follow that paper and implement it using scikit-image and OpenCV, but I've gotten stuck at the watershed segmentation.
I will briefly describe how the process is supposed to go:
First of all, you have the original endothelial cells image
original image
Then, they instruct you to perform a morphological grayscale reconstruction, in order to level out the grayscale of the image a little (however, they do not explain how to get the markers for the reconstruction, so I've been fooling around and tried to get some in my own way).
This is what the reconstructed image was supposed to look like:
desired reconstruction
This is what my reconstructed image (let's label it r) looks like:
my reconstruction
The purpose is to use the reconstructed image to get the markers for the watershed segmentation. How do we do that? We take the original image (let's label it f) and threshold (f - r) to extract the h-domes of the cells, i.e., our markers.
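(For reference, my understanding of the h-dome computation with scikit-image's grayscale reconstruction is sketched below; the dome height h is a value I would have to tune.)

from skimage.morphology import reconstruction

f = img.astype(float)                            # img: the original grayscale image
h = 10                                           # dome height, an assumed value to tune
seed = f - h
r = reconstruction(seed, f, method='dilation')   # grayscale reconstruction of (f - h) under f
hdomes = f - r                                   # regional maxima with height >= h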
This is what the hdomes image was supposed to look like:
desired hdomes
This is what my hdomes image looks like:
my hdomes
I believe that the hdomes I've got are as good as theirs, so, the final step is to finally perform the watershed segmentation on the original image, using the hdomes we've been working so hard to get!
As input image, we will use the inverted original image, and as markers, our markers.
This is the desired output:
desired output
However, I am only getting a black image, EVERY PIXEL IS BLACK, and I have no idea what's happening... I've also tried using their markers and inverted image, but I still get a black image. The paper I've been using is Luc M. Vincent, Barry R. Masters, "Morphological image processing and network analysis of cornea endothelial cell images", Proc. SPIE 1769.
I apologize for the long text, but I really wanted to explain my understanding so far in detail. By the way, I've tried the watershed segmentation from both scikit-image and OpenCV, and both gave me the black image.
Here is the code that I have been using:
import cv2
import matplotlib.pyplot as plt
from skimage.morphology import reconstruction, watershed

img = cv2.imread('input.png', 0)
mask = img
marker = cv2.erode(mask, cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3)), iterations=3)
reconstructedImage = reconstruction(marker, mask)
hdomes = img - reconstructedImage
cell_markers = cv2.threshold(hdomes, 0, 255, cv2.THRESH_BINARY)[1]
inverted = (255 - img)
labels = watershed(inverted, cell_markers)
cv2.imwrite('test.png', labels)
plt.figure()
plt.imshow(labels)
plt.show()
Thank you!
Here's a rough example of the watershed segmentation of your image with scikit-image.
What is missing in your script is calculating the Euclidean distance transform and extracting the local maxima from it.
Note that the watershed algorithm outputs a piece-wise constant image where pixels in the same regions are assigned the same value. What is shown in your 'desired output' panel (e) are the edges between the regions instead.
import numpy as np
import cv2
import matplotlib.pyplot as plt
from skimage.morphology import watershed
from scipy import ndimage as ndi
from skimage.feature import peak_local_max
from skimage.filters import threshold_local
img = cv2.imread('input.jpg',0)
'''Adaptive thresholding
calculates thresholds in regions of size block_size surrounding each pixel
to handle the non-uniform background'''
block_size = 41
adaptive_thresh = threshold_local(img, block_size)#, offset=10)
binary_adaptive = img > adaptive_thresh
# Calculate Euclidean distance
distance = ndi.distance_transform_edt(binary_adaptive)
# Find local maxima of the distance map
local_maxi = peak_local_max(distance, labels=binary_adaptive, footprint=np.ones((3, 3)), indices=False)
# Label the maxima
markers = ndi.label(local_maxi)[0]
''' Watershed algorithm
The option watershed_line=True leave a one-pixel wide line
with label 0 separating the regions obtained by the watershed algorithm '''
labels = watershed(-distance, markers, watershed_line=True)
# Plot the result
plt.imshow(img, cmap='gray')
plt.imshow(labels==0,alpha=.3, cmap='Reds')
plt.show()
In my app, I will input a human image and I want to get only the face and neck of that person as output, in a separate image. Example: the image below as input (source: http://www.fremantlepress.com.au)
And I want to get the image above as output:
I want to perform the following algorithm:
1. Detect face
2. Select (face region * 2) area
3. Detect skin and neck
4. Cut the skin region of the selected image
5. Save that cut region into a new image
Going through the EmguCV wiki and other online resources, I am confident I can perform steps 1 and 2, but I am not sure how I can accomplish steps 3 and 4.
There are some functions/methods I am looking at (Canny edge detection, contours, etc.), but I am not sure how and where I should apply them.
I am using EmguCV (C#) and Windows Form Application.
Please help me with steps 3 and 4. I would be glad if someone could elaborate on these two steps and include some code as well.
Well, there are several ways you could approach this. Edge detection will only give you a binary image of edges, and you will have to perform some line tracing or Hough transforms to detect the location of these. Their accuracy will vary.
I will assume for now that you can detect the eyes and the relative location of the face. I would expect a statistical filter to provide a favourable outcome, with better performance than a neural network, which is the best alternative. A good alternative is naturally colour segmentation, if colour images are used (this is far easier to implement). I will also assume that the head position can change slightly, with the neck being more or less visible within an image.
So for a Statistical Filter:
(Note that the background of the individual is similar to the face data when dealing with a greyscale image, so a colour image would be better to work with.)
1. Take a blank copy of our original image. We will form a binary map of our face on this; while not necessary, it will let us examine our success more easily.
2. Find the face, eyes and mouth in the original image.
3. We can assume that any data from the eyes and mouth forms part of the face, so mark these on the blank copy with "1"s.
4. Now we need a bit of maths. We know the face detection algorithm can only detect a face at a certain angle to the camera. We use this and select a statistical mask from certain parts of the image, let's say 2 or 3 patches of 10x10 pixels from the cheek area. This will be the most likely area of the face within the image. From this data we get values such as the mean and standard deviation.
5. We now scan across the segmented part of the image where we have detected the face. We won't do the whole image, as that would take a long time. (Note: there is a border half the size of the mask that won't be looked at.) We examine each pixel and its surrounding neighbours up to the size of the 10x10 mask. If the average or standard deviation (whatever we are examining) is similar to that of our filter, say within 10%, then we mark this pixel in our blank copy as a "1" and consider it to belong to the skin. (A small sketch of this step follows below.)
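A loose Python sketch of that sliding-window comparison (the reference patch location, image file and the 10% rule are placeholders; an EmguCV/C# version would follow the same logic):

import cv2
import numpy as np

gray = cv2.imread('face.jpg', cv2.IMREAD_GRAYSCALE)    # hypothetical greyscale face image
cy, cx = 220, 180                                       # assumed cheek location from face detection
ref_mean = gray[cy:cy + 10, cx:cx + 10].mean()          # 10x10 reference patch statistic

skin_map = np.zeros(gray.shape, np.uint8)               # the "blank copy" / binary map
half = 5
for y in range(half, gray.shape[0] - half):             # skip the half-mask border
    for x in range(half, gray.shape[1] - half):
        window = gray[y - half:y + half, x - half:x + half]
        if abs(window.mean() - ref_mean) <= 0.1 * ref_mean:   # "within 10%" of the reference
            skin_map[y, x] = 1

In practice you would vectorise this (e.g. cv2.blur gives the local mean of every pixel in one call) rather than looping in Python.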
As for colour segmentation:
(Note: you could also try this process on greyscale, however it will be less successful due to the brickwork.)
1. Repeat steps 1 and 2 above.
2. Again we will select certain areas of the image that we can expect to contain face data (i.e. 10 pixels below the eye). In this case, however, we examine the data that forms the colour of that pixel. Don't forget that HSV images can obtain better results from this process, and a combination even more so. We can then scan across the image, examining each pixel for a similar colour. If it matches, mark it on your binary map (a minimal sketch of this follows below).
3. An alternative is subtracting or adding a calculated value from the R, G and B channels of the image, such that only the face data survives. You can convert this directly to a binary image by making any value > 1 equal to 1.
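A minimal HSV-based sketch of the colour variant (the sample location and file name are placeholders; again, EmguCV exposes the same operations):

import cv2
import numpy as np

face_region = cv2.imread('face.jpg')                    # hypothetical face crop
hsv = cv2.cvtColor(face_region, cv2.COLOR_BGR2HSV)

sy, sx = 140, 100                                       # assumed point ~10 px below an eye
sample = hsv[sy:sy + 10, sx:sx + 10].reshape(-1, 3)
lo = sample.min(axis=0)                                 # loose per-channel bounds from the sample
hi = sample.max(axis=0)

skin_mask = cv2.inRange(hsv, lo, hi)                    # 255 where the colour is "similar"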
This will only work for skin; for the hair we will need other filters. A few notes:
A statistical filter working on a colour image has a far greater ability, however it takes longer.
Use data from the image itself to form your statistical filter, as this will allow other skin colours to be classified. A mathematically designed filter or colour segmentation will require a lot of work to achieve the same variability.
The size of the mask is important: the greater the mask size, the less likely errors will occur, but processing time again increases.
You can speed up the process by referencing the same area within the binary map copy: if the pixel you're examining is already a 1 (classified by eye/nose/mouth detection), then why examine it again, just skip it.
Multiple skin filters will provide better results, however they may also introduce more noise, and remember each filter must then be compared with a pixel, increasing processing time.
Getting an algorithm working accurately will require a bit of trial and error, but you should see comparable results fairly quickly using these methods.
I hope this helps you on your way. Sorry for not including any code, but hopefully others can help you where you get stuck, and writing it yourself will help you understand what is going on and allow you to cut down on processing time. Let me know if you require any additional advice; I'm doing my PhD in image analysis, just so you know the advice is sound.
Take Care
Chris
[EDIT]
Some quick results:
Here is a 20x20 filter applied to detecting the hair. The program I've written only works on greyscale images at the moment, so the skin detection suffers interference from the stone (see later).
Colour Image of Face Region
Binary Map of Average Hair Filter 20x20 Mask 40% Error allowed
As can be observed, there is interference from the shirt in this case as it matches the colour of the hair. This can be eliminated by simply examining only the top third or half of the detected facial region.
Binary Map of Average Skin Filter 20x20 Mask 40% Error allowed
In this image I use only one filter, formed from the chin area, as the stubble obviously changes the filter's behaviour. There is still noise present from the stone behind the individual, however using a colour image could eliminate this. The gaps in this case could be filled by an algorithm or another filter. Again there is noise from the edge of the shirt, but we could minimise this either by detecting the shirt and removing any data that forms it, or simply by only looking in certain areas.
Examples of the Regions to Inspect
To eliminate false classification you could take the top two thirds of the segmented image when looking for the face, and the region from the width of the detected eyes down to the bottom of the facial region for neck data.
Cheers Again
Chris
Hello Chris, can you share the code for this? I have used the GrabCut algorithm to crop the face up to the neck, but the accuracy is not perfect. I am sharing the code where I use a webcam to capture images, blur the background, and then apply GrabCut. Please check it and reply.
import numpy as np
import cv2
import pixellib
from pixellib.tune_bg import alter_bg

rect = (0, 0, 0, 0)
startPoint = False
endPoint = False
img_counter = 0

# function for mouse callback
def on_mouse(event, x, y, flags, params):
    global rect, startPoint, endPoint
    # get mouse click
    if event == cv2.EVENT_LBUTTONDOWN:
        if startPoint == True and endPoint == True:
            startPoint = False
            endPoint = False
            rect = (0, 0, 0, 0)
        if startPoint == False:
            rect = (x, y, 0, 0)
            startPoint = True
        elif endPoint == False:
            rect = (rect[0], rect[1], x, y)
            endPoint = True

#cap = cv2.VideoCapture("YourVideoFile.mp4")
#cap = cv2.imread("/home/mongoose/Projects/background removal/bg_grabcut/GrabCut-from-video-master/IMG_6471.jpg")
# capturing the camera feed, '0' denotes the first camera connected to the computer
cap = cv2.VideoCapture(0)
waitTime = 50

change_bg = alter_bg(model_type="pb")
change_bg.load_pascalvoc_model("/home/mongoose/Projects/background removal/bg_grabcut/test/xception_pascalvoc.pb")
change_bg.blur_camera(cap, extreme=True, frames_per_second=10, output_video_name="output_video.mp4", show_frames=True, frame_name="frame", detect="person")

# Reading the first frame
(grabbed, frame) = cap.read()

while cap.isOpened():
    (grabbed, frame) = cap.read()
    cv2.namedWindow('frame')
    cv2.setMouseCallback('frame', on_mouse)

    # drawing rectangle
    if startPoint == True and endPoint == True:
        cv2.rectangle(frame, (rect[0], rect[1]), (rect[2], rect[3]), (255, 0, 255), 2)
    if not grabbed:
        break

    cv2.imshow('frame', frame)
    key = cv2.waitKey(waitTime)
    if key == ord('q'):
        # 'q' pressed
        break
    elif key % 256 == 32:
        # SPACE pressed
        alpha = 1  # Transparency factor.
        img_name = "opencv_frame_{}.png".format(img_counter)
        imgCopy = frame.copy()
        img = frame
        mask = np.zeros(img.shape[:2], np.uint8)
        bgdModel = np.zeros((1, 65), np.float64)
        fgdModel = np.zeros((1, 65), np.float64)
        w = abs(rect[0] - rect[2] + 10)
        h = abs(rect[1] - rect[3] + 10)
        rect2 = (rect[0] + 10, rect[1] + 10, w, h)
        cv2.grabCut(img, mask, rect2, bgdModel, fgdModel, 100, cv2.GC_INIT_WITH_RECT)
        mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
        img = img * mask2[:, :, np.newaxis]
        cv2.imwrite(img_name, img)
        print("{} written!".format(img_name))
        img_counter += 1

cap.release()
cv2.destroyAllWindows()
I need an algorithm written in any language to find an image inside of an image, including at different scales. Does anyone know a starting point for solving a problem like this?
For example:
I have an image of 800x600 and in that image is a yellow ball measuring 180 pixels in circumference. I need to be able to find this image with a search pattern of a yellow ball having a circumference of 15 pixels.
Thanks
Here's an algorithm:
Split the image into RGB and take the blue channel. You will notice that areas that were yellow in the color image are now dark in the blue channel. This is because blue and yellow are complementary colors.
Invert the blue channel
Create a greyscale search pattern with a circle that's the same size as what's in the image (180 pixels in circumference). Make it a white circle on a black background.
Calculate the cross-correlation of the search pattern with the inverted blue channel.
The cross-correlation peak will correspond to the location of the ball.
Here's the algorithm in action:
RGB and R:
G and B:
Inverted B and pattern:
Python + OpenCV code:
import cv

if __name__ == '__main__':
    image = cv.LoadImage('ball-b-inv.png')
    template = cv.LoadImage('ball-pattern-inv.png')

    image_size = cv.GetSize(image)
    template_size = cv.GetSize(template)
    result_size = [s[0] - s[1] + 1 for s in zip(image_size, template_size)]
    result = cv.CreateImage(result_size, cv.IPL_DEPTH_32F, 1)

    cv.MatchTemplate(image, template, result, cv.CV_TM_CCORR)

    min_val, max_val, min_loc, max_loc = cv.MinMaxLoc(result)
    print max_loc
Result:
misha#misha-desktop:~/Desktop$ python cross-correlation.py
(72, 28)
This gives you the top-left co-ordinate of the first occurrence of the pattern in the image. Add the radius of the circle to both the x and y co-ordinates if you want to find the center of the circle.
You should take a look at OpenCV, an open source computer vision library - this would be a good starting point. Specifically check out object detection and the cvMatchTemplate method.
A version of one of the previous posts, made with OpenCV 3 and Python 3:
import cv2
import sys
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(cv2.matchTemplate(cv2.imread(sys.argv[1]),cv2.imread(sys.argv[2]),cv2.TM_CCOEFF_NORMED))
print(max_loc)
save as file.py and run as:
python file.py image pattern
A simple starting point would be the Hough transform, if you want to find circles.
However, there is a whole research area around this subject called object detection and recognition. The state of the art has advanced significantly over the past decade.
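If you go the Hough route, a minimal OpenCV sketch (the file name and all parameter values are guesses you would tune for your image):

import cv2
import numpy as np

gray = cv2.cvtColor(cv2.imread('scene.png'), cv2.COLOR_BGR2GRAY)
gray = cv2.medianBlur(gray, 5)                       # smooth to reduce false circles
circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=20,
                           param1=100, param2=30, minRadius=10, maxRadius=60)
if circles is not None:
    for x, y, r in np.round(circles[0]).astype(int):
        print('circle at (%d, %d) with radius %d' % (x, y, r))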
OpenCV users know that cvRemap is used for doing geometric transformations.
The mapx and mapy arguments are the data structures which give the mapping
information in the destination image.
Can I create two integer arrays holding random values from 1 to 1024 or from 1 to 768 (if I deal with 1024x768 images), assign mapx and mapy with these values, and then use them in cvRemap()?
Will that do the job, or is the only way to use mapx and mapy to have their values assigned by the function cvUndistortMap()?
I want to know because I want to warp the images.
Just to let you know, I have already checked out the Learning OpenCV book too.
I use cvRemap to apply distortion correction.
The map_x part is in image resolution and stores, for each destination pixel, the x coordinate to sample from in the source image, while map_y does the same for the y coordinate.
In the case of undistortion:
# create map_x/map_y
self.map_x = cvCreateImage(cvGetSize(self.image), IPL_DEPTH_32F, 1)
self.map_y = cvCreateImage(cvGetSize(self.image), IPL_DEPTH_32F, 1)
# I know the camera intrinsics already, so create a distortion map out
# of them for each image pixel
# this defines where each pixel has to be sampled from so the image is
# no longer distorted
cvInitUndistortMap(self.intrinsic, self.distortion, self.map_x, self.map_y)
# later in the code:
# "image_raw" is the distorted image, i want to store the undistorted into
# "self.image"
cvRemap(image_raw, self.image, self.map_x, self.map_y)
Therefore: map_x/map_y are floating point values and in image resolution, like two images of 1024x768. What happens in cvRemap is basically something like
src_x = map_x[x, y]
src_y = map_y[x, y]
output_image[x, y] = input_image[src_x, src_y]
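With the modern cv2 API you can indeed fill the maps yourself (answering the question above): they must be float32 and give, for each destination pixel, the source coordinates to sample. A small sketch that builds a horizontal flip by hand; any warp of your own design works the same way:

import cv2
import numpy as np

img = cv2.imread('input.jpg')          # hypothetical input image
h, w = img.shape[:2]

map_x = np.zeros((h, w), np.float32)
map_y = np.zeros((h, w), np.float32)
for y in range(h):
    for x in range(w):
        map_x[y, x] = w - 1 - x        # sample from the mirrored column
        map_y[y, x] = y                # keep the same row

flipped = cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)
cv2.imwrite('flipped.jpg', flipped)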
What kind of geometric transformations do you want to do with this?