Fine tuning image stitching in OpenCV

Fine tuning image stitching in OpenCV - opencv

(A newbie in computer vision)
Goal is to reconstitute a game level using image stitching or any other method. The level played by someone is video-recorded, these frames will be the input.
Expected result : (level 4-4 of SMB from http://www.vgmaps.com/)
This is my first attempt at addressing this problem, using OpenCV (EmguCV). So far results are excellent but I was wondering whether there are more appropriate techniques knowing that my input will be strictly in 2D ?
I am open to try another framework/technique providing it's not overly complex.
Here are source images :
Result of the 7 first images :
(for some reasons, the Stitcher in OpenCV did not accept 10 at once ...)
Result of the last 3 images :

Related

OpenCV: PNP pose estimation fails in a specific case

I am using OpenCV's solvePnPRansac function to estimate the pose of my camera given a pointcloud made from tracked features. My pipeline consists of multiple cameras where I form the point cloud from matched features between two cameras, and use that as a reference to estimate the pose of one of the cameras as it starts moving. I have tested this in multiple settings and it works as long as there are enough features to track while the camera is in motion.
Strangely, during a test I did today, I encountered a failure case where solvePnP would just return junk values all the time. What's confusing here is that in this data set, my point cloud is much denser, it's reconstructed pretty accurately from the two views, the tracked number of points (currently visible features vs. features in the point cloud) at any given time was much higher than what I usually have, so theoretically it should have been a breeze for solvePnP, yet it fails terribly.
I tried with CV_ITERATIVE, CV_EPNP and even the non RANSAC version of solvePnP. I was just wondering if I am missing something basic here? The scene I am looking at can be seen in these images (image 1 is the scene and feature matches between two perspectives, image 2 is the point cloud for reference)
The part of the code doing PNP is pretty simple. If P3D is the array of tracked 3Dpoints, P2D is the corresponding set of image points,
solvePnpRansac(P3D, P2D, K, d, R, T, false, 500, 2.0, 100, noArray(), CV_ITERATIVE);
EDIT: I should also mention that my reference poincloud was obtained with a baseline of 8 feet between the cameras, whereas the building I am looking at was probably like a 100 feet away. Could the possible lack of disparity cause issues as well?

Improve image quality

I need to improve image quality, from low quality to high hd quality. I am using OpenCV libraries. I experimented a lot with GaussianBlur(), Laplacian(), transformation functions, filter functions etc, but all I could succeed is to convert image to hd resolution and keep the same quality. Is it possible to do this? Do I need to implement my own algorithm or is there a way how it's done? I will really appreciate any kind of help. Thanks in advance.

I used this link for my reference. It has other interesting filters that you can play with.
If you are using C++:
detailEnhance(Mat src, Mat dst, float sigma_s=10, float sigma_r=0.15f)
If you are using python:
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
The variable 'sigma_s' determines how big the neighbourhood of pixels must be to perform filtering.
The variable 'sigma_r' determines how the different colours within the neighbourhood of pixels will be averaged with each other. Its range is from: 0 - 1. A smaller value means similar colors will be averaged out while different colors remain as they are.
Since you are looking for sharpness in the image, I would suggest you keep the kernel as minimum as possible.
Here is the result I obtained for a sample image:
1. Original image:
2. Sharpened image for lower sigma_r value:
3. Sharpened image for higher sigma_r value:
Check the above mentioned link for more information.

How about applying Super Resolution in OpenCV? A reference article with more details can be found here: https://learnopencv.com/super-resolution-in-opencv/
So basically you will need to have the Python dependency opencv-contrib-python installed, together with a working version of opencv-python.
There are different techniques for the Super Resolution in OpenCV you can choose from, including EDSR, ESPCN, FSRCNN, and LapSRN. Code examples in both Python and C++ have been included in the tutorial article as well for easy reference.

A correction is needed
dst = cv2.detailEnhance(src, sigma_s=10, sigma_r=0.15)
using kernel will give error.

+1 to kris stern answer,
If you are looking for practical implementation of super resolution using pretrained model in OpenCV, have a look at below notebook also video describing details.
https://github.com/pankajr141/experiments/blob/master/Reasoning/ComputerVision/super_resolution_enhancing_image_quality_using_pretrained_models.ipynb
https://www.youtube.com/watch?v=JrWIYWO4bac&list=UUplf_LWNn0a9ubnKCZ-95YQ&index=4
Below is a sample code using opencv
model_pretrained = cv2.dnn_superres.DnnSuperResImpl_create()
# setting up the model initialization
model_pretrained.readModel(filemodel_filepath)
model_pretrained.setModel(modelname, scale)
# prediction or upscaling
img_upscaled = model_pretrained.upsample(img_small)

Determine skeleton joints with a webcam (not Kinect)

I'm trying to determine skeleton joints (or at the very least to be able to track a single palm) using a regular webcam. I've looked all over the web and can't seem to find a way to do so.
Every example I've found is using Kinect. I want to use a single webcam.
There's no need for me to calculate the depth of the joints - I just need to be able to recognize their X, Y position in the frame. Which is why I'm using a webcam, not a Kinect.
So far I've looked at:
OpenCV (the "skeleton" functionality in it is a process of simplifying graphical models, but it's not a detection and/or skeletonization of a human body).
OpenNI (with NiTE) - the only way to get the joints is to use the Kinect device, so this doesn't work with a webcam.
I'm looking for a C/C++ library (but at this point would look at any other language), preferably open source (but, again, will consider any license) that can do the following:
Given an image (a frame from a webcam) calculate the X, Y positions of the visible joints
[Optional] Given a video capture stream call back into my code with events for joints' positions
Doesn't have to be super accurate, but would prefer it to be very fast (sub-0.1 sec processing time per frame)
Would really appreciate it if someone can help me out with this. I've been stuck on this for a few days now with no clear path to proceed.
UPDATE
2 years later a solution was found: http://dlib.net/imaging.html#shape_predictor

To track a hand using a single camera without depth information is a serious task and topic of ongoing scientific work. I can supply you a bunch of interesting and/or highly cited scientific papers on the topic:
M. de La Gorce, D. J. Fleet, and N. Paragios, “Model-Based 3D Hand Pose Estimation from Monocular Video.,” IEEE transactions on pattern analysis and machine intelligence, vol. 33, Feb. 2011.
R. Wang and J. Popović, “Real-time hand-tracking with a color glove,” ACM Transactions on Graphics (TOG), 2009.
B. Stenger, A. Thayananthan, P. H. S. Torr, and R. Cipolla, “Model-based hand tracking using a hierarchical Bayesian filter.,” IEEE transactions on pattern analysis and machine intelligence, vol. 28, no. 9, pp. 1372–84, Sep. 2006.
J. M. Rehg and T. Kanade, “Model-based tracking of self-occluding articulated objects,” in Proceedings of IEEE International Conference on Computer Vision, 1995, pp. 612–617.
Hand tracking literature survey in the 2nd chapter:
T. de Campos, “3D Visual Tracking of Articulated Objects and Hands,” 2006.
Unfortunately I don't know about some freely available hand tracking library.

there is a simple way for detecting hand using skin tone. perhaps this could help... you can see the results on this youtube video. caveat: the background shouldn't contain skin colored things like wood.
here is the code:
''' Detect human skin tone and draw a boundary around it.
Useful for gesture recognition and motion tracking.
Inspired by: http://stackoverflow.com/a/14756351/1463143
Date: 08 June 2013
'''
# Required moduls
import cv2
import numpy
# Constants for finding range of skin color in YCrCb
min_YCrCb = numpy.array([0,133,77],numpy.uint8)
max_YCrCb = numpy.array([255,173,127],numpy.uint8)
# Create a window to display the camera feed
cv2.namedWindow('Camera Output')
# Get pointer to video frames from primary device
videoFrame = cv2.VideoCapture(0)
# Process the video frames
keyPressed = -1 # -1 indicates no key pressed
while(keyPressed < 0): # any key pressed has a value >= 0
# Grab video frame, decode it and return next video frame
readSucsess, sourceImage = videoFrame.read()
# Convert image to YCrCb
imageYCrCb = cv2.cvtColor(sourceImage,cv2.COLOR_BGR2YCR_CB)
# Find region with skin tone in YCrCb image
skinRegion = cv2.inRange(imageYCrCb,min_YCrCb,max_YCrCb)
# Do contour detection on skin region
contours, hierarchy = cv2.findContours(skinRegion, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
# Draw the contour on the source image
for i, c in enumerate(contours):
area = cv2.contourArea(c)
if area > 1000:
cv2.drawContours(sourceImage, contours, i, (0, 255, 0), 3)
# Display the source image
cv2.imshow('Camera Output',sourceImage)
# Check for user input to close program
keyPressed = cv2.waitKey(1) # wait 1 milisecond in each iteration of while loop
# Close window and camera after exiting the while loop
cv2.destroyWindow('Camera Output')
videoFrame.release()
the cv2.findContour is quite useful, you can find the centroid of a "blob" by using cv2.moments after u find the contours. have a look at the opencv documentation on shape descriptors.
i havent yet figured out how to make the skeletons that lie in the middle of the contour but i was thinking of "eroding" the contours till it is a single line. in image processing the process is called "skeletonization" or "morphological skeleton". here is some basic info on skeletonization.
here is a link that implements skeletonization in opencv and c++
here is a link for skeletonization in opencv and python
hope that helps :)
--- EDIT ----
i would highly recommend that you go through these papers by Deva Ramanan (scroll down after visiting the linked page): http://www.ics.uci.edu/~dramanan/
C. Desai, D. Ramanan. "Detecting Actions, Poses, and Objects with
Relational Phraselets" European Conference on Computer Vision
(ECCV), Florence, Italy, Oct. 2012.
D. Park, D. Ramanan. "N-Best Maximal Decoders for Part Models" International Conference
on Computer Vision (ICCV) Barcelona, Spain, November 2011.
D. Ramanan. "Learning to Parse Images of Articulated Objects" Neural Info. Proc.
Systems (NIPS), Vancouver, Canada, Dec 2006.

The most common approach can be seen in the following youtube video. http://www.youtube.com/watch?v=xML2S6bvMwI
This method is not quite robust, as it tends to fail if the hand is rotated to much (eg; if the camera is looking at the side of the hand or at a partially bent hand).
If you do not mind using two camera's you can look into the work Robert Wang. His current company (3GearSystems) uses this technology, augmented with a kinect, to provide tracking. His original paper uses two webcams but has much worse tracking.
Wang, Robert, Sylvain Paris, and Jovan Popović. "6d hands: markerless hand-tracking for computer aided design." Proceedings of the 24th annual ACM symposium on User interface software and technology. ACM, 2011.
Another option (again if using "more" than a single webcam is possible), is to use a IR emitter. Your hand reflects IR light quite well whereas the background does not. By adding a filter to the webcam that filters normal light (and removing the standard filter that does the opposite) you can create a quite effective hand tracking. The advantage of this method is that the segmentation of the hand from the background is much simpler. Depending on the distance and the quality of the camera, you would need more IR leds, in order to reflect sufficient light back into the webcam. The leap motion uses this technology to track the fingers & palms (it uses 2 IR cameras and 3 IR leds to also get depth information).
All that being said; I think the Kinect is your best option in this. Yes, you don't need the depth, but the depth information does make it a lot easier to detect the hand (using the depth information for the segmentation).

My suggestion, given your constraints, would be to use something like this:
http://docs.opencv.org/doc/tutorials/objdetect/cascade_classifier/cascade_classifier.html
Here is a tutorial for using it for face detection:
http://opencv.willowgarage.com/wiki/FaceDetection?highlight=%28facial%29|%28recognition%29
The problem you have described is quite difficult, and I'm not sure that trying to do it using only a webcam is a reasonable plan, but this is probably your best bet. As explained here (http://docs.opencv.org/modules/objdetect/doc/cascade_classification.html?highlight=load#cascadeclassifier-load), you will need to train the classifier with something like this:
http://docs.opencv.org/doc/user_guide/ug_traincascade.html
Remember: Even though you don't require the depth information for your use, having this information makes it easier for the library to identify a hand.

At last I've found a solution. Turns out a dlib open-source project has a "shape predictor" that, once properly trained, does exactly what I need: it guesstimates (with a pretty satisfactory accuracy) the "pose". A "pose" is loosely defined as "whatever you train it to recognize as a pose" by training it with a set of images, annotated with the shapes to extract from them.
The shape predictor is described in here on dlib's website

I don't know about possible existing solutions. If supervised (or semi-supervised) learning is an option, training decision trees or neural networks might already be enough (kinect uses random forests from what i have heard). Before you go such a path, do everything you can to find an existing solution. Getting Machine Learning stuff right takes a lot of time and experimentation.
OpenCV has machine learning components, what you would need is training data.

With the motion tracking features of the open source Blender project it is possible to create a 3D model based on 2D footage. No kinect needed. Since blender is open source you might be able to use their pyton scripts outside the blender framework for your own purposes.

Have you ever heard about Eyesweb
I have been using it for one of my project and I though it might be usefull for what you want to achieve.
Here are some interesting publication LNAI 3881 - Finger Tracking Methods Using EyesWeb and Powerpointing-HCI using gestures
Basically the workflow is:
You create your patch in EyesWeb
Prepare the datas you want to send with a network client
Use theses processed datas on your own server (your app)
However, I don't know if there is a way to embed the real time image processing part of Eyes Web into a soft as a library.

OpenCV : Building a simple 3d model

I Have decided to use OpenCV to build a 3d scene by using a series of 2D Images. I found the example code that came with OpenCV [ build3dmodel.cpp Here ].
I just want to run this once and see what kind of outcome this gives. My knowledge with OpenCV is low, I don't want to understand the whole code, I just want to know how to give inputs to this program (the image set) to see the output.

The line command of this code example requires the following parameters:
build3dmodel -i intrinsics_filename.yml [-d detector] [-de
descriptor_extractor] -m model_name.yml
The first file is the camera matrix which you obtain after the calibration process (there is an especific example with it). Detector and descriptor detector must match with valid FeatureDetector and DescriptorExtractor names. Model name is a bit confusing, it looks like part of the yml file name where data will be saved.

First see some tutorial like introduction to OpenCv or OpenCV tutorial. Also, see input and output with OpenCv.

Simple and fast method to compare images for similarity

I need a simple and fast way to compare two images for similarity. I.e. I want to get a high value if they contain exactly the same thing but may have some slightly different background and may be moved / resized by a few pixel.
(More concrete, if that matters: The one picture is an icon and the other picture is a subarea of a screenshot and I want to know if that subarea is exactly the icon or not.)
I have OpenCV at hand but I am still not that used to it.
One possibility I thought about so far: Divide both pictures into 10x10 cells and for each of those 100 cells, compare the color histogram. Then I can set some made up threshold value and if the value I get is above that threshold, I assume that they are similar.
I haven't tried it yet how well that works but I guess it would be good enough. The images are already pretty much similar (in my use case), so I can use a pretty high threshold value.
I guess there are dozens of other possible solutions for this which would work more or less (as the task itself is quite simple as I only want to detect similarity if they are really very similar). What would you suggest?
There are a few very related / similar questions about obtaining a signature/fingerprint/hash from an image:
OpenCV / SURF How to generate a image hash / fingerprint / signature out of the descriptors?
Image fingerprint to compare similarity of many images
Near-Duplicate Image Detection
OpenCV: Fingerprint Image and Compare Against Database.
more, more, more, more, more, more, more
Also, I stumbled upon these implementations which have such functions to obtain a fingerprint:
pHash
imgSeek (GitHub repo) (GPL) based on the paper Fast Multiresolution Image Querying
image-match. Very similar to what I was searching for. Similar to pHash, based on An image signature for any kind of image, Goldberg et al. Uses Python and Elasticsearch.
iqdb
ImageHash. supports pHash.
Image Deduplicator (imagededup). Supports CNN, PHash, DHash, WHash, AHash.
Some discussions about perceptual image hashes: here
A bit offtopic: There exists many methods to create audio fingerprints. MusicBrainz, a web-service which provides fingerprint-based lookup for songs, has a good overview in their wiki. They are using AcoustID now. This is for finding exact (or mostly exact) matches. For finding similar matches (or if you only have some snippets or high noise), take a look at Echoprint. A related SO question is here. So it seems like this is solved for audio. All these solutions work quite good.
A somewhat more generic question about fuzzy search in general is here. E.g. there is locality-sensitive hashing and nearest neighbor search.

Can the screenshot or icon be transformed (scaled, rotated, skewed ...)? There are quite a few methods on top of my head that could possibly help you:
Simple euclidean distance as mentioned by #carlosdc (doesn't work with transformed images and you need a threshold).
(Normalized) Cross Correlation - a simple metrics which you can use for comparison of image areas. It's more robust than the simple euclidean distance but doesn't work on transformed images and you will again need a threshold.
Histogram comparison - if you use normalized histograms, this method works well and is not affected by affine transforms. The problem is determining the correct threshold. It is also very sensitive to color changes (brightness, contrast etc.). You can combine it with the previous two.
Detectors of salient points/areas - such as MSER (Maximally Stable Extremal Regions), SURF or SIFT. These are very robust algorithms and they might be too complicated for your simple task. Good thing is that you do not have to have an exact area with only one icon, these detectors are powerful enough to find the right match. A nice evaluation of these methods is in this paper: Local invariant feature detectors: a survey.
Most of these are already implemented in OpenCV - see for example the cvMatchTemplate method (uses histogram matching): http://dasl.mem.drexel.edu/~noahKuntz/openCVTut6.html. The salient point/area detectors are also available - see OpenCV Feature Detection.

I face the same issues recently, to solve this problem(simple and fast algorithm to compare two images) once and for all, I contribute an img_hash module to opencv_contrib, you can find the details from this link.
img_hash module provide six image hash algorithms, quite easy to use.
Codes example
origin lena
blur lena
resize lena
shift lena
#include <opencv2/core.hpp>
#include <opencv2/core/ocl.hpp>
#include <opencv2/highgui.hpp>
#include <opencv2/img_hash.hpp>
#include <opencv2/imgproc.hpp>
#include <iostream>
void compute(cv::Ptr<cv::img_hash::ImgHashBase> algo)
{
auto input = cv::imread("lena.png");
cv::Mat similar_img;
//detect similiar image after blur attack
cv::GaussianBlur(input, similar_img, {7,7}, 2, 2);
cv::imwrite("lena_blur.png", similar_img);
cv::Mat hash_input, hash_similar;
algo->compute(input, hash_input);
algo->compute(similar_img, hash_similar);
std::cout<<"gaussian blur attack : "<<
algo->compare(hash_input, hash_similar)<<std::endl;
//detect similar image after shift attack
similar_img.setTo(0);
input(cv::Rect(0,10, input.cols,input.rows-10)).
copyTo(similar_img(cv::Rect(0,0,input.cols,input.rows-10)));
cv::imwrite("lena_shift.png", similar_img);
algo->compute(similar_img, hash_similar);
std::cout<<"shift attack : "<<
algo->compare(hash_input, hash_similar)<<std::endl;
//detect similar image after resize
cv::resize(input, similar_img, {120, 40});
cv::imwrite("lena_resize.png", similar_img);
algo->compute(similar_img, hash_similar);
std::cout<<"resize attack : "<<
algo->compare(hash_input, hash_similar)<<std::endl;
}
int main()
{
using namespace cv::img_hash;
//disable opencl acceleration may(or may not) boost up speed of img_hash
cv::ocl::setUseOpenCL(false);
//if the value after compare <= 8, that means the images
//very similar to each other
compute(ColorMomentHash::create());
//there are other algorithms you can try out
//every algorithms have their pros and cons
compute(AverageHash::create());
compute(PHash::create());
compute(MarrHildrethHash::create());
compute(RadialVarianceHash::create());
//BlockMeanHash support mode 0 and mode 1, they associate to
//mode 1 and mode 2 of PHash library
compute(BlockMeanHash::create(0));
compute(BlockMeanHash::create(1));
}
In this case, ColorMomentHash give us best result
gaussian blur attack : 0.567521
shift attack : 0.229728
resize attack : 0.229358
Pros and cons of each algorithm
The performance of img_hash is good too
Speed comparison with PHash library(100 images from ukbench)
If you want to know the recommend thresholds for these algorithms, please check this post(http://qtandopencv.blogspot.my/2016/06/introduction-to-image-hash-module-of.html).
If you are interesting about how do I measure the performance of img_hash modules(include speed and different attacks), please check this link(http://qtandopencv.blogspot.my/2016/06/speed-up-image-hashing-of-opencvimghash.html).

Does the screenshot contain only the icon? If so, the L2 distance of the two images might suffice. If the L2 distance doesn't work, the next step is to try something simple and well established, like: Lucas-Kanade. Which I'm sure is available in OpenCV.

If you want to get an index about the similarity of the two pictures, I suggest you from the metrics the SSIM index. It is more consistent with the human eye. Here is an article about it: Structural Similarity Index
It is implemented in OpenCV too, and it can be accelerated with GPU: OpenCV SSIM with GPU

If you can be sure to have precise alignment of your template (the icon) to the testing region, then any old sum of pixel differences will work.
If the alignment is only going to be a tiny bit off, then you can low-pass both images with cv::GaussianBlur before finding the sum of pixel differences.
If the quality of the alignment is potentially poor then I would recommend either a Histogram of Oriented Gradients or one of OpenCV's convenient keypoint detection/descriptor algorithms (such as SIFT or SURF).

If for matching identical images - code for L2 distance
// Compare two images by getting the L2 error (square-root of sum of squared error).
double getSimilarity( const Mat A, const Mat B ) {
if ( A.rows > 0 && A.rows == B.rows && A.cols > 0 && A.cols == B.cols ) {
// Calculate the L2 relative error between images.
double errorL2 = norm( A, B, CV_L2 );
// Convert to a reasonable scale, since L2 error is summed across all pixels of the image.
double similarity = errorL2 / (double)( A.rows * A.cols );
return similarity;
}
else {
//Images have a different size
return 100000000.0; // Return a bad value
}
Fast. But not robust to changes in lighting/viewpoint etc.
Source

If you want to compare image for similarity,I suggest you to used OpenCV. In OpenCV, there are few feature matching and template matching. For feature matching, there are SURF, SIFT, FAST and so on detector. You can use this to detect, describe and then match the image. After that, you can use the specific index to find number of match between the two images.

Hu invariant moments is very powerful tool to compare two images

Hash functions are used in the undouble library to detect (near-)identical images (disclaimer: I am also the author). This is a simple and fast way to compare two or more images for similarity. It works using a multi-step process of pre-processing the images (grayscaling, normalizing, and scaling), computing the image hash, and the grouping of images based on a threshold value.

Develop Reference

ios ruby-on-rails asp.net-mvc docker delphi jenkins grails google-sheets machine-learning dart