Keypoint detection and matching for large images in opencv - opencv

I am doing keypoint detection and matching in opencv to stitch two images.
When the images are small, it works well. But when dealing with larger images, the number of keypoints detected is increased, and therefore it cost a lot of time to match them. But in order to stitch the images, it seems that we don't need so many keypoints. In order to increase efficiency, is there any way to only detect a limited number of keypoints?
In the code, I use SiftFeatureDetector and SiftDiscriptorExtractor to detect keypoints and extract descriptors.
Regards.

My advise:
Re-size the image so they become much smaller and then execute feature matching.
Once you have a fast solution (Homography) apply it and than the next matching will be much faster.
You do have a way to easily control the amount of features. You can rise the threshold and as a result less features will be selected.
You can even wrap the threshold in while() loop. It rises the threshold until features amount is less than N (but grater than some M).
Look at full code example I posted here:
Calculate offset/skew/rotation of similar images in C++

Related

Counting number of bright spots in image (python)

I'm trying to develop a way to count the number of bright spots in an image. The spots should be gaussian point sources, but there is a lot of noise. There are probably on the order of 10-20 actual point sources in this image. My first though was to use a gaussian convolution with sigma = 15, which seems to do a good job.
First, is there a better way to isolate these bright spots?
Second, how can I 'detect' the bright spots, i.e. count them? I haven't had any luck with circular hough transforms from opencv.
Edit: Here is the original without gridlines, here is the convolved image without gridlines.
I am working with thermal infrared images which subject to quantity of noises.
I found that low rank based approaches such as approaches based on Singular Value Decomposition (SVD) or Weighted Nuclear Norm Metric (WNNM) give very efficient result in terms of reducing the noise while preserving the structure of the information.
Their main drawback is the fact they are quite slow to compute (several minutes per image)
Here is some litterature:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=7067415
https://arxiv.org/abs/1705.09912
The second paper has some MatLab code available, there is quite a lot of files but the translation to python is should not that complex.
OpenCV implement as well (and it is available in python) a very efficient algorithm on the Non-Local Means algorithm:
https://docs.opencv.org/master/d5/d69/tutorial_py_non_local_means.html

collect negative samples of adaboost algorithm for face detection

Viola-Jones' AdaBoost method is very popular for face detection? We need lots of positive and negative samples o train a face detector.
The rule for collecting positive sample is simple: the image which contains faces. But the rule for collecting negative sample is not very clear: the image which does not contains faces.
But there are so many scene that do not contain faces (which may be sky, river, house animals etc.). Which should I collect it? How can know I have collected enough negative samples?
Some suggested idea for negative samples: using the positive samples and crop the face region using the left part as negative samples. Is this work?
You have asked many questions inside your thread.
Amount of samples. As a rule of thumbs: When you train a detector you need roughly few thousands positive and negative examples per stage. Typical detector has 10-20 stages. Each stage reduces the amount of negative by a factor of 2. So you will need roughly 3,000 - 10,000 positive examples and ~5,000,000 to 100,000,000 negative examples.
Which negatives to take. A rule of thumb: You need to find a face in a given environment. So you need to take that environment as negative examples. For instance, if you try to detect faces of students sitting in a classroom than take as negative examples images from the classroom (walls, windows, human body, clothes etc). Taking images of the moon or of the sky will probably not help you. If you don't know your environment than just take as much as possible different natural images (under different light conditions).
Should you take facial parts (like an eye, or a nose) as negative? You can but this is definitely not enough (to take only those negatives). The real strength of the detector will come from the negative images which represent the typical background of the faces
How to collect/generate negative samples - You don't actually need many negative images. You can take 1000 images and generate 10,000,000 negative samples from them. Here is how you do it. Suppose you take a photo of a car of 1 mega pixel resolution 1000x1000 pixels. Suppose than you want to train face detector to work on resolution of 20x20 pixels (like openCV did). So you take your 1000x1000 big image and cut it to pieces of 20x20. You can get 2,500 pieces (50x50). So this is how from a single big image you generated 2,500 negative examples. Now you can take the same big image and cut it to pieces of size 10x10 pixels. You will now have additional 10,000 negative examples. Each example is of size 10x10 pixels and you can enlarge it by factor of 2 to force all the sample to have the same size. You can repeat this process as much as you want (cutting the input image to pieces of different size). Mathematically speaking, if your image is of size NxN - You can generate O(N^4) negative examples from it by taking each possible rectangle inside it.
In step 4, I described how to take a single big image and cut it to a large amount of negative examples. I must warn you that negative examples should not have high co-variance so I don't recommend taking only one image and generating 1 million negative examples from it. As a rule of thumb - create a library of 1000 images (or download random images from Google). Verify than none of the images contains faces. Crop about 10,000 negative examples from each image and now you have got a decent 10,000,000 negative examples. Train your detector. In the next step you can cut each image to ~50,000 (partially overlapping pieces) and thus enlarge your amount of negatives to 50 millions. You will start having very good results with it.
Final enhancement step of the detector. When you already have a rather good detector, run it on many images. It will produce false detections (detect face where there is no face). Gather all those false detections and add them to your negative set. Now retrain the detector once again. The more such iterations you do the better your detector becomes
Real numbers - The best face detectors today (like Facebooks) use hundreds of millions of positive examples and billions of negatives. As positive examples they take not only frontal faces but faces in many orientations, different facial expressions (smiling, shouting, angry,...), different age groups, different genders, different races (Caucasians, blacks, Thai, Chinese,....), with or without glasses/hat/sunglasses/make-up etc. You will not be able to compete with the best, so don't get angry if your detector misses some faces.
Good luck

Image segmentation with maxflow

I have to do a foreground/background segmentation using maxflow algorithm in C++. (http://wiki.icub.org/iCub/contrib/dox/html/poeticon_2src_2objSeg_2src_2maxflow-v3_802_2maxflow_8cpp_source.html). I get an array of pixels from a png file according to their RBG but what are the next steps. How could I use this algorithm for my problem?
I recognize that source very well. That's the Boykov-Kolmogorov Graph Cuts library. What I would recommend you do first is read their paper.
Graph Cuts is an interactive image segmentation algorithm. You mark pixels in your image on what you believe belong to the object (a.k.a. foreground) and what don't belong to the object (a.k.a the background). That's what you need first. Once you do this, the Graph Cuts algorithm best guesses what the labels of the other pixels are in the image. It basically goes through each of the other pixels that are not labeled and figures out whether or not they belong to foreground and background.
The whole premise behind Graph Cuts is that image segmentation is akin to energy minimization. Image segmentation can be formulated as a cost function with a summation of two terms:
Self-Penalty: This is the cost of assigning each pixel as either foreground or background. This is also known as a data cost.
Neighbouring Penalties: This enforces that neighbouring pixels more or less should share the same classification label. This is also known as a smoothness cost.
This kind of formulation is well known as the Maximum A Posteriori Markov Random Field classification problem (MAP-MRF). The goal is to minimize that cost function so that you achieve the best image segmentation possible. This is actually an NP-Hard problem, and is actually one of the problems that is up for money from the Clay Math Institute.
Boykov and Kolmogorov theoretically proved that the MAP-MRF problem can be translated into graph theory, and solving the MAP-MRF problem is akin to taking your image and forming it into a graph with source and sink links, as well as links that connect neighbouring pixels together. To solve the MAP-MRF, you perform the maximum-flow/minimum-cut algorithm. There are many ways to do this, but Boykov / Kolmogorov find a more efficient way that is much faster than more established algorithms, such as Push-Relabel, Ford-Fulkenson, etc.
The self penalties are what are known as t links, while the neighbouring penalties are what are known as n links. You should read up the paper to figure out how these are computed, but the t links describe the classification penalty. Basically, how much it would cost to classify each pixel as belonging to the foreground or the background. These are usually based on the negative log probability distributions of the image. What you do is you create a histogram of the distribution of what was classified as foreground and a histogram of what was classified as background.
Usually, a uniform quanitization of each colour channel for both foreground and background suffices. You then turn these into PDFs but dividing by the total number of elements in each histogram, then when you calculate the t-links for each pixel, you access the colour, then see where it lies in the histogram, then take the negative log. This will tell you how much it will cost to classify that pixel to be either foreground or background.
The neighbouring pixel costs are more intuitive. People usually just take the Euclidean distance between one pixel and a neighbouring pixel and apply this distance to a Gaussian. To make things simple, a 4 pixel neighbourhood is what is usually used (North, South, East and West).
Once you figure out how to compute the cost, you follow this procedure:
Mark pixels as foreground or background.
Create a graph structure using their library
Compute the histograms of the foreground and background pixels
Calculate t-links and add to the graph
Calculate n-links and add to the graph
Invoke the maxflow routine on the graph to segment the image
Go through each pixel and figure out whether or not the pixel belongs to foreground or background.
Create a binary map that reflects this, then copy over image pixels where the binary map is true, and don't do this when it's false.
The original source of maxflow can be found here: http://pub.ist.ac.at/~vnk/software/maxflow-v3.03.src.zip
It also has a README so you can see how the library is supposed to work given some example images.
You have a lot to digest, but Graph Cuts is one of the most powerful interactive segmentation tools out there.
Good luck!

How to manage large 2D FFTs in cuda

I have succesfully written some CUDA FFT code that does a 2D convolution of an image, as well as some other calculations.
How do I go about figuring out what the largest FFT's I can run are? It seems to be that a plan for a 2D R2C convolution takes 2x the image size, and another 2x the image size for the C2R. This seems like a lot of overhead!
Also, it seems like most of the benchmarks and such are for relatively small FFTs..why is this? It seems like for large images, I am going to quickly run out of memory. How is this typically handled? Can you perform an FFT convolution on a tile of an image and combine those results, and expect it to be the same as if I had run a 2D FFT on the entire image?
Thanks for answering these questions
CUFFT plans a different algorithm depending on your image size. If you can't fit in shared memory and are not a power of 2 then CUFFT plans an out-of-place transform while smaller images with the right size will be more amenable to the software.
If you're set on FFTing the whole image and need to see what your GPU can handle my best answer would be to guess and check with different image sizes as the CUFFT planning is complicated.
See the documentation : http://developer.download.nvidia.com/compute/cuda/1_1/CUFFT_Library_1.1.pdf
I agree with Mark and say that tiling the image is the way to go for convolution. Since convolution amounts to just computing many independent integrals you can simply decompose the domain into its constituent parts, compute those independently, and stitch them back together. The FFT convolution trick simply reduces the complexity of the integrals you need to compute.
I expect that your GPU code should outperform matlab by a large factor in all situations unless you do something weird.
It's not usually practical to run FFT on an entire image. Not only does it take a lot of memory, but the image must be a power of 2 in width and height which places an unreasonable constraint on your input.
Cutting the image into tiles is perfectly reasonable. The size of the tiles will determine the frequency resolution you're able to achieve. You may want to overlap the tiles as well.

Fast and quick pixel matching algorithm

I am stuck in a pixel matching algorithm for finding symbols in an image. I have two images of symbols that I intend to find in an image that has big resolution.
Instead of a pixel by pixel matching algorithm, is there a fast algorithm that gives the same result as that of pixel matching algorithm. The result should be similar to: (percentage of pixel matched) divide by (total pixels).
My problem is that I wish to find certain symbols in a 1 bit image. The symbol appear with exact similarity in the target image and 95% of total pixel match with the target block in the image. but it takes hours to do iterations. The image is 10k X 10k and the symbol size is 20 X 20, so it will 10 power of 10 calculations which is too much to handle. Is there any filter/NN combination or any other algorithm that can give same results as that of pixel matching in a few minutes?
The point here is that pixels are almost same in the but problem is that size is very large. I do not want complex features for noise handling or edges, fuzzy etc. just a simple algorithm to do pixel matching quickly and the result should be similar to: (percentage of pixel matched) divide by (total pixels)
object recognition is tricky in that any simple algorithm is generally going to be way too slow, as you've apparently realized.
Luckily, if you have a rather large collection of these images on hand that are already correctly labeled, then I have a very simply solution for you.
Simply make 3 layer feedforward network with one input unit per pixel, all of which connect to a much smaller hidden layer, and then those in turn connect to 1 output unit (representing which symbol is present in the image). Then just run the backpropagation algorithm on your dataset until the network learns to identify the symbols.
Unfortunately, this doesn't scale very well, so you might have to look into convolutional NNs for better performance.
Additionally, if you don't have any training data (i.e. labeled examples), then your best bet is probably to decompose your symbols into features and then sweep the image for those. If you can decompose them into lines, then a hough transform can do this quite rapidly.
Maybe an (Adaptive Resonance Theory) ART-1 network could help.
The algorithm can also be written that all Prototypes are checked in parallel in the same time and it can be blazing fast because it esentially uses binary math a lot.

Resources