Detecting a pattern of dark/bright bands in an image - image-processing

I'm trying to detect a pattern like this in some images
The actual image looks something like this
It could be scaled and/or rotated. Is there a way to do that efficiently without resorting to neural nets or some learning algorithm? Can some detection be done based on the value gradient for example (dark-bright-dark-bright-dark)?

Assuming the input image is MxN (in your example M < N):
Average the RGB channels to get a grayscale image.
Average along Y to get a 1xN vector.
Take the derivative.
Take the absolute value.
Threshold.
Calculate the distances between the peaks.
Search for a location where the ratio between the distances is as expected (from what I see in your example, about 1:7:1).
If such a place is found, validate the colors in the middle of each interval (from your example they should be white-black-white).
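The steps above can be sketched in plain Python as follows. This is only an illustration on a synthetic grayscale image: the function names, the threshold of 100, and the 25% tolerance on the ratio are assumptions, not values from the question. On a real image each edge usually spans several pixels, so neighbouring above-threshold positions would first have to be merged into a single peak.

```python
def column_means(gray):
    """Average each column of a 2D grayscale image (list of rows) into a 1xN profile."""
    rows = len(gray)
    return [sum(col) / rows for col in zip(*gray)]


def edge_positions(profile, threshold):
    """Indices where the absolute first difference of the profile exceeds the threshold."""
    return [i + 1 for i in range(len(profile) - 1)
            if abs(profile[i + 1] - profile[i]) > threshold]


def find_ratio(edges, expected=(1, 7, 1), tolerance=0.25):
    """Return the first run of edges whose spacings match the expected ratio, or None."""
    n = len(expected)
    for start in range(len(edges) - n):
        window = edges[start:start + n + 1]
        gaps = [b - a for a, b in zip(window, window[1:])]
        unit = gaps[0] / expected[0]  # size of one ratio unit in pixels
        if all(abs(g / unit - e) <= tolerance * e for g, e in zip(gaps, expected)):
            return window
    return None


# Synthetic test image: dark background with a white-black-white band pattern.
row = [0] * 5 + [255] * 2 + [0] * 14 + [255] * 2 + [0] * 5
image = [row[:] for _ in range(4)]

profile = column_means(image)
edges = edge_positions(profile, threshold=100)
match = find_ratio(edges)
if match is not None:
    # Validate the intensity in the middle of each interval.
    middles = [profile[(a + b) // 2] for a, b in zip(match, match[1:])]
    print(match, middles)
```

A rotated pattern would need the same scan repeated along several directions, which this sketch does not attempt.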

You might be able to use Gabor filters at varying orientations and then apply a standard threshold to identify objects.
If you know the frequency of the pattern you could try using a bandpass filter to isolate objects at that frequency. If it is a very strong frequency, you might be able to identify it in the image's Fourier transform.
Without much other knowledge about what you are looking for in your image, it will be very difficult to identify a specific repeating pattern.

Related

Which feature descriptors to use and why?

I would like to compute the position and orientation of a camera in a civil aircraft cockpit.
I use LEDs as fixed points. My plan is to store the X, Y, Z position associated with each LED.
How can I detect and identify my LEDs in my images? Which feature descriptor and feature point extractor should I use?
How should I modify my image prior to feature detection?
I would like to stay efficient.
----Please stop voting this question down----
Now, having found the solution to my problem, I realize the question might have been too generic.
Anyway, to support other people googling, I am going to describe my answer.
Using combinations of OpenCV functions, I create masks in which the areas where the LEDs could be are white and the rest of the image is black. These functions are, for example, Core.inRange, Imgproc.dilate, and Imgproc.erode. With Imgproc.findContours I also filter out contours that are too large or too small. To combine masks I use Core.bitwise_and or Core.bitwise_not.
The masks are computed from an image in the HSV color space as input.
Having these masks with potential LED areas, I compute color histograms of the intensity-normalized RGB colors (hue did not work well enough for me). These histograms are trained and normalized using a set of annotated input images and represent my descriptor.
I match the trained descriptor against the ones computed in the application using histogram intersection.
This gives me distance measures. Using a threshold on these measures, the measures themselves, and knowledge of the geometric positions of the real-life LEDs, I translate the patches into a graph, which helps me find the longest chain of potential LEDs.
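As a rough illustration of the descriptor matching described above, here is a minimal pure-Python sketch of an intensity-normalized color histogram compared with histogram intersection. The function names, the choice to histogram only the normalized red channel, and the bin count are assumptions made for the example; the original post does not give these details.

```python
def chromaticity(pixel):
    """Intensity-normalized (r, g) of an (R, G, B) pixel; b is implied as 1 - r - g."""
    r, g, b = pixel
    total = r + g + b
    return (r / total, g / total) if total else (0.0, 0.0)


def normalized_histogram(values, bins, max_value=1.0):
    """Histogram of values in [0, max_value], scaled so the bins sum to 1."""
    counts = [0] * bins
    for v in values:
        index = min(int(v / max_value * bins), bins - 1)
        counts[index] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima: 1.0 for identical normalized histograms, 0.0 for disjoint ones."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Hypothetical patches: a reddish LED candidate and a greenish one.
red_patch = [(250, 10, 10), (240, 20, 20), (230, 30, 20)]
green_patch = [(10, 250, 10), (20, 240, 20), (30, 230, 20)]

red_hist = normalized_histogram([chromaticity(p)[0] for p in red_patch], bins=4)
green_hist = normalized_histogram([chromaticity(p)[0] for p in green_patch], bins=4)
print(histogram_intersection(red_hist, red_hist), histogram_intersection(red_hist, green_hist))
```

A threshold on the intersection score would then decide whether a patch is kept as a potential LED.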

Find input image (ID,passport) in imagesDB based on similarity

I would like to decide if an image is present in a list stored in a DB (e.g. pictures of IDs, passports, student cards, etc.). I thought about using a KNN algorithm that will plot the K closest images.
Options for distance metric:
sum of Euclidean distances between corresponding pixels (img1[pixel_i], img2[pixel_i])
sum of Euclidean distances between each pixel and every other pixel, multiplied by some factor decreasing with distance (pixel to pixel)
same as above, but with Manhattan distance
Do you know of a better way to deal with image similarity?
I think that using raw graylevel values in computing distances is a very bad idea. This is not invariant to illumination, to translation and to rotation (although I don't think that rotation is a big issue in face images).
Try to use some robust and invariant descriptor extracted from each image (e.g. SIFT on keypoints) and then compute distances between those features. K-NN could work. Alternatively, look for image retrieval literature for more advanced approaches.
Hope this helps!
If you have a large number of images in your database, calculating the similarity between a given image and every single image in your database every time will get rather unwieldy. Instead, I would consider something like a perceptual hash (pHash), where you pre-compute a value ONCE for each image in your database and store it. Then, when you want to compare an image, you calculate just its single pHash and compare that with all the stored ones in your database.
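To illustrate the precompute-once, compare-cheaply workflow, here is a minimal sketch. Note that it uses the simpler average hash rather than pHash itself (pHash is based on a DCT of the image); the function names are made up for the example, and the images are assumed to be already converted to grayscale and resized to the same small grid (8x8 is a common choice, 2x2 is used here to keep the example readable).

```python
def average_hash(gray):
    """One bit per pixel: 1 if the pixel is brighter than the image mean, else 0."""
    pixels = [p for row in gray for p in row]
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Stored once per database image (hypothetical tiny images).
database = {
    "id_card": average_hash([[10, 200], [220, 30]]),
    "passport": average_hash([[200, 10], [30, 220]]),
}

# A brighter copy of the ID card still produces the same hash.
query = average_hash([[40, 230], [250, 60]])
closest = min(database, key=lambda name: hamming_distance(database[name], query))
print(closest)
```

A small Hamming distance suggests a near-duplicate; the cut-off to use is something to tune on your own data.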

Vehicle segmentation and tracking

I've been working on a project for some time to detect and track (moving) vehicles in video captured from UAVs. Currently I am using an SVM trained on bag-of-features representations of local features extracted from vehicle and background images. I then use a sliding-window detection approach to try to localise vehicles in the images, which I would then like to track. The problem is that this approach is far too slow, and my detector isn't as reliable as I would like, so I'm getting quite a few false positives.
So I have been considering segmenting the cars from the background to find their approximate position, so as to reduce the search space before applying my classifier, but I am not sure how to go about this and was hoping someone could help.
Additionally, I have been reading about motion segmentation with layers, which uses optical flow to segment the frame by flow model. Does anyone have any experience with this method? If so, could you offer some input as to whether you think it would be applicable to my problem?
Below are two frames from a sample video:
frame 0:
frame 5:
Assuming your cars are moving, you could try to estimate the ground plane (road).
You may get a decent ground plane estimate by extracting features (SURF rather than SIFT, for speed), matching them over frame pairs, and solving for a homography using RANSAC, since a plane in 3D moves according to a homography between two camera frames.
Once you have your ground plane you can identify the cars by looking at clusters of pixels that don't move according to the estimated homography.
A more sophisticated approach would be to do Structure from Motion on the terrain. This only presupposes that it is rigid, not that it is planar.
Update
I was wondering if you could expand on how you would go about looking for clusters of pixels that don't move according to the estimated homography?
Sure. Say I and K are two video frames and H is the homography mapping features in I to features in K. First you warp I onto K according to H, i.e. you compute the warped image Iw as Iw( [x y]' )=I( inv(H)[x y]' ) (roughly Matlab notation). Then you look at the squared or absolute difference image Diff=(Iw-K)*(Iw-K). Image content that moves according to the homography H should give small differences (assuming constant illumination and exposure between the images). Image content that violates H such as moving cars should stand out.
For clustering high-error pixel groups in Diff I would start with simple thresholding ("every pixel difference in Diff larger than X is relevant", maybe using an adaptive threshold). The thresholded image can be cleaned up with morphological operations (dilation, erosion) and clustered with connected components. This may be too simplistic, but it's easy to implement for a first try, and it should be fast. For something fancier, look at Clustering in Wikipedia. A 2D Gaussian Mixture Model may be interesting; when you initialize it with the detection result from the previous frame it should be pretty fast.
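The thresholding and connected-components part can be sketched as follows. This is a pure-Python toy, assuming the frame I has already been warped onto K (the homography estimation and warping are not shown) and skipping the morphological clean-up; the function names and the threshold value are made up for the example.

```python
from collections import deque


def squared_difference(warped, frame):
    """Per-pixel squared difference between the warped frame Iw and the frame K."""
    return [[(a - b) ** 2 for a, b in zip(ra, rb)] for ra, rb in zip(warped, frame)]


def threshold(image, limit):
    """Boolean mask of pixels whose value exceeds the limit."""
    return [[value > limit for value in row] for row in image]


def connected_components(mask):
    """Group True pixels into 4-connected components, each a list of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue = deque([(r, c)])
            seen[r][c] = True
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(component)
    return components


# Toy data: the warped frame matches K everywhere except at two "moving" blobs.
Iw = [[0] * 5 for _ in range(4)]
K = [[0, 0, 0, 0, 0],
     [0, 90, 90, 0, 0],
     [0, 0, 0, 0, 80],
     [0, 0, 0, 0, 80]]

blobs = connected_components(threshold(squared_difference(Iw, K), limit=2500))
print(blobs)
```

Each component's bounding box could then serve as a candidate region for the classifier.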
I did a little experiment with the two frames you provided, and I have to say I am somewhat surprised myself how well it works. :-) Left image: Difference (color coded) between the two frames you posted. Right image: Difference between the frames after matching them with a homography. The remaining differences clearly are the moving cars, and they are sufficiently strong for simple thresholding.
Thinking of the approach you currently use, it may be interesting to combine it with my proposal:
You could try to learn and classify the cars in the difference image Diff instead of the original image. This would amount to learning what a car motion pattern looks like rather than what a car looks like, which could be more reliable.
You could get rid of the expensive window search and run the classifier only on regions of Diff with sufficiently high values.
Some additional remarks:
In theory, the cars should even stand out if they are not moving since they are not flat, but given your distance to the scene and camera resolution this effect may be too subtle.
You can replace the feature extraction / matching part of my proposal with Optical Flow, if you like. This amounts to identifying flow vectors that "stick out" from a consistent frame-to-frame motion of the ground. It may be prone to outliers in the optical flow, however. You can also try to get the homography from the flow vectors.
This is important: regardless of which method you use, once you have found cars in one frame you should use this information to robustify your search for these cars in consecutive frames, giving a higher likelihood to detections close to the old ones (Kalman filter, etc.). That's what tracking is all about!
If the number of cars in your field of view always remains the same but they move around, then you can use optical flow. It will give you good results against a still background. If the number of cars changes, then you need to call the goodFeaturesToTrack function in OpenCV after a certain number of frames and again track the cars using optical flow.
You can use background modelling to model the background, so that the cars are always your foreground. The simplest example is frame differencing: subtract the previous frame from the current frame, diff(x,y,k) = I(x,y,k) - I(x,y,k-1). As your cars are moving in each frame, you will get their position.
Both approaches will work fine since, I presume, you have a still background. Check this link to find what optical flow can do.

How to match texture similarity in images?

What are the ways in which to quantify the texture of a portion of an image? I'm trying to detect areas that are similar in texture in an image, sort of a measure of "how closely similar are they?"
So the question is what information about the image (edge, pixel value, gradient etc.) can be taken as containing its texture information.
Please note that this is not based on template matching.
Wikipedia didn't give much detail on actually implementing any of the texture analyses.
Do you want to find two distinct areas in the image that look the same (same texture) or match a texture in one image to another?
The second is harder due to different radiometry.
Here is a basic scheme of how to measure similarity of areas.
1. Write a function that takes an area of the image as input and calculates a scalar value, such as average brightness. This scalar is called a feature.
2. Write more such functions to obtain about 8 to 30 features, which together form a vector that encodes information about the area in the image.
3. Calculate this vector for both areas that you want to compare.
4. Define a similarity function that takes two vectors and outputs how alike they are.
You need to focus on steps 2 and 4.
Step 2: Use the following features: std() of brightness, some kind of corner detector, entropy filter, histogram of edge orientations, histogram of FFT frequencies (x and y directions). Use color information if available.
Step 4: You can use cosine similarity, min-max or weighted cosine.
After you implement about 4-6 such features and a similarity function, start to run tests. Look at the results and try to understand why or where it doesn't work. Then add a specific feature to cover that case.
For example, if you see that a texture with big blobs is regarded as similar to a texture with tiny blobs, then add a feature based on a morphological filter that measures the density of objects larger than 20 square pixels.
Iterate this process of identifying a problem and designing a specific feature about 5 times and you will start to get very good results.
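A minimal sketch of this scheme, with only three simple features and plain cosine similarity, could look like this. The particular features (mean, standard deviation, mean horizontal gradient) and the function names are chosen for illustration; they are a small subset of what the answer suggests.

```python
import math


def texture_features(patch):
    """Feature vector for a grayscale patch: mean, std and mean absolute horizontal gradient."""
    pixels = [p for row in patch for p in row]
    mean = sum(pixels) / len(pixels)
    std = math.sqrt(sum((p - mean) ** 2 for p in pixels) / len(pixels))
    grads = [abs(row[i + 1] - row[i]) for row in patch for i in range(len(row) - 1)]
    mean_grad = sum(grads) / len(grads)
    return [mean, std, mean_grad]


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


stripes = [[0, 255, 0, 255] for _ in range(4)]  # strongly textured patch
flat = [[128, 128, 128, 128] for _ in range(4)]  # uniform patch of similar brightness

print(cosine_similarity(texture_features(stripes), texture_features(stripes)))
print(cosine_similarity(texture_features(stripes), texture_features(flat)))
```

In practice the features would probably need to be scaled to comparable ranges first, otherwise the largest-valued feature dominates the cosine.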
I'd suggest using wavelet analysis. Wavelets are localized in both time and frequency and give a better signal representation using multiresolution analysis than the FT does.
There is a paper explaining a wavelet approach for texture description. There is also a comparison method.
You might need to slightly modify an algorithm to process images of arbitrary shape.
An interesting approach for this is to use Local Binary Patterns.
Here is a basic example and some explanations: http://hanzratech.in/2015/05/30/local-binary-patterns.html
See that method as one of the many different ways to get features from your pictures. It corresponds to the 2nd step of DanielHsH's method.
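For reference, the basic 3x3 Local Binary Pattern can be sketched in a few lines of plain Python. This is a simplified version (no circular interpolation, no uniform-pattern grouping), with function names and the neighbour ordering chosen for the example; border pixels are simply skipped.

```python
def lbp_code(image, r, c):
    """8-bit code: one bit per neighbour, set when the neighbour is >= the centre pixel."""
    center = image[r][c]
    # Neighbours visited clockwise, starting at the top-left corner.
    neighbors = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for dy, dx in neighbors:
        code = (code << 1) | (1 if image[r + dy][c + dx] >= center else 0)
    return code


def lbp_histogram(image):
    """Normalized 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist


bright_spot = [[1, 1, 1],
               [1, 9, 1],
               [1, 1, 1]]
print(lbp_code(bright_spot, 1, 1))  # every neighbour is darker than the centre
```

The resulting histogram is one texture descriptor for the region and can be compared between regions with any histogram distance.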

Why can the Fourier transform be used for image recognition while being sensitive to noise?

As we know, the Fourier transform is sensitive to noise (like salt and pepper),
so how can it still be used for image recognition?
Is there a FT expert here?
Update to actually answer the question you asked... :) Pre-process the image with a non-linear filter to suppress the salt & pepper noise. Median filter maybe?
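As a small illustration of that pre-processing step, here is a 3x3 median filter in plain Python. The function name is made up, and leaving the border pixels unchanged is a simplification chosen for the sketch.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each interior pixel by the median of its 3x3 neighbourhood."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # border pixels are copied unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            window = [image[r + dy][c + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[r][c] = median(window)
    return out


# Uniform grey image with one "pepper" pixel and one "salt" pixel.
noisy = [[100] * 5 for _ in range(5)]
noisy[1][1] = 0
noisy[2][2] = 255

print(median_filter_3x3(noisy))
```

Isolated outliers are removed because they never make up the majority of a 3x3 window, which is why a median works better here than a linear blur.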
Basic lesson on FFTs and matched filters follows...
The classic way of detecting a smaller image within a larger image is the matched filter. Essentially, this involves doing a cross correlation of the larger image with the smaller image (the thing you're trying to recognize).
For every position in the larger image
Overlay the smaller image on the larger image
Multiply all corresponding pixels
Sum the results
Put that sum in this position in the filtered image
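The loop above, written out directly (without the FFT speed-up), might look like this in plain Python. The function names are hypothetical, and the template is only evaluated at positions where it fits entirely inside the larger image.

```python
def cross_correlate(image, template):
    """Slide the template over the image; at each position sum the pixel-wise products."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    result = []
    for r in range(ih - th + 1):
        row = []
        for c in range(iw - tw + 1):
            total = 0
            for dy in range(th):
                for dx in range(tw):
                    total += image[r + dy][c + dx] * template[dy][dx]
            row.append(total)
        result.append(row)
    return result


def best_match(scores):
    """(row, col) of the highest score, i.e. the most likely template position."""
    positions = ((r, c) for r in range(len(scores)) for c in range(len(scores[0])))
    return max(positions, key=lambda rc: scores[rc[0]][rc[1]])


template = [[1, -1],
            [-1, 1]]
image = [[0, 0, 0, 0, 0],
         [0, 0, 1, -1, 0],
         [0, 0, -1, 1, 0],
         [0, 0, 0, 0, 0]]

scores = cross_correlate(image, template)
print(best_match(scores))
```

The example uses a zero-mean template on purpose: with raw pixel values, plain cross-correlation tends to respond to bright regions regardless of their shape.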
The matched filter is optimal where the only noise in the larger image is white noise.
This IS computationally slow, but it can be decomposed into FFT (fast Fourier transform) operations, which are much more efficient. There are much more sophisticated approaches to image matching that tolerate other types of noise much better than the matched filter does. But few are as efficient as the matched filter implemented using FFTs.
Google "matched filter", "cross correlation" and "convolution filter" for more.
For example, here's one brief explanation that also points out the drawbacks of this very oldschool image matching approach: http://www.dspguide.com/ch24/6.htm
Not sure exactly what you're asking. If you are asking about how FFT can be used for image recognition, here are some thoughts.
FFT can be used to perform image "classification". It can't be used to recognize different faces or objects, but it can be used to classify the type of image. FFT calculates the spatial frequency content of the image, so, for example, a natural scene, a face, a city scene, etc. will have different FFTs. Therefore you can classify whole images, or even regions within an image (e.g. classifying terrain in an aerial photo).
Also, FFT is used in pre-processing for image recognition. It can be used in OCR (optical character recognition) to rotate the scanned image into the correct orientation, since the FFT of typed text has a strong orientation. The same goes for parts inspection in industrial automation.
I don't think you'll find many methods in use that rely on Fourier Transforms for image recognition.
In the case of salt and pepper noise, it can be considered high frequency noise, and thus you could low pass filter your FFT before making a comparison with the target image.
I would imagine that it would work, but that different images that are somewhat similar (like both are photographs taken outside) would register as being the same image.

Resources