I have some images of tags with shapes on them (circle, rectangle, or blank). After processing the images with a median blur and Gabor filters I can eliminate most of the effect of variable illumination on the images, and they look like this:
I've tried training an SVM using HOG, LDA, PCA, and the raw pixels themselves, but I can barely get past 40-60% accuracy. What I really want to do is use the information in the shapes on the images. Fourier descriptors were recommended to me, and while I've found a good tutorial on applying the Fourier transform to images using NumPy and OpenCV, I'm not sure how to extract Fourier descriptors from an image and then identify the ones that are unique to the different shapes. Does anyone know how to do this, or can anyone recommend an alternative technique for getting features from these images that would let an SVM distinguish between them?
I've been processing some image frames in videos and I discovered that sometimes one or two frames of the video will have artifacts or noise like the images below:
The artifacts look like abrasions of paint with noisy colors that cover only a small region (less than 100x100 pixels in a 1000x2000 frame) of the image. I wonder if there are ways to detect the noisy frames? I've tried frame differencing with SSIM, NMSE, and PSNR, but found them of limited effectiveness. A saliency map (left) or Sobel/Scharr filtering (right) gives a more obvious view, but regular borders show up as well, and I'm not sure how to build a classifier from that.
Scharr saliency map:
Since only a few frames in a video are affected, denoising isn't really necessary; I can simply remove the frames once they're detected. The main problem is that it's difficult to distinguish those frames while the video is playing.
Can anybody offer some help here?
Expanding the comment into an answer with a few more details:
The Scharr and saliency maps look good.
Thresholding will result in a binary image which can be cleaned up with morphological filters (erode to enhance artefacts, dilate to 'erase' gradient contours).
Finding contours will result in lists of points which can be further processed/filtered using contour features.
If the gradients are always bigger than the artefacts, contour features, such as the bounding box dimensions and aspect ratio should help segment artefact contours from gradient contours (if any: hopefully dilation would've cleaned up the thresholded/binary image).
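As a rough sketch of that threshold / morphology / contour pipeline in OpenCV Python (the threshold value, kernel size, and bounding-box limits are assumptions to tune; the 100x100 limit echoes the question):

    import cv2
    import numpy as np

    # edge_map: the Scharr/saliency image as a single-channel uint8 array
    edge_map = cv2.imread('scharr_map.png', cv2.IMREAD_GRAYSCALE)

    # 1. threshold to a binary image (127 is an assumed cutoff; Otsu is an option)
    _, binary = cv2.threshold(edge_map, 127, 255, cv2.THRESH_BINARY)

    # 2. morphological clean-up: erosion removes thin gradient contours,
    #    dilation restores the blob-like artefact regions (3x3 is an assumption)
    kernel = np.ones((3, 3), np.uint8)
    cleaned = cv2.erode(binary, kernel, iterations=1)
    cleaned = cv2.dilate(cleaned, kernel, iterations=2)

    # 3. find contours (OpenCV 4 return signature) and keep those whose
    #    bounding box looks artefact-sized rather than border-sized
    contours, _ = cv2.findContours(cleaned, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    artefacts = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w < 100 and h < 100 and 0.2 < w / float(h) < 5.0:  # assumed limits
            artefacts.append((x, y, w, h))

    print('candidate artefact regions:', artefacts)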
Another idea could be looking into oriented gradients:
either compute the oriented gradients (see the visualisations): with the right cell size you might strike a balance where the artefacts have a high magnitude while gradient edges don't
you could try a full histogram of oriented gradients (HoG) classifier setup, using an SVM trained on the histograms as features
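A minimal sketch of that HoG + SVM setup with scikit-image and scikit-learn (the cell/block sizes and the patch-gathering step are assumptions to tune):

    import numpy as np
    from skimage.feature import hog
    from sklearn.svm import SVC

    def hog_features(patch):
        # patch: a grayscale image patch; cell/block sizes are assumed values
        return hog(patch, orientations=9, pixels_per_cell=(8, 8),
                   cells_per_block=(2, 2), feature_vector=True)

    def train_artefact_classifier(artefact_patches, clean_patches):
        # the two lists hold equally sized grayscale patches cropped from
        # labelled frames (assumed to be prepared upstream)
        X = np.array([hog_features(p) for p in artefact_patches + clean_patches])
        y = np.array([1] * len(artefact_patches) + [0] * len(clean_patches))
        return SVC(kernel='rbf').fit(X, y)

    # usage: clf = train_artefact_classifier(artefact_patches, clean_patches)
    #        label = clf.predict([hog_features(new_patch)])[0]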
The above options do rely on hand-crafted features and on making assumptions about the size of the artefacts.
ML could be an interesting route too, hopefully it can generalise well enough.
Depending on how many example images you have available, you could test a basic prototype fairly fast using Teachable Machine (which behind the scenes applies KNN to a transfer-learning layer on top of MobileNet or a similar net).
(Note: I've posted OpenCV Python links, but there are other libraries that can help, e.g. scikit-image, scikit-learn, kornia, etc. in Python, cvv in C++, BoofCV in Java, and there might be toolboxes for MATLAB/Octave with similar features.)
Apologies if this is a naive or foolish question, but I am trying to learn a bit more about image processing techniques. I had an intuition about Gabor filters but can't seem to find an answer.
If I calculate a bank of Gabor filters for a set of images and reduce them to N features that a machine-learning algorithm has determined to be indicative of a specific texture, can these N features be applied to a novel image to "transfer" the texture to the novel image? Perhaps via an inverse Gabor transform? For example, if I have 10 Gabor filters that can accurately classify a texture as "brick", can these 10 filters be applied to a "wood" texture image (a picture of a 2x4) to approximate the brick texture on the wood surface?
If this is possible, can it be easily implemented in Python?
As far as I understand, this is not directly possible. "When working with Gabor filters, it is common to work with the magnitude response of each filter." https://www.mathworks.com/help/images/texture-segmentation-using-gabor-filters.html
That is, the features carry information only about the magnitude of the signal; there is no information about the phase.
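To make the magnitude/phase point concrete, here's a small sketch with OpenCV's getGaborKernel (the filter parameters and input file are arbitrary examples): the magnitude tells you where the texture responds, but the phase, which carries the spatial detail you'd need to reconstruct or transfer the texture, is thrown away.

    import cv2
    import numpy as np

    img = cv2.imread('brick.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # a quadrature pair of Gabor kernels (parameters are arbitrary examples)
    even = cv2.getGaborKernel((31, 31), 4.0, np.pi / 4, 10.0, 0.5, psi=0)
    odd = cv2.getGaborKernel((31, 31), 4.0, np.pi / 4, 10.0, 0.5, psi=np.pi / 2)

    re = cv2.filter2D(img, cv2.CV_32F, even)
    im = cv2.filter2D(img, cv2.CV_32F, odd)

    magnitude = np.sqrt(re ** 2 + im ** 2)  # what classifiers typically use
    # the phase, np.arctan2(im, re), is discarded at this point, which is
    # why you can't invert the N magnitude features back into an image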
I need to make an iPhone application that calculates noise, geometric deformation, and other distortions in an image. How can I do this? I have done some image-processing work with OpenCV on the iPhone, but I don't know how to calculate these parameters.
1) How to calculate noise in an image?
2) What is geometric deformation and how to calculate geometric deformation of an image?
3) Are geometric deformation and distortion the same parameter in terms of image filtering? Are there other distortions I should calculate to determine whether an image is of good quality or not?
Input: My image is a face image in live video stream.
I advise you to read some literature about image processing, for example Gonzalez & Woods.
1) The simplest method of estimating noise from a single image is to compute the standard deviation between the image and a smoothed copy of it. For smoothing I recommend a simple median filter over a 3x3 sample of pixels (or more). The median is insensitive to outliers in the data, so noise like salt-and-pepper won't skew the statistics.
For overexposed or underexposed images such a method can give bad results; in that case you can calculate the FFT of the image and use the high-frequency components for noise estimation.
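A minimal sketch of the median-based estimate with OpenCV (the file name is a placeholder; treat the result as a relative noise score rather than an absolute measurement):

    import cv2
    import numpy as np

    img = cv2.imread('frame.png', cv2.IMREAD_GRAYSCALE).astype(np.float32)

    # 3x3 median smoothing: insensitive to salt-and-pepper outliers
    smoothed = cv2.medianBlur(img, 3)

    # noise estimate = standard deviation of the residual
    noise_sigma = float(np.std(img - smoothed))
    print('estimated noise sigma:', noise_sigma)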
2), 3) Calculating geometric deformation is possible only if you know what should be in the image. For example, if you use a mire (an optical etalon) with a quadratic grid, you can find the lines in your image (for example with the Canny edge detector) and compute distortion, astigmatism, and some other aberrations. This can also be done if you are sure the image contains some straight lines.
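As a rough illustration of the straight-line check (assuming a calibration grid is in view; the Canny and Hough parameters are arbitrary values to tune):

    import cv2
    import numpy as np

    img = cv2.imread('grid.png', cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 50, 150)  # assumed thresholds

    # detect nominally straight segments
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=100, maxLineGap=5)

    # in a distortion-free image the segments join into long straight lines;
    # grid lines bending towards the image corners indicate radial
    # (barrel/pincushion) distortion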
Defocus can be estimated by analysing the edges in an image or with the help of an image wavelet transform.
There are also many other methods of image analysis. For example, by analysing a colour image you can estimate chromatic aberration, and so on.
But I repeat: in the general case these operations are impossible. Each applies only in particular cases.
Read about image quality: there is no standard for this term; in every particular case you can use one or more simple characteristics to decide whether an image is good or not.
In your case I'd advise you to take a lot of photos with different kinds of artefacts and quality, then run a simple analysis of their statistics, wavelet decompositions, and R-G-B component correlations. By the way, to make the analysis of a colour image less sensitive to its brightness, I recommend working in the HSV colourspace (but to estimate chromatic aberration you need to work with the RGB components directly).
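For the colourspace point, the conversion is a one-liner in OpenCV (the file name is a placeholder):

    import cv2

    img_bgr = cv2.imread('photo.png')
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    h, s, v = cv2.split(hsv)  # analyse H and S; V carries most of the brightness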
I am looking for parabolas in some radar data. I am using the OpenCV Haar cascade classifier. My positive images are 20x20 PNGs where all of the pixels are black, except for those that trace a parabolic shape--one parabola per positive image.
My question is this: will these positives train a classifier to look for black boxes with parabolas in them, or will they train a classifier to look for parabolic shapes?
Should I add a layer of medium value noise to my positive images, or should they be unrealistically crisp and high contrast?
Here is an example of the original data.
Here is an example of my data after I have performed simple edge detection using GIMP. The parabolic shapes are highlighted in the white boxes.
Here is one of my positive images.
I figured out a way to detect parabolas, initially using the MatchTemplate method from OpenCV. At first I was using the Python cv library, and later cv2, but I had to make sure that my input images were 8-bit unsigned integer arrays. I eventually obtained a similar effect with less fuss using scipy.signal.correlate2d(image, template, mode='same'). The mode='same' setting sizes the output to match image. When I was done, I performed thresholding using the numpy.where() function, and opening and closing to eliminate salt-and-pepper noise using the scipy.ndimage module.
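Roughly, the pipeline looks like this (the threshold cutoff and the structuring-element sizes below are illustrative, not the exact values used):

    import numpy as np
    from scipy import ndimage
    from scipy.signal import correlate2d

    def detect_parabolas(image, template):
        # image: a radar frame, template: one 20x20 parabola positive;
        # cast to float so the correlation sums can't overflow
        response = correlate2d(image.astype(np.float32),
                               template.astype(np.float32), mode='same')

        # threshold the response with numpy.where (the cutoff is illustrative)
        mask = np.where(response > response.mean() + 3 * response.std(), 1, 0)

        # opening then closing to eliminate salt-and-pepper noise in the mask
        mask = ndimage.binary_opening(mask, structure=np.ones((3, 3)))
        mask = ndimage.binary_closing(mask, structure=np.ones((3, 3)))
        return mask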
Here's the output, before thresholding.
As we know, the Fourier transform is sensitive to noise (like salt-and-pepper), so how can it still be used for image recognition?
Is there an FT expert here?
Update to actually answer the question you asked... :) Pre-process the image with a non-linear filter to suppress the salt & pepper noise. Median filter maybe?
A basic lesson on FFTs and matched filters follows...
The classic way of detecting a smaller image within a larger image is the matched filter. Essentially, this involves doing a cross correlation of the larger image with the smaller image (the thing you're trying to recognize).
For every position in the larger image
Overlay the smaller image on the larger image
Multiply all corresponding pixels
Sum the results
Put that sum in this position in the filtered image
The matched filter is optimal where the only noise in the larger image is white noise.
This IS computationally slow, but it can be decomposed into FFT (fast Fourier transform) operations, which are much more efficient. There are much more sophisticated approaches to image matching that tolerate other types of noise much better than the matched filter does. But few are as efficient as the matched filter implemented using FFTs.
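Here's a minimal sketch of that FFT decomposition in NumPy (circular wrap-around at the borders is ignored for brevity):

    import numpy as np

    def matched_filter(scene, template):
        # cross correlation via the frequency domain:
        #   corr = IFFT( FFT(scene) * conj(FFT(template)) )
        padded = np.zeros_like(scene, dtype=np.float64)
        padded[:template.shape[0], :template.shape[1]] = template
        corr = np.real(np.fft.ifft2(np.fft.fft2(scene) *
                                    np.conj(np.fft.fft2(padded))))
        # the peak of the correlation surface is the best-match position
        return np.unravel_index(np.argmax(corr), corr.shape)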
Google "matched filter", "cross correlation" and "convolution filter" for more.
For example, here's one brief explanation that also points out the drawbacks of this very oldschool image matching approach: http://www.dspguide.com/ch24/6.htm
Not sure exactly what you're asking. If you are asking about how FFT can be used for image recognition, here are some thoughts.
FFT can be used to perform image "classification". It can't be used to recognize different faces or objects, but it can be used to classify the type of image. The FFT measures the spatial frequency content of the image, so, for example, a natural scene, a face, and a city scene will have different FFTs. You can therefore classify an image, or even regions within an image (e.g. an aerial photo, to classify terrain).
Also, FFT is used in pre-processing for image recognition. It can be used in OCR (optical character recognition) to rotate a scanned image into the correct orientation: the FFT of typed text has a strong orientation. The same applies to parts inspection in industrial automation.
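As a sketch of the orientation trick (purely illustrative: it finds the dominant direction of energy in the magnitude spectrum of a scanned page):

    import numpy as np

    def dominant_spectral_angle(page):
        # page: a grayscale scan as a float array
        spectrum = np.fft.fftshift(np.abs(np.fft.fft2(page)))

        # lines of text concentrate energy along one direction through the
        # centre; suppress the DC region, then locate the strongest peak
        cy, cx = spectrum.shape[0] // 2, spectrum.shape[1] // 2
        spectrum[cy - 2:cy + 3, cx - 2:cx + 3] = 0
        py, px = np.unravel_index(np.argmax(spectrum), spectrum.shape)
        return np.degrees(np.arctan2(py - cy, px - cx))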
I don't think you'll find many methods in use that rely on Fourier Transforms for image recognition.
In the case of salt-and-pepper noise, it can be considered high-frequency noise, so you could low-pass filter your FFT before making a comparison with the target image.
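For example, a low-pass mask in the frequency domain might look like this (the radius is an assumed parameter):

    import numpy as np

    def lowpass_fft(img, radius=30):
        # keep only frequencies within `radius` of the spectrum centre
        F = np.fft.fftshift(np.fft.fft2(img))
        h, w = img.shape
        yy, xx = np.ogrid[:h, :w]
        mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
        return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))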
I would imagine that it would work, but different images that are somewhat similar (like two photographs both taken outside) might register as being the same image.