I am working on a project that identifies objects after capturing their images on the Android platform. For this, I extracted features from sample images such as compactness, rectangularity, elongation, eccentricity, roundness, sphericity, lobation, and Hu moments. A random tree is then used as the classifier. Since I built the classifier from pictures gathered from Google, which are not high resolution, captured images of size 1280x720 give 19/20 correct results when the image is cropped.
However, when I capture images at larger sizes, such as about 5 megapixels, and crop them for identification, the number of correct results decreases dramatically.
Do I need to extract features from high-resolution images and train on them in order to get accurate results when high-resolution pictures are captured? Or is there a way of adjusting the extracted features according to the image resolution?
Some feature descriptors are sensitive to scaling. Others, like SIFT and SURF, are not. If you expect the resolution (or scale) of your images to change, it's best to use scale-invariant feature descriptors.
If you use feature descriptors that are not scale-invariant, you can still get decent results by normalizing the resolution of your images. Try scaling the 5 megapixel images to 1280x720 -- do the classification results improve?
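If you want to try that normalization step, a minimal sketch with OpenCV in Python could look like this (the file name is a placeholder, and 1280x720 is assumed to be the resolution your classifier was trained on):

```python
import cv2

# Placeholder path to a 5-megapixel capture; adjust to your input.
img = cv2.imread("capture_5mp.jpg")

# Downscale to the resolution the classifier was trained on.
# INTER_AREA is generally the recommended interpolation for shrinking.
normalized = cv2.resize(img, (1280, 720), interpolation=cv2.INTER_AREA)

# Extract features and classify on `normalized` exactly as you already do
# for the 1280x720 captures.
```

This way the features are extracted at the same pixel scale as your training images.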
Related
I'm trying to use OpenCV to detect and extract ORB features from images.
However, the images I'm getting are not normalized (different size, different resolutions, etc...).
I was wondering if I need to normalize my images before extracting ORB features to be able to match them across images?
I know the feature detection is scale-invariant, but I'm not sure what that means for image resolution. For example, two images of the same size, one with the object up close and the other with it far away, should produce a match even though the object appears at different scales; but what if the images themselves don't have the same size?
Should I adapt ORB's patchSize to the image size? For example, if I use a patchSize of 20px for an 800px image, should I use a patchSize of 10px for a 400px image?
Thank you.
Update:
I tested different algorithms (ORB, SURF and SIFT) with high- and low-resolution images to see how they behave. In this image, the objects are the same size, but the image resolution is different:
We can see that SIFT is pretty stable, although it finds only a few features. SURF is also pretty stable in terms of keypoints and feature scale. So my guess is that feature matching between a low-res and a high-res image with SIFT or SURF would work, but ORB produces much larger features in the low-res image, so its descriptors won't match those in the high-res image.
(Same parameters have been used between high and low res feature extraction).
So my guess is that it would be better to use SIFT or SURF if we want to do matching between images with different resolutions.
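As a rough way to sanity-check that guess, here is a minimal matching sketch (not part of the original experiment) that compares a full-resolution image against a downscaled copy using SIFT and Lowe's ratio test; it assumes OpenCV 4.4+ where SIFT_create lives in the main module, and the file name and scale factor are placeholders:

```python
import cv2

img_hi = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file
img_lo = cv2.resize(img_hi, None, fx=0.25, fy=0.25,       # simulate a low-res version
                    interpolation=cv2.INTER_AREA)

sift = cv2.SIFT_create()
kp_hi, des_hi = sift.detectAndCompute(img_hi, None)
kp_lo, des_lo = sift.detectAndCompute(img_lo, None)

# Lowe's ratio test keeps only distinctive correspondences.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_hi, des_lo, k=2)
        if m.distance < 0.75 * n.distance]
print(f"{len(good)} good matches between the high-res and low-res versions")
```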
According to the OpenCV documentation, ORB also uses an image pyramid to produce multiscale features, although the details are not clear on that page.
If we look at the ORB paper itself, section 6.1 mentions that images at five different scales are used. But it is still unclear whether you need to compute descriptors on images at different scales manually, or whether this is already implemented in OpenCV's ORB.
Finally, from the source code (line 1063 at the time of writing this answer) we see that images at different resolutions are used for keypoint/descriptor extraction. If you track the variables, you see that there is a scale factor for the ORB class, which you can access with the getScaleFactor method.
In short, ORB tries to perform matching at different scales itself.
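For reference, those pyramid parameters are exposed when you construct the detector in OpenCV's Python bindings; the values shown below are just the library defaults:

```python
import cv2

# nlevels pyramid levels, each downscaled by scaleFactor relative to the previous one.
orb = cv2.ORB_create(nfeatures=500, scaleFactor=1.2, nlevels=8, patchSize=31)

print(orb.getScaleFactor())  # 1.2 -- per-level downscaling factor
print(orb.getNLevels())      # 8   -- number of pyramid levels
```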
I'm working on a data set that has a good number of blurred, faded, dark, low-resolution and noisy face images. I need to eliminate those images in the pre-processing stage, but I can't remove them manually based on subjective judgement.
Which libraries/APIs are used in the open source domain to evaluate the "quality" of the digital face images?
The quality metric most commonly used is mAP (mean average precision), where the underlying accuracy measure can be true positives / sample size or the Jaccard index. Either way, you will need ground truth for the dataset.
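As a small illustration of those two measures (this is only a sketch with made-up labels; in practice the ground truth would come from your annotated dataset):

```python
import numpy as np

# Hypothetical binary labels: 1 = acceptable face image, 0 = low quality.
ground_truth = np.array([1, 0, 1, 1, 0, 1, 0, 1])
predicted    = np.array([1, 0, 0, 1, 0, 1, 1, 1])

# True positives / sample size, as mentioned above.
tp = np.sum((predicted == 1) & (ground_truth == 1))
tp_rate = tp / len(ground_truth)

# Jaccard index on the positive class: |intersection| / |union|.
union = np.sum((predicted == 1) | (ground_truth == 1))
jaccard = tp / union

print(f"TP/sample size = {tp_rate:.2f}, Jaccard = {jaccard:.2f}")
```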
I'm using a simple neural network (similar to AlexNet) to classify images into categories. As a preprocessing stage, input images are resized to 256x256 before being fed into the network.
Lately, I have run into the following problem: many of the images I deal with are of very high resolution (say, 2000x2000). In this case, doing a "hard resize" results in a severe loss of information. For example, a small 100x100 face, easily recognisable in the original image, would be unrecognisable in the resized version. In such cases, I may prefer taking several crops of the 2000x2000 image and running the classification on each crop.
I'm looking for a method to automatically determine which type of pre-processing is most appropriate. Ideally, it would be able to recognize, for example, that a high-resolution image of a single face should be resized, whereas a high-resolution image of a crowd should be cropped several times. The basic requirements, on my part:
As computationally efficient as possible. Hence, something like a "sliding window" would probably be ruled out (it is computationally cheaper to just crop all the images).
Ability to balance between recall and precision
What I considered thus far:
"Low-level" (image processing) approach: Implement an algorithm that uses local image information (like gradients) to distinguish between high resolution and low resolution images.
"High-level" (semantic) approach: Run the images through a pre-trained network for segmentation of some sort, and use its oputput to determine the appropriate pre-procssing.
I want to try the first option first, but I'm not exactly sure how to go about it. Is there anything I can do in the Fourier domain? Something in OpenCV I can try? Does anyone have any suggestions/thoughts? Other ideas would be very welcome too. Thanks!
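One low-level idea along the lines of the first option (purely a sketch; the radius fraction and the decision threshold are placeholders you would have to tune on your own data) is to measure how much of the image's spectral energy sits at high frequencies. An image whose detail lives mostly in the high frequencies would lose more from a hard resize, so cropping might be the better choice for it:

```python
import cv2
import numpy as np

def high_freq_energy_ratio(gray, radius_frac=0.25):
    """Fraction of spectral energy outside a central low-frequency disc."""
    spectrum = np.fft.fftshift(np.fft.fft2(gray.astype(np.float32)))
    power = np.abs(spectrum) ** 2
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    r2 = (yy - h / 2) ** 2 + (xx - w / 2) ** 2
    low_mask = r2 <= (radius_frac * min(h, w) / 2) ** 2
    return power[~low_mask].sum() / power.sum()

img = cv2.imread("input.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder path
ratio = high_freq_energy_ratio(img)

# Hypothetical decision rule: the threshold must be tuned on your data.
strategy = "crop" if ratio > 0.15 else "resize"
print(ratio, strategy)
```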
I'm new to computer vision. I'm working on a research project whose objective is (1) to detect vehicles from images and videos, and then later on (2) to be able to track moving vehicles.
I'm at the initial stage where I'm collecting training data, and I'm really concerned about getting images which are at an optimum resolution for detection and tracking.
Any ideas? The current dataset I've been given (from a past project) has images of about 1200x600 pixels. But I've been told this may or may not be an optimum resolution for the detection and tracking task. Apart from the fact that I will be extracting Haar-like features from the images, I can't think of any factor to include in making a resolution decision. Any ideas of what a good resolution ought to be for training-data images in this case?
First of all, feeding raw images directly to classifiers does not produce great results, although it is sometimes useful, such as in face detection. So you need to think about feature extraction.
One big issue is that a 1200x600 image has 720,000 pixels. That means 720,000 dimensions, which poses a challenge for training and classification because of the curse of dimensionality.
So basically you need to reduce the dimensionality, in particular by using feature extraction. Which features to extract? That depends entirely on the domain.
Another important aspect is speed. Processing bigger images takes more time, and this is especially important for real-time processing at something like 15-30 fps.
In my project (see my profile) which was real-time (15fps), I was working on 640x480 images and for some operations I had to scale down to improve performance.
Hope this helps.
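To make the dimensionality point concrete, here is a minimal sketch (my own illustration, not from the project mentioned above) that downscales a 1200x600 frame and reduces it to a compact descriptor instead of feeding 720,000 raw pixels to the classifier; Hu moments are just one example of such a compact feature:

```python
import cv2

# Placeholder path; assume a 1200x600 grayscale frame.
img = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)

# Scale down first: fewer pixels means faster processing.
small = cv2.resize(img, (640, 480), interpolation=cv2.INTER_AREA)

# Extract a compact descriptor instead of raw pixels.
hu = cv2.HuMoments(cv2.moments(small)).flatten()   # 7 values instead of 720,000 pixels

print(img.size, "pixels reduced to", hu.size, "features")
```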
I am doing some image enhancement experiments, so I take photos with my cheap camera. The camera has mosaic artifacts and all the images look like a grid. I think a pillbox (out-of-focus) kernel and a Gaussian kernel would not be the best candidates. Any suggestions?
EDIT:
Sample
I suspect this cannot be done with a single constant kernel, because the effect is not the same on every pixel (hence the "grid").
The effects are non-linear (and probably non-stationary), so you cannot simply invert the convolution and enhance the image -- if you could, the camera chip would do it on-board.
The best way to work out what the convolution is (or at least an approximation to it) might be to take photos of known patterns and then, working in the 2D frequency/Laplace domain, divide the resulting spectra to get a linear approximation to the filter.
I suspect that the convolution you discover this way will be very context-dependent -- so the best way to enhance an image might be to divide it into tiles, classify each region of the image as belonging to a different set (for each of which you could work out a different linear approximation to the convolution, based on test data), and then deconvolve each separately.
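A minimal sketch of that spectral-division idea, assuming you have the ideal test pattern and the camera's photo of it at the same size (the file names and the small regularizer eps are placeholders):

```python
import cv2
import numpy as np

# Ideal test pattern and the camera's photo of it, same size (placeholder paths).
ideal    = cv2.imread("pattern_ideal.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
observed = cv2.imread("pattern_photo.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

eps = 1e-3   # regularization to avoid dividing by near-zero frequencies; tune for your noise level

# Linear approximation to the camera's filter: H ~ F(observed) / F(ideal).
H = np.fft.fft2(observed) / (np.fft.fft2(ideal) + eps)

# Deconvolve another photo (or tile) with the approximate inverse filter.
img = cv2.imread("photo.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
restored = np.real(np.fft.ifft2(np.fft.fft2(img) / (H + eps)))
restored = np.clip(restored, 0, 255).astype(np.uint8)
```

Following the tiling suggestion above, you could estimate a separate H for each class of tile and apply the matching inverse filter tile by tile.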