Suppose I want to include an image upscaling/downscaling algorithm in my program. Execution time is not important, the result "quality" is. How do you define "quality" in this case and how do you then choose the best algorithm from the available choices?
On a side note, if this is too general, the underlying problem for which I'm trying to find a solution is this: suppose I have a lot of images that I will need to upscale at runtime (a video, actually). I can pre-process them and upscale them somewhat with a slow and high-quality algorithm, but I don't know the final resolution (well, people have different monitors after all), so I can't resize to that immediately. Would it be beneficial if I upscaled it somewhat with my high-quality algorithm, and then let the player upscale it further to the necessary resolution at runtime (with a fast but low quality algorithm)? Or should I leave the video as-is and leave all the upscaling to be done in one pass at runtime?
The only way to really objectively judge the quality is to do some (semi-)scientific research. Recruit several participants. Show them the upscaled images in a random order, and have them rank the subjective quality (bonus points for doing it double-blind). Then you average out the scores and choose the algorithm with the highest average score (and perhaps test for statistical significance).
You'll want to make sure the images you test give a representative sampling of the actual images you're using. If you're doing it for video, it would probably be a good idea to use short video clips as the test images, instead of stills, as I would suspect that people would perceive the upscaling quality differently for those two.
If you don't care about rigorousness, you could just perform the tests with yourself as the only test subject.
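If it helps, the scoring step itself is easy to script. A minimal sketch with SciPy, assuming a 1-10 rating scale and a paired design where every participant rates every algorithm (the ratings and algorithm names below are made up):

```python
# Hypothetical ratings: one score per participant, per algorithm.
import numpy as np
from scipy import stats

scores = {
    "bicubic": [6, 7, 5, 6, 7],
    "lanczos": [7, 8, 7, 8, 7],
}

# Average score per algorithm.
for algo, vals in scores.items():
    print(algo, np.mean(vals))

# Paired t-test: did the same participants rate the two algorithms differently?
t, p = stats.ttest_rel(scores["bicubic"], scores["lanczos"])
print("p-value:", p)
```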
As for doing an initial prescaling, my guess is that it would not be worth it. Scaling up from a pre-scaled, larger image shouldn't be any less expensive than scaling up from the smaller original, and I would expect it to be more expensive than scaling up by a convenient factor, such as 2x. However, don't take my word for it... test!
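One way to test it: take frames you already have at high resolution, downscale them to simulate your source, run both pipelines, and compare against the originals. A rough sketch with OpenCV, where the filename, scale factors and interpolation choices are all just placeholders:

```python
import cv2

# Hypothetical high-resolution reference frame.
ref = cv2.imread("reference_frame.png")
h, w = ref.shape[:2]

# Simulate the low-resolution source.
small = cv2.resize(ref, (w // 4, h // 4), interpolation=cv2.INTER_AREA)

# One pass: fast upscale straight to the display size.
one_pass = cv2.resize(small, (w, h), interpolation=cv2.INTER_LINEAR)

# Two passes: high-quality 2x pre-scale, then fast upscale to the display size.
mid = cv2.resize(small, (w // 2, h // 2), interpolation=cv2.INTER_LANCZOS4)
two_pass = cv2.resize(mid, (w, h), interpolation=cv2.INTER_LINEAR)

# Higher PSNR against the reference = closer to the original.
print("one pass:", cv2.PSNR(ref, one_pass))
print("two pass:", cv2.PSNR(ref, two_pass))
```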
I'm using a simple neural network (similar to AlexNet) to classify images into categories. As a preprocessing stage, input images are resized to 256x256 before being fed into the network.
Lately, I have run into the following problem: Many of the images I deal with are of very high resolution (say, 2000x2000). In this case, doing a "hard resize" results in a severe loss of information. For example, a small 100x100 face, easily recognisable in the original image, would be unrecognisable in the resized version. In such cases, I may prefer taking several crops of the 2000x2000 image and run the classification on each crop.
I'm looking for a method to automatically determine which type of pre-processing is most adequate. Ideally, it would be able to recognize, for example, that a high resolution image of a single face should be resized, whereas a high resolution image of a crowd should be cropped several times. The basic requirements, on my part:
As computationally efficient as possible. Hence, something like a "sliding window" would probably be ruled out (it would be computationally cheaper to just crop all the images).
Ability to balance between recall and precision
What I considered thus far:
"Low-level" (image processing) approach: Implement an algorithm that uses local image information (like gradients) to distinguish between high resolution and low resolution images.
"High-level" (semantic) approach: Run the images through a pre-trained network for segmentation of some sort, and use its oputput to determine the appropriate pre-procssing.
I want to try the first option first, but I'm not exactly sure how to go about it. Is there anything I can do in the Fourier domain? Something in OpenCV I can try? Does anyone have any suggestions/thoughts? Other ideas would be very welcome too. Thanks!
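To make the first option a bit more concrete, this is the kind of cheap, low-level check I had in mind, just as a sketch: I don't know whether the variance of the Laplacian is the right statistic, and the size and detail thresholds are arbitrary.

```python
import cv2

def needs_cropping(path, size_threshold=1000, detail_threshold=100.0):
    """Rough heuristic: a large image with lots of fine detail -> take crops;
    otherwise a plain resize to 256x256 is probably fine. Thresholds are
    arbitrary and would need tuning."""
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    detail = cv2.Laplacian(img, cv2.CV_64F).var()
    h, w = img.shape
    return max(h, w) > size_threshold and detail > detail_threshold
```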
I was wondering if there is any benefit to training on high resolution images rather than low resolution. I understand that it will take longer to train on larger images and that the dimensions must be a multiple of 32. My current image set is 1440x1920. Would I be better off resizing to 480x640, or is bigger better?
It's certainly not a requirement that your image dimensions be powers of two. There may be some cases where it speeds things up (e.g. GPU memory allocation), but it's not critical.
Smaller images will train significantly faster, and possibly even converge quicker (all other factors held constant) as you will be able to train on bigger batches (e.g. 100-1000 images in one pass, which you might not be able to do on a single machine with high res imagery).
As to whether to resize, you need to ask yourself if every pixel in that image is critical to your task. Often this is not the case - you can probably resize a photo of a bus down to say 128x128 and still recognize that it's a bus.
Using smaller images can also help your network generalise better, as there is less data to overfit.
A technique often used in image classification networks is to perform distortions (e.g. random cropping, scaling & brightness adjustment) on images to (a) convert odd-sized images to a constant size, (b) synthesize more data and (c) encourage the network to generalise.
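For example, a rough sketch of those distortions with OpenCV and NumPy; the crop size, scale range and brightness range are arbitrary, and it assumes the input is larger than the crop:

```python
import cv2
import numpy as np

def distort(img, out_size=224):
    # Random scale.
    s = np.random.uniform(0.8, 1.2)
    img = cv2.resize(img, None, fx=s, fy=s)
    # Random crop to a constant size.
    h, w = img.shape[:2]
    y = np.random.randint(0, h - out_size + 1)
    x = np.random.randint(0, w - out_size + 1)
    img = img[y:y + out_size, x:x + out_size]
    # Random brightness adjustment.
    img = img.astype(np.float32) + np.random.uniform(-25, 25)
    return np.clip(img, 0, 255).astype(np.uint8)
```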
This depends largely on the application. As a rule of thumb, I'd ask myself the question: can I complete the task myself on the resized images? If so, I'd downsize to the lowest resolution at which the task doesn't become more difficult for you yourself. If not... you're going to have to be very patient using 1440x1920 images. I imagine you'll almost always be better off experimenting with more varied architectures and hyper-parameter sets on smaller images compared to fewer models on full-resolution images.
Whatever size you choose, you'll have to design your network for the image size you have in mind. If you're using convolutional layers, a larger image will require larger strides, larger filter sizes, and/or more layers. The number of parameters will stay the same for each convolution, though the number of features will grow (along with batch normalisation parameters if you're using it).
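To illustrate that last point, a quick sketch (PyTorch here purely as an example, and the sizes are arbitrary): the convolution's parameter count is independent of the input resolution, but the feature maps, and hence memory and compute, are not.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, stride=2, padding=1)

# Parameter count is the same regardless of input size: 3*3*3*32 + 32 = 896.
print(sum(p.numel() for p in conv.parameters()))

small = torch.zeros(1, 3, 480, 640)
large = torch.zeros(1, 3, 1440, 1920)
print(conv(small).shape)  # torch.Size([1, 32, 240, 320])
print(conv(large).shape)  # torch.Size([1, 32, 720, 960])
```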
I'm new to computer vision. I'm working on a research project whose objective is (1) to detect vehicles from images and videos, and then later on (2) to be able to track moving vehicles.
I'm at the initial stage where I'm collecting training data, and I'm really concerned about getting images which are at an optimum resolution for detection and tracking.
Any ideas? The current dataset I've been given (from a past project) has images of about 1200x600 pixels. But I've been told this may or may not be an optimum resolution for the detection and tracking task. Apart from the fact that I will be extracting Haar-like features from the images, I can't think of any factor to include in making a resolution decision. Any ideas of what a good resolution ought to be for training data images in this case?
First of all, feeding raw pixels directly to classifiers does not usually produce great results, although it is sometimes useful (face detection, for example). So you need to think about feature extraction.
One big issue is that a 1200x600 image has 720,000 pixels. That means 720,000 dimensions, which poses a challenge for training and classification (the curse of dimensionality).
So basically you need to reduce the dimensionality, in particular through feature extraction. What features to extract? That completely depends on the domain.
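As one example of the feature-extraction idea (HOG here purely for illustration; Haar-like or other features may suit your domain better), a short sketch with OpenCV, where the patch file and descriptor parameters are just placeholders:

```python
import cv2

# winSize, blockSize, blockStride, cellSize, nbins -- arbitrary example values.
hog = cv2.HOGDescriptor((64, 64), (16, 16), (8, 8), (8, 8), 9)

patch = cv2.imread("vehicle_patch.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file
patch = cv2.resize(patch, (64, 64))

features = hog.compute(patch)
print(features.size)  # 1764 values instead of hundreds of thousands of pixels
```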
Another important aspect is speed. Processing bigger images takes more time, which matters especially when processing real-time video at something like 15-30 fps.
In my project (see my profile) which was real-time (15fps), I was working on 640x480 images and for some operations I had to scale down to improve performance.
Hope this helps.
I realise measuring image quality in software is going to be really difficult, and I'm not looking for a quick fix. Googling this largely turns up research papers and discussions that go a bit over my head, so I was wondering if anyone in the SO community had any experience with doing rough image quality assessment?
I want to use this to scan a few thousand images and whittle it down to a few dozen images that are most likely of poor quality. I could then show these to a user and leave the rest to them.
Obviously there are many metrics that can contribute to whether an image is of high or low quality. I'd be happy with anything that could take an image as input and produce reasonable values for basic quality metrics like sharpness, dynamic range, noise, etc., leaving it up to my software to determine what's acceptable and what isn't.
Some of the images are poor quality because they've been up-scaled drastically. If there isn't a way of getting metrics like I suggested above, is there any way to detect that an image has been up-scaled like this?
This is a very difficult question since "image quality" is not a very well defined concept.
It's probably best if you start experimenting with some basic metrics and see if you come up with a measure that suits your applications.
E.g., for dynamic range, the quantiles of the distributions of each channel can yield interesting information.
For up-scaling, on the other hand, the solution is fairly simple: just do a Fourier transform on the image and compare the amount of energy in the high-frequency and low-frequency bands.
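A rough sketch of both ideas with NumPy and OpenCV; the filename, the percentiles, and the 0.7 frequency cutoff are placeholders. A drastically up-scaled image should show very little energy in the high-frequency band.

```python
import cv2
import numpy as np

img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Dynamic range: spread between low and high quantiles of the intensity distribution.
lo, hi = np.percentile(img, [1, 99])
print("dynamic range:", hi - lo)

# Up-scaling check: fraction of spectral energy in the high-frequency band.
spectrum = np.abs(np.fft.fftshift(np.fft.fft2(img))) ** 2
h, w = img.shape
y, x = np.ogrid[:h, :w]
r = np.sqrt((y - h / 2) ** 2 + (x - w / 2) ** 2)
high = spectrum[r > 0.7 * r.max()].sum()
print("high-frequency energy fraction:", high / spectrum.sum())
```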
I've built an imaging system with a webcam and feature matching such that, as I move the camera around, I can track the camera's motion. I am doing something similar to here, except with the webcam frames as the input.
It works really well for "good" images, but when taking images in really low light lots of noise appears (camera high gain), and that messes with the feature detection and matching. Basically, it doesn't detect any good features, and when it does, it cannot match them correctly between frames.
Does anyone know a good solution for this? What other methods are used for finding and matching features?
Here are two example images with very few features:
I think phase correlation is going to be your best bet here. It is designed to tell you the phase shift (i.e., translation) between two images. It is much more resilient (but not immune) to noise than feature detection because it operates in frequency space, whereas feature detectors operate spatially. Another benefit is that it is very fast compared with feature detection methods. I have a sub-pixel accurate implementation available in the OpenCV trunk, located here.
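A minimal sketch of how you might call it from OpenCV's Python bindings, assuming a build that includes phaseCorrelate; the two frame filenames are hypothetical, and the images must be the same size:

```python
import cv2
import numpy as np

# phaseCorrelate expects single-channel floating-point images.
f1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
f2 = cv2.imread("frame2.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# A Hanning window reduces edge effects in the FFT.
win = cv2.createHanningWindow(f1.shape[::-1], cv2.CV_32F)

(dx, dy), response = cv2.phaseCorrelate(f1, f2, win)
print("shift:", dx, dy, "response:", response)
```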
However, your images are pretty much "featureless" with the exception of the crease in the middle, so even phase correlation may have some trouble with it. Think of it like trying to detect translation in a snow storm. If all you can see is white, you can't tell that you have translated at all, thus the term whiteout. In your case, the algorithm might suffer from "greenout" :)
Can you adjust the camera settings to work better in low-light conditions? Have you fully opened the iris? Can you live with lower framerates? Setting a longer exposure time will allow the camera to gather more light, thus giving you more features at the cost of adding motion blur. Or, if low light is your default environment, you probably want something designed for it, like an IR camera, but those can be expensive. Other than that, a big lens and long exposures are your friend :)
Histogram equalization may be of interest for improving image contrast, but sometimes it can just enhance the noise. OpenCV has a global histogram equalization function called equalizeHist. For a more localized implementation, you'll want to look at Contrast Limited Adaptive Histogram Equalization, or CLAHE for short. Here is a good article on it. This page has some nice examples, and some code.
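A short sketch of both in OpenCV's Python bindings ("frame.png" is a placeholder; both functions expect a single-channel image):

```python
import cv2

gray = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # hypothetical file

# Global histogram equalization.
global_eq = cv2.equalizeHist(gray)

# Contrast Limited Adaptive Histogram Equalization (CLAHE).
clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
local_eq = clahe.apply(gray)

cv2.imwrite("global_eq.png", global_eq)
cv2.imwrite("local_eq.png", local_eq)
```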