Fit CVPixelBuffer into a square image, resizing while preserving aspect ratio - iOS

I have an image that should be preprocessed before passing to CoreML: resized to a 640x640 square while preserving the aspect ratio (see the image). I found a lot of helpful links about resizing with vImageScale_*, but nothing about adding coloured padding around the resized image.
I know that Vision has the scaleFit option, but its output is a bit different, so I'm trying to centre the image myself.

The vImageScale_* functions scale to fit the destination, so the aspect ratio will change if the source and destination have different aspect ratios.
vImage provides affine transform operations (with support for a background colour!). Take a look at https://developer.apple.com/documentation/accelerate/applying_geometric_transforms_to_images for more information. The final example, Apply a Complex Affine Transform to a vImage Buffer, does exactly what you need (you just need to remove the rotate step).

The colored padding is just going to be something like vImageBufferFill_ARGB8888, and the middle bits will be vImageScale_ARGB8888. Since the scale operation works with integer heights, widths and origin, you won't need to worry about antialiasing the content at the seams between regions.
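A minimal Swift sketch of that fill-then-scale approach, assuming ARGB8888 buffers; the function name, the grey padding colour and the 640x640 default are illustrative, not from the original answer, and error codes from the vImage calls are ignored for brevity:

```swift
import Accelerate

// Fill the destination with a padding colour, then scale the source into a
// centred sub-rectangle of it. Caller must eventually call free() on the result.
func letterbox(source: inout vImage_Buffer, side: Int = 640) throws -> vImage_Buffer {
    var destination = try vImage_Buffer(width: side, height: side, bitsPerPixel: 32)

    // 1. Fill the whole destination with the padding colour (ARGB order).
    var padColor: [Pixel_8] = [255, 128, 128, 128]
    vImageBufferFill_ARGB8888(&destination, &padColor, vImage_Flags(kvImageNoFlags))

    // 2. Aspect-preserving size with an integer, centred origin.
    let scale = min(Double(side) / Double(source.width),
                    Double(side) / Double(source.height))
    let w = Int(Double(source.width) * scale)
    let h = Int(Double(source.height) * scale)
    let x = (side - w) / 2
    let y = (side - h) / 2

    // 3. A vImage_Buffer that aliases the centred sub-rectangle of the
    //    destination; scaling into it leaves the padding untouched, and the
    //    integer origin means there are no seams to antialias.
    var region = vImage_Buffer(data: destination.data.advanced(by: y * destination.rowBytes + x * 4),
                               height: vImagePixelCount(h),
                               width: vImagePixelCount(w),
                               rowBytes: destination.rowBytes)
    vImageScale_ARGB8888(&source, &region, nil, vImage_Flags(kvImageHighQualityResampling))

    return destination
}
```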
Note that this is also relatively easy to do with a bitmap CGContext and CGContextDrawImage(CGContextRef, CGRect, CGImageRef), though CG just uses a Lanczos2 resampler. You will have fewer details to micromanage with image orientation and colourspace conversion, and it allows for fractional pixel placement.
You can also get fractional pixel placement with vImage, but you will need to use the AffineWarp or shear functions. For ML, I'd probably not attempt too much image manipulation in case the ringing has some odd effect. A simple low-pass filter plus bilinear interpolation might be something to try. This will be a bit blurrier to the human eye but might avoid feeding artifacts to the training workload. Presumably there is ample literature about ML image preprocessing to guide you.
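For reference, a hedged sketch of the same letterboxing via the Core Graphics route mentioned above; the function name and the grey padding colour are assumptions:

```swift
import UIKit

// Fill a square bitmap context with a padding colour, then aspect-fit the
// image into its centre. CG handles colour-space conversion and allows
// fractional pixel placement.
func letterboxedCGImage(_ image: CGImage, side: Int = 640) -> CGImage? {
    guard let context = CGContext(data: nil,
                                  width: side,
                                  height: side,
                                  bitsPerComponent: 8,
                                  bytesPerRow: 0,
                                  space: CGColorSpaceCreateDeviceRGB(),
                                  bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue) else {
        return nil
    }

    // Fill the whole canvas with the padding colour.
    context.setFillColor(UIColor.gray.cgColor)
    context.fill(CGRect(x: 0, y: 0, width: side, height: side))

    // Aspect-fit rectangle, centred in the square canvas.
    let scale = min(CGFloat(side) / CGFloat(image.width),
                    CGFloat(side) / CGFloat(image.height))
    let fitted = CGRect(x: (CGFloat(side) - CGFloat(image.width) * scale) / 2,
                        y: (CGFloat(side) - CGFloat(image.height) * scale) / 2,
                        width: CGFloat(image.width) * scale,
                        height: CGFloat(image.height) * scale)

    context.interpolationQuality = .high
    context.draw(image, in: fitted)
    return context.makeImage()
}
```

You'd still need to copy the result back into a CVPixelBuffer (or hand the CGImage to Vision) before running the model.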

Related

Does scaling images up or down affect image information?

I'm working on a graduation project for image forgery detection using a CNN. Most of the papers I've read downscale the images before feeding the data set to the network; I want to know how this process affects the image information.
Images are resized/rescaled to a specific size for a few reasons:
(1) It allows the user to set the input size of their network. When designing a CNN you need to know the shape (dimensions) of your data at each step, so having a static input size is an easy way to make sure your network gets data of the shape it was designed to take.
(2) Using a full resolution image as the input to the network is very inefficient (super slow to compute).
(3) In most cases the features you want to extract/learn from an image are still present after downsampling. So in a way, resizing an image to a smaller size denoises it, filtering out many of the unimportant features for you.
Well, you change the image's size; of course that changes its information.
You cannot reduce image size without discarding information. Simple case: throw away every second pixel to scale the image to 50%.
Scaling up adds new pixels. In its simplest form you duplicate pixels, creating redundant information.
More complex solutions create new pixels by averaging neighbouring pixels or interpolating between them.
Scaling up is reversible: it doesn't create or destroy information.
Scaling down divides the amount of information by the square of the downscaling factor*. Upscaling after downscaling results in a blurred image.
(*This is true in a first approximation. If the image doesn't have high frequencies, they are not lost, hence no loss of information.)

Finding the part of a UIImage that is in focus

I'm trying to extend my understanding of the AVFoundation framework.
I want to add a Bezier Path (not necessarily a high resolution one) around the area of an image that is in focus.
So, given a UIImage, is it possible to know which points of the UIImage are in focus and which points aren't?
(I'm not sure whether any of the GPUImage "detection filters" would be useful for what I'm trying to do.)
One way would be to look for areas of high frequencies vs. low frequencies. The low frequency areas are more likely to be out-of-focus.
You could do this with a fast fourier transform. But a cheap hack might be to blur your input image and then compare the blurred version to the original. The lower the absolute difference, the lower frequency the input image is at that point. However, this has the downside of detecting areas of flat color as "out-of-focus". Though, I guess it's hard for a human to distinguish those, as well, unless there's other context in the image.
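A rough Core Image sketch of that blur-and-compare heuristic; the blur radius of 8 is an arbitrary assumption, and turning the resulting difference map into a Bezier path (e.g. by thresholding and tracing the bright regions) is left out:

```swift
import UIKit
import CoreImage

// Returns a per-pixel difference map between the image and a blurred copy:
// bright pixels are high-frequency (likely in focus), dark pixels are flat
// or out of focus.
func focusMap(for image: UIImage) -> CIImage? {
    guard let input = CIImage(image: image),
          let blurFilter = CIFilter(name: "CIGaussianBlur"),
          let diffFilter = CIFilter(name: "CIDifferenceBlendMode") else {
        return nil
    }

    // Low-pass (blur) the image, cropped back to the original extent.
    blurFilter.setValue(input, forKey: kCIInputImageKey)
    blurFilter.setValue(8.0, forKey: kCIInputRadiusKey)
    guard let blurred = blurFilter.outputImage?.cropped(to: input.extent) else {
        return nil
    }

    // Per-pixel absolute difference between the original and its blurred copy.
    diffFilter.setValue(input, forKey: kCIInputImageKey)
    diffFilter.setValue(blurred, forKey: kCIInputBackgroundImageKey)
    return diffFilter.outputImage
}
```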

Why is supersampling not widely used for image scaling?

I'm looking for an appropriate image scaling algorithm and wondered why supersampling is not as popular as bicubic, bilinear, or even Lanczos.
By supersampling I mean a method that divides the source image into equal rectangles, each rectangle corresponding to a pixel in the destination image. In my opinion, this is the most natural and accurate method. It takes into account all pixels of the source image, while bilinear might skip some pixels. As far as I can see, the quality is also very high, comparable with Lanczos.
Why do popular image libraries (such as GraphicsMagick, GD or PIL) not implement this algorithm? I found implementations only in the Intel IPP and AMD Framewave projects. I know at least one disadvantage: it can only be used for downscaling, but am I missing something else?
For comparison, this is an image scaled down by 4.26x. From left to right: GraphicsMagick Sinc filter (910 ms), Framewave Super method (350 ms), GraphicsMagick Triangle filter (320 ms):
Now I know the answer: because a pixel is not a little square. That is why supersampled resizing gives an aliased result; this can be seen in the thin water jets in the sample image. This is not fatal, and supersampling can still be used for scaling to 2x, 3x and so on, to dramatically reduce the picture size before resizing to the exact dimensions with another method. This technique is used in jpeglib to open images at a smaller size.
Of course we can still think of pixels as squares, and the GD library actually does: its imagecopyresampled is true supersampling.
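For concreteness, this is roughly what that area-averaging ("box filter") supersampling does for an integer downscale factor; a Swift sketch on a grayscale image, and a real implementation would also weight partially covered source pixels for fractional factors:

```swift
// Downscale a grayscale image (row-major Doubles) by an integer factor k,
// averaging each k x k block of source pixels into one destination pixel.
func boxDownscale(_ src: [Double], width: Int, height: Int, factor k: Int) -> [Double] {
    let dstWidth = width / k
    let dstHeight = height / k
    var dst = [Double](repeating: 0, count: dstWidth * dstHeight)
    for dy in 0..<dstHeight {
        for dx in 0..<dstWidth {
            // Average the k x k block covered by this destination pixel;
            // every source pixel contributes exactly once.
            var sum = 0.0
            for sy in 0..<k {
                for sx in 0..<k {
                    sum += src[(dy * k + sy) * width + (dx * k + sx)]
                }
            }
            dst[dy * dstWidth + dx] = sum / Double(k * k)
        }
    }
    return dst
}
```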
You are a bit mistaken when saying that bilinear rescaling misses pixels. Assuming you are rescaling the image by at most a factor of 2, bilinear interpolation takes into account all the pixels of the source image. If you smooth the image a bit first, bilinear interpolation gives you high-quality results; for most practical cases even bicubic interpolation is not needed.
Since bilinear interpolation is extremely fast (it can easily be implemented in fixed-point arithmetic), it is by far the best image-rescaling algorithm for real-time processing.
If you intend to shrink the image by more than a factor of 2, then bilinear interpolation is mathematically wrong, and with larger factors even bicubic starts to make mistakes. That is why image-processing software (like Photoshop) uses better, though much more CPU-demanding, algorithms.
The answer to your question comes down to speed.
Given the speed of your CPU/GPU, the image size and the desired frame rate, you can easily compute how many operations you can afford per pixel. For example, with a 2 GHz CPU and a 1-gigapixel image, you only get a couple of operations per pixel if you need to process the image every second.
Given the number of allowed calculations, you select the best algorithm that fits the budget. So the decision is usually not driven by image quality but rather by speed.
Another note about supersampling: sometimes doing it in the frequency domain works much better. This is called frequency interpolation. But you won't want to compute an FFT just to rescale an image.
Moreover, I don't know if you are familiar with back projection. This is a way to interpolate the image from destination to source instead of from source to destination. Using back projection you can enlarge the image by a factor of 10, use bilinear interpolation, and still be mathematically correct.
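A small sketch of that inverse-mapping ("back projection") idea: for every destination pixel, map back into the source and bilinearly interpolate. The image is assumed grayscale, stored row-major as Doubles; the function name is illustrative:

```swift
// Resize by mapping each destination pixel centre back into source
// coordinates and bilinearly blending the four surrounding source pixels.
func resizeByInverseMapping(_ src: [Double], srcW: Int, srcH: Int,
                            dstW: Int, dstH: Int) -> [Double] {
    var dst = [Double](repeating: 0, count: dstW * dstH)
    for y in 0..<dstH {
        for x in 0..<dstW {
            // Map the destination pixel centre back into source coordinates.
            let fx = (Double(x) + 0.5) * Double(srcW) / Double(dstW) - 0.5
            let fy = (Double(y) + 0.5) * Double(srcH) / Double(dstH) - 0.5
            let x0 = min(max(Int(fx.rounded(.down)), 0), srcW - 1)
            let y0 = min(max(Int(fy.rounded(.down)), 0), srcH - 1)
            let x1 = min(x0 + 1, srcW - 1)
            let y1 = min(y0 + 1, srcH - 1)
            let tx = min(max(fx - Double(x0), 0), 1)
            let ty = min(max(fy - Double(y0), 0), 1)
            // Bilinear blend of the four neighbouring source pixels.
            let top = src[y0 * srcW + x0] * (1 - tx) + src[y0 * srcW + x1] * tx
            let bottom = src[y1 * srcW + x0] * (1 - tx) + src[y1 * srcW + x1] * tx
            dst[y * dstW + x] = top * (1 - ty) + bottom * ty
        }
    }
    return dst
}
```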
Computational burden and increased memory demand are most likely the answer you are looking for. That's why adaptive supersampling was introduced, which trades off burden and memory demand against effectiveness.
I guess supersampling is still too heavy even for today's hardware.
Short answer: They are super-sampling. I think the problem is terminology.
In your example, you are scaling down. This means decimating, not interpolating. Decimation will produce aliasing if no super-sampling is used. I don't see aliasing in the images you posted.
A sinc filter involves super-sampling. It is especially good for decimation because it specifically cuts off frequencies above those that can be seen in the final image. Judging from the name, I suspect the triangle filter also is a form of super-sampling. The second image you show is blurry, but I see no aliasing. So my guess is that it also uses some form of super-sampling.
Personally, I have always been confused by Adobe Photoshop, which asks me if I want "bicubic" or "bilinear" when I am scaling. But Bilinear, Bicubic, and Lanczos are interpolation methods, not decimation methods.
I can also tell you that modern video games use super-sampling as well. Mipmapping is a commonly used shortcut to real-time decimation, pre-decimating individual images by powers of two.

OpenCV perspectiveTransform correct target width and height?

I am currently working with OpenCV and its perspective transformation functions. I'd like to find a way to accurately determine the target rectangle based on the data (the source image) I have.
I already found this thread: https://stackoverflow.com/questions/7199116/perspective-transform-how-to-find-a-target-width-and-height
It states that it is not possible to determine the correct aspect ratio from the data contained in the source image alone, but is there at least a good algorithm to get a reasonable estimate?
No, there isn't a way to do it from the image alone. Imagine you were taking a picture of an A4 sheet of paper resting on a table, but viewing it from a nearly horizontal angle. If you used the aspect ratio from the image, you'd end up with a really long, thin rectangle.
However, if you know the pose of the camera relative to the target (i.e. the rotation matrix) and the camera's intrinsic parameters, then you can recover the aspect ratio.
Have a look at this paper (it's actually really interesting, though the English isn't the best): equation (20) is the key one. Also, look at this blog post where someone's implemented the approach.
If you don't know the orientation of the camera, then the best bet is to assume some aspect ratio that is at least in the right ballpark. If you have any other information about the rectangle, use it (for example, if I were always taking photos of A0/A1/A2/... sheets of paper, those have a known, fixed aspect ratio).
good luck!

Increase image size, without messing up clarity

Are there libraries, scripts or any techniques to increase image size (height and width), or do you need a very high-resolution image to start with?
Bicubic interpolation is pretty much the best you're going to get when it comes to increasing image size while maintaining as much of the original detail as possible. It's not yet possible to work the actual magic that your question would require.
The Wikipedia link above is a pretty solid reference, but there was a question asked about how it works here on Stack Overflow: How does bicubic interpolation work?
This is the highest-quality resampling algorithm that Photoshop (and other graphics software) offers. Generally, it's recommended that you use Bicubic Smoother when you're increasing image size and Bicubic Sharper when you're reducing it; sharpening can over-sharpen an image when you're enlarging, so you need to be careful.
As far as libraries or scripts, it's difficult to recommend anything without knowing what language you're intending to do this in. But I can guarantee that there's an image processing library including this algorithm already around for any of the popular languages—I wouldn't advise reimplementing it yourself.
Increasing height & width of an image means one of two things:
i) You are increasing the physical size of the image (i.e. cm or inches), without touching its content.
ii) You are trying to increase the image's pixel content (i.e. its resolution).
So:
(i) has to do with rendering. As the physical size of the image goes up, you are drawing larger pixels (the DPI goes down). That's fine if you want to look at the image from far away (say, on a really large screen). If you look at it from up close, you are going to see mostly large dots.
(ii) is just plainly impossible. Say your image is 100x100 pixels and you want to make it 200x200. You start with 10,000 pixels and end up with 40,000: what are you going to put in the 30,000 new pixels? Whatever your answer, you end up with 30,000 invented pixels, and the resulting image will be fuzzier, or faker, and usually both. All the techniques that increase image size use some sort of average among neighbouring pixel values, which amounts to "fuzzier".
Cheers.
