Use first MLModel MLMultiArray output as second MLModel MLMultiArray input - iOS

I have two CoreML MLModels (converted from .pb).
The first model outputs a Float32 3 × 512 × 512 MLMultiArray, which basically describes an image.
The second model input is a Float32 1 × 360 × 640 × 3 MLMultiArray, which is also an image but with a different size.
I know that in theory I can convert the second model's input into an image, convert the first model's output to an image (post-prediction), resize it, and feed it to the second model, but that doesn't feel very efficient, and there is already a significant delay caused by the models, so I'm trying to improve performance.
Is it possible to "resize"/"reshape"/"transpose" the first model's output to match the second model's input? I'm using the https://github.com/hollance/CoreMLHelpers helpers (by the amazing Matthijs Hollemans), but I don't really understand how to do it without damaging the data while keeping it as efficient as possible.
Thanks!

You don't have to turn them into images. Some options for using MLMultiArrays instead of images:
You could take the 512x512 output from the first model and chop off a portion to make it 360x512, and then pad the other dimension to make it 360x640. But that's probably not what you want. In case it is, you'll have to write the code for this yourself.
You can also resize the 512x512 output to 360x640 by hand. To do this you will need to implement a suitable resizing algorithm yourself (probably bilinear interpolation), or convert the data so you can use OpenCV or the vImage framework (see the sketch after these options).
Let the model do the above. Add a ResizeBilinearLayer to the model, followed by a PermuteLayer or TransposeLayer to change the order of the dimensions. Now the image will be resized to 360x640 pixels, and the output of the first model is 1x360x640x3. This is easiest if you add these operations to the original model and then let coremltools convert them to the appropriate Core ML layers.
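For the by-hand option, here is a minimal NumPy/OpenCV sketch of the transpose + bilinear resize. The shapes come from the question; the random array is just a stand-in for the real MLMultiArray contents, which you would copy out of its data pointer.

import numpy as np
import cv2

# Stand-in for the first model's 3 x 512 x 512 Float32 output
# (copied from the MLMultiArray into a numpy array).
first_output = np.random.rand(3, 512, 512).astype(np.float32)

# Channels-first -> channels-last so the data looks like a 512 x 512 x 3 image.
hwc = np.transpose(first_output, (1, 2, 0))

# Bilinear resize to the second model's spatial size (width=640, height=360).
resized = cv2.resize(hwc, (640, 360), interpolation=cv2.INTER_LINEAR)

# Add the batch dimension the second model expects: 1 x 360 x 640 x 3.
second_input = resized[np.newaxis, ...]
print(second_input.shape)  # (1, 360, 640, 3)

The same shape bookkeeping applies if you do the resize on-device with vImage instead of OpenCV.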

Related

caffe resnet50: I don't want resizing to 256 and cropping 224?

I have training data that is already aligned to 224x224. It would be wrong to resize to 256x256 and then randomly crop. Is there a way to skip this transformation and use the images as they are?
Yes, you only need to edit the transformation params of the input data layer you are using accordingly.

How can I feed an image into my neural network?

So far I have trained my neural network on the MNIST data set (from this tutorial). Now, I want to test it by feeding my own images into it.
I've processed the image using OpenCV by making the dimensions 28x28 pixels, turning it into grayscale, and using adaptive thresholding. Where do I go from here?
An 'image' here is a 28x28 array of values from 0 to 1... so not really an image. Just greyscaling your original image will not make it fit for input. You have to go through the following steps.
Load your image into your programming language, as 784 RGB values representing the pixels
For each RGB value, take the average of R, G and B, then divide this value by 255. You will now have the greyscale value of that pixel, a number between 0 and 1.
Replace the RGB values with the greyscale values
You will now have a 28x28 array of greyscale values between 0 and 1.
So you must do everything through your programming language. If you just greyscale an image with a photo editor, the pixels will still be R, G, B.
You can use libraries like PIL or skimage, which let you load the data into numpy arrays in Python and also support many image operations such as greyscaling and scaling.
After you have processed the image and read the data into a numpy array, you can feed it to your network.
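As a rough illustration of those steps (the file name is hypothetical, and this assumes a plain fully connected MNIST network that takes a flat 784-value input):

import numpy as np
from PIL import Image

# Load the photo and scale it to the 28x28 size the network expects.
img = Image.open("my_digit.png").convert("RGB").resize((28, 28))
rgb = np.asarray(img, dtype=np.float32)          # 28 x 28 x 3

# Greyscale = average of R, G and B, divided by 255 to land in 0..1.
grey = rgb.mean(axis=-1) / 255.0                 # 28 x 28

# Flatten to the 784-value vector used by the MNIST tutorial network.
x = grey.reshape(1, 784)
# x can now be passed to the network's predict/feed-forward function.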

How to downsample RAW image without de-bayer for editing on smaller screens?

I have a question about reducing the overall size of a RAW image without going into linear space. The reason is that I want to edit a very large image (60+ megapixels), but I don't need the full image while editing on something like an iPad or iPhone screen. Once the edit is done, I do want to save out the original. Saving speed isn't a concern; what matters is the editing done on the "working" image that I preview the edits on.
I want to preserve the RAW data because I want to leverage the new CoreImage RAW abilities and write some of my own RAW CIFilters, but don't need to be working on a gigantic RAW image the whole time.
A plus is if this can be done with something in Swift, or any language that I can bridge. The actual resizing does not have to be super fast, and would probably be a one time operation before even starting to edit.
I believe there might be two approaches from reading this post:
De-bayer the RAW image to a linear space, resize it, and convert it back to Bayer-format RAW, but I don't know whether the data can be preserved through that kind of downsampling.
Somehow manipulate the dimensions by some factor to make the image smaller. This is the part I need help understanding.
Thank you!
I'm not intimately familiar with CoreImage or image processing in Swift/iOS in general, but let me try to give you at least a starting point.
A raw image (mosaiced image) is essentially a one-channel image, where different pixels correspond to different colors. A common layout may look like:
R G R G
G B G B
R G R G
G B G B
Note that in this arrangement (which is common for most mosaiced files, with notable exceptions), each 2x2 pixel group repeats across the image. For the purposes of resizing, these 2x2 regions can be considered as superpixels.
I'm assuming you have access to pixel data of your image buffer, either from a file or from memory.
The simplest way to efficiently generate a lower resolution RAW image will be to downsample the image by an integer factor. To do so, you would simply take every nth superpixel along rows and columns to form a new RAW image.
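The indexing logic is language-agnostic; a NumPy sketch with synthetic RGGB data (all names and sizes are made up) looks like this:

import numpy as np

# Synthetic mosaiced (RGGB) sensor data: one channel, 14-bit values in uint16.
height, width = 6000, 8000
raw = np.random.randint(0, 2**14, size=(height, width), dtype=np.uint16)

n = 4  # keep every nth 2x2 superpixel along rows and columns
       # (height/2 and width/2 are assumed to be divisible by n)

# Group the mosaic into 2x2 superpixels, pick every nth one, then stitch the
# remaining 2x2 blocks back together into a smaller, still-valid RGGB mosaic.
small = (raw.reshape(height // 2, 2, width // 2, 2)   # (superrow, 2, supercol, 2)
            [::n, :, ::n, :]
            .reshape(height // (2 * n) * 2, width // (2 * n) * 2))

print(small.shape)  # (1500, 2000): 1/n the size in each dimension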
However, this operation can cause image artifacts such as aliasing to appear, because you may be reducing the Nyquist frequency of your new RAW image below the highest frequency content in the original RAW image.
To avoid this, you would want to apply an anti-aliasing filter before downsampling. However, because you have not yet demosaiced your image and your different color channels (R, G, B) are not necessarily correlated, you would need to apply this filtering per color channel.
This is easily accomplished for the R and B channels, which form rectangular grids, but the G channel is significantly more difficult. Perhaps the easiest way to overcome this difficulty would be to filter the two rectangular G grids of your image separately.
Now, I'm assuming most of this functionality would need to be implemented from scratch, as anti-aliased downsampling of mosaiced RAW images is not a common function. You may find that you save significant time by simply demosaicing the original RAW, using provided resampling functions to generate a low-resolution preview, and allowing image adjustment on the demosaiced preview. Then, when you want to save out a full-resolution edit, go back and apply the previewed changes to the full-resolution, mosaiced RAW image.
Hope this can provide some starting points.

How can I write a histogram-like kernel filter for CoreImage?

In the docs for Kernel Routine Rules, it says 'A kernel routine computes an output pixel by using an inverse mapping back to the corresponding pixels of the input images. Although you can express most pixel computations this way—some more naturally than others—there are some image processing operations for which this is difficult, if not impossible. For example, computing a histogram is difficult to describe as an inverse mapping to the source image.'
However, Apple is obviously doing it somehow, because they do have a CIAreaHistogram Core Image filter that does just that.
I can see one theoretical way to do it with the given limitations:
Let's say you wanted a 256-element red-channel histogram...
You have a 256x1 pixel output image. The kernel function gets called for each of those 256 pixels. The kernel function would have to read EVERY PIXEL IN THE ENTIRE IMAGE each time it's called, checking whether that pixel's red value matches that bucket and incrementing a counter. When it has processed every pixel in the entire image for that output pixel, it divides by the total number of pixels and sets that output pixel to the calculated value. The problem is, assuming it actually works, this is horribly inefficient, since every input pixel is accessed 256 times, although every output pixel is written only once.
What would be optimal would be a way for the kernel to iterate over every INPUT pixel, and let us update any of the output pixels based on that value. Then the input pixels would each be read only once, and the output pixels would be read and written a total of (input width)x(input height) times altogether.
Does anyone know of any way to get this kind of filter working? Obviously there's a filter available from Apple for doing a histogram, but I need a more limited form of histogram. (For example, a blue histogram limited to samples that have a red value in a given range.)
The issue with this is that custom kernel code in Core Image works like a function that remaps the image pixel by pixel. You don't actually have much information to go off of except the pixel you are currently computing. A custom Core Image filter works roughly like this:
for i in 1 ... image.width
    for j in 1 ... image.height
        New_Image[i][j] = CustomKernel(Current_Image[i][j])
    end
end
So it's not really plausible to make your own histogram via custom kernels, because you have no control over the new image other than through that CustomKernel function. This is actually one of the reasons CIImageProcessor was created for iOS 10; you would probably have an easier time making a histogram with it (and producing other cool effects via image processing), and I suggest checking out the WWDC 2016 video on it (the raw images and live images session).
IIRC, if you really want to make a histogram, it is still possible, but you will have to work with the UIImage version and then convert the resulting image to an RGB image on which you can do the counting, storing the values in bins. I would recommend Simon Gladman's book on this, as he has a chapter devoted to histograms, but there is a lot more that goes into the default Core Image version, because Apple has much more control over the image than we do using the framework.
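Once you have plain RGB pixel data to count over, the "limited" histogram from the question (a blue histogram restricted to pixels whose red value falls in a given range) is just masked binning. A sketch in NumPy with a made-up buffer and made-up thresholds:

import numpy as np

# Hypothetical H x W x 3 RGB buffer pulled out of the image (8 bits per channel).
rgb = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
r, b = rgb[..., 0], rgb[..., 2]

# Blue histogram limited to samples whose red value lies in [red_lo, red_hi].
red_lo, red_hi = 64, 191
mask = (r >= red_lo) & (r <= red_hi)
hist = np.bincount(b[mask].ravel(), minlength=256)  # 256 bins, one per blue value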

How to represent and display a 9-bit image using OpenCV

I have an 8-bit grayscale image and I apply a transformation (modified census transform) to it. After the transformation I need to represent each pixel of the image with 9 bits. I store my 9-bit data in uint16, and when I want to display the image I use two different methods. I'm not sure which one is the right way to do it, or whether there are better approaches.
1- Take the most significant 8 bits of the 9 and represent the image as 8-bit.
2- Divide each pixel value by 2 and represent the image as 8-bit.
Either way there is a loss of information. Could anyone suggest a better way to do this?
Thank you
Why don't you just normalize them? normalizedValue=currentValue/maximumValue. Then just display the normalized image?
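For example, with OpenCV in Python (the census-transform result here is synthetic; 511 is the 9-bit maximum, though you could also normalize by the image's actual maximum):

import numpy as np
import cv2

# Synthetic 9-bit census-transform output stored in uint16 (values 0..511).
census = np.random.randint(0, 512, size=(480, 640), dtype=np.uint16)

# Scale by the 9-bit maximum so the full range maps onto 0..255 for display.
display = (census.astype(np.float32) / 511.0 * 255.0).astype(np.uint8)

cv2.imshow("census transform", display)
cv2.waitKey(0)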
The number of intensity levels you can represent depends very much on your hardware. Even if you somehow manage to represent extra grey levels, you won't be able to differentiate among them. The two methods you proposed in your question are essentially the same.
