Apple Metal MPSImage memory layout - iOS

I'm having trouble working with the underlying memory of an MPSImage. I've been using the getBytes and replace methods on the MPSImage's texture member variable to read and write the underlying data. The problem is I can't find documentation of how the memory is interpreted as an image (i.e. how the rows, columns, and channels are laid out). Part of what complicates the issue is that regardless of the number of feature channels, the data is stored as a stack of RGBA texture slices, with some channels possibly left unused. For instance, with 3 feature channels, there will be one RGBA texture slice, and one channel's worth of space will be left unused.
The problem is, how is the MPSImage data actually arranged within the texture? It seems more complicated than I originally would have guessed.
After much experimentation, it seems like the data is arranged differently depending on whether the number of feature channels is < 4 or > 4. But I'm still having trouble figuring it out.
Can anyone explain the MPSImage data layout to me?

The first four feature channels are encoded as they would be for a standard RGBA texture. Feature channel 0 is in the "R" position, feature channel 1 is in the "G" position and so forth.
The next four feature channels are present as the next slice in a texture2d_array. If you have a 100x100 image with 20 feature channels, this will be encoded as a 100x100 texture array with (20/4=) 5 slices in the array.
To make matters more complicated, you can have MPSImages that contain multiple images, each with more than 4 feature channels. This is frequently referred to as batching. The second image is found in the texture array immediately after the first image. If we have multiple 100x100x20 images in the MPSImage, then the second one starts at slice 5, the third one at slice 10 and so forth.
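Putting that together, here is a minimal Swift sketch of how you might address a feature channel in a batched MPSImage, based on the layout described above. The helper names are my own, not MPS API, and the read assumes a .float32 channel format (a .float16 image would need 2-byte components instead):

import MetalPerformanceShaders

// Feature channels are packed four per RGBA slice; batched images follow one
// another in the texture array.
func sliceAndComponent(imageIndex: Int, featureChannel: Int, featureChannels: Int) -> (slice: Int, component: Int) {
    let slicesPerImage = (featureChannels + 3) / 4        // e.g. 20 channels -> 5 slices
    let slice = imageIndex * slicesPerImage + featureChannel / 4
    let component = featureChannel % 4                    // 0 = R, 1 = G, 2 = B, 3 = A
    return (slice, component)
}

// Read one RGBA slice's worth of float data with getBytes.
func readSlice(of image: MPSImage, slice: Int) -> [Float] {
    let w = image.width, h = image.height
    var buffer = [Float](repeating: 0, count: w * h * 4)
    buffer.withUnsafeMutableBytes { ptr in
        image.texture.getBytes(ptr.baseAddress!,
                               bytesPerRow: w * 4 * MemoryLayout<Float>.size,
                               bytesPerImage: w * h * 4 * MemoryLayout<Float>.size,
                               from: MTLRegionMake2D(0, 0, w, h),
                               mipmapLevel: 0,
                               slice: slice)
    }
    return buffer
}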

Related

How can I write a histogram-like kernel filter for CoreImage?

In the docs for Kernel Routine Rules, it says 'A kernel routine computes an output pixel by using an inverse mapping back to the corresponding pixels of the input images. Although you can express most pixel computations this way—some more naturally than others—there are some image processing operations for which this is difficult, if not impossible. For example, computing a histogram is difficult to describe as an inverse mapping to the source image.'
However, Apple is obviously doing it somehow, because they ship a CIAreaHistogram Core Image filter that does just that.
I can see one theoretical way to do it with the given limitations:
Let's say you wanted a 256-element red-channel histogram...
You have a 256x1 pixel output image. The kernel function gets called for each of those 256 pixels. The kernel function would have to read EVERY PIXEL IN THE ENTIRE IMAGE each time it's called, checking whether that pixel's red value matches that bucket and incrementing a counter. Once it has processed every pixel in the entire image for that output pixel, it divides by the total number of pixels and sets the output pixel to that calculated value. The problem is, assuming it actually works, this is horribly inefficient, since every input pixel is read 256 times, even though every output pixel is written only once.
What would be optimal would be a way for the kernel to iterate over every INPUT pixel, and let us update any of the output pixels based on that value. Then the input pixels would each be read only once, and the output pixels would be read and written a total of (input width)x(input height) times altogether.
Does anyone know of any way to get this kind of filter working? Obviously there's a filter available from Apple for doing a histogram, but I need it for doing a more limited form of histogram. (For example, a blue histogram limited to samples that have a red value in a given range.)
The issue with this is that custom kernel code in Core Image works like a function that remaps the image pixel by pixel. You don't actually have much information to go on except the pixel that you are currently computing. A custom Core Image filter conceptually goes like this:
for i in 0 ..< image.width {
    for j in 0 ..< image.height {
        newImage[i][j] = customKernel(currentImage[i][j])
    }
}
So in practice it's not really feasible to build your own histogram via custom kernels, because you have no control over the new image other than through that customKernel function. This is actually one of the reasons CIImageProcessor was introduced in iOS 10; you would probably have an easier time making a histogram via that API (and producing other interesting effects as well), and I suggest checking out the WWDC 2016 video on it (the raw images and live images session).
IIRC, if you really want to make a histogram, it is still possible, but you will have to work with the UIImage version, convert the resulting image to an RGB bitmap you can iterate over, do the counting, and store the values in bins. I would recommend Simon Gladman's book on this, as he has a chapter devoted to histograms. There is a lot more that goes into Core Image's built-in version, though, because Apple has much more control over the image than we do using the framework.
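To make that concrete, here is a minimal CPU-side sketch of the "convert to RGB and count into bins" approach, shaped around the conditional histogram the question asks for (a blue histogram restricted to pixels whose red value falls in a given range). The function and parameter names are my own illustration, not any Apple API:

import CoreImage
import CoreGraphics

func conditionedBlueHistogram(of image: CIImage,
                              redRange: ClosedRange<UInt8>,
                              context: CIContext = CIContext()) -> [Int] {
    let width = Int(image.extent.width)
    let height = Int(image.extent.height)

    // Render the CIImage into a plain RGBA8 byte buffer we can walk on the CPU.
    var pixels = [UInt8](repeating: 0, count: width * height * 4)
    pixels.withUnsafeMutableBytes { buffer in
        context.render(image,
                       toBitmap: buffer.baseAddress!,
                       rowBytes: width * 4,
                       bounds: image.extent,
                       format: .RGBA8,
                       colorSpace: CGColorSpaceCreateDeviceRGB())
    }

    // Count blue values, but only for pixels whose red value is in range.
    var bins = [Int](repeating: 0, count: 256)
    for i in stride(from: 0, to: pixels.count, by: 4) {
        let red = pixels[i]
        let blue = pixels[i + 2]
        if redRange.contains(red) {
            bins[Int(blue)] += 1
        }
    }
    return bins
}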

Most Efficient way of Multi-Texturing - iOS, OpenGL ES2, optimization

I'm trying to find the most efficient way of handling multi-texturing in OpenGL ES2 on iOS. By 'efficient' I mean the fastest rendering even on older iOS devices (iPhone 4 and up) - but also balancing convenience.
I've considered (and tried) several different methods, but I've run into a couple of problems and questions.
Method 1 - My base and normal values are RGB with NO ALPHA. For these objects I don't need transparency. My emission and specular information are each only one channel. To reduce texture2D() calls I figured I could store the emission as the alpha channel of the base, and the specular as the alpha of the normal, with each pair stored in its own file.
My problem so far has been finding a file format that will support a full non-premultiplied alpha channel. PNG just hasn't worked for me. Every way that I've tried to save this as a PNG premultiplies the .alpha with the .rgb on file save (via Photoshop), basically destroying the .rgb. Any pixel with a 0.0 alpha has a black .rgb when I reload the file. I posted that question here with no activity.
I know this method would yield faster renders if I could work out a way to save and load this independent 4th channel. But so far I haven't been able to and had to move on.
Method 2 - When that didn't work I moved on to a single 4-way texture where each quadrant has a different map. This doesn't reduce texture2D() calls but it reduces the number of textures that are being accessed within the shader.
The 4-way texture does require that I modify the texture coordinates within the shader. For model flexibility I leave the texcoords as is in the model's structure and modify them in the shader like so:
v_fragmentTexCoord0 = a_vertexTexCoord0 * 0.5;
v_fragmentTexCoord1 = v_fragmentTexCoord0 + vec2(0.0, 0.5); // illumination frag is up half
v_fragmentTexCoord2 = v_fragmentTexCoord0 + vec2(0.5, 0.5); // shininess frag is up and over
v_fragmentTexCoord3 = v_fragmentTexCoord0 + vec2(0.5, 0.0); // normal frag is over half
To avoid dynamic texture lookups (Thanks Brad Larson) I moved these offsets to the vertex shader and keep them out of the fragment shader.
But my question here is: Does reducing the number of texture samplers used in a shader matter? Or would I be better off using 4 different smaller textures here?
The one problem I did have with this was bleed-over between the different maps. A texcoord of 1.0 was averaging in some of the blue normal pixels due to linear texture filtering. This added a blue edge on the object near the seam. To avoid it I had to change my UV mapping to stay away from the edge, and that's a pain to do with very many objects.
Method 3 would be to combine methods 1 and 2, and have the base.rgb + emission.a on one side and normal.rgb + specular.a on the other. But again I still have the problem of getting an independent alpha to save in a file.
Maybe I could save them as two files but combine them during loading, before sending the result over to OpenGL. I'll have to try that.
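In case it helps, here is a minimal sketch of that load-time merge, assuming the second file is a grayscale image whose red channel holds the intended alpha. It draws both files into RGBA8 buffers with Core Graphics and copies the alpha values across; the helper name and that grayscale assumption are mine:

import CoreGraphics
import UIKit

func rgbaPixels(rgb: UIImage, alpha: UIImage) -> [UInt8]? {
    guard let rgbCG = rgb.cgImage, let alphaCG = alpha.cgImage else { return nil }
    let w = rgbCG.width, h = rgbCG.height

    // Render a CGImage into a known RGBA8 layout so the bytes are predictable.
    func bytes(of image: CGImage) -> [UInt8] {
        var data = [UInt8](repeating: 0, count: w * h * 4)
        data.withUnsafeMutableBytes { ptr in
            let ctx = CGContext(data: ptr.baseAddress, width: w, height: h,
                                bitsPerComponent: 8, bytesPerRow: w * 4,
                                space: CGColorSpaceCreateDeviceRGB(),
                                bitmapInfo: CGImageAlphaInfo.premultipliedLast.rawValue)!
            ctx.draw(image, in: CGRect(x: 0, y: 0, width: w, height: h))
        }
        return data
    }

    var rgbBytes = bytes(of: rgbCG)
    let alphaBytes = bytes(of: alphaCG)

    // Copy the alpha image's red channel into the base image's alpha slot.
    for i in stride(from: 0, to: rgbBytes.count, by: 4) {
        rgbBytes[i + 3] = alphaBytes[i]
    }
    return rgbBytes   // ready to upload as GL_RGBA / GL_UNSIGNED_BYTE
}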
Method 4 - Finally, in a 3D world, if I have 20 different panel textures for walls, should these be individual files or all packed into a single texture atlas? I recently noticed that at some point Minecraft moved from an atlas to individual textures - albeit they are 16x16 each.
With a single model and by modifying the texture coordinates (which I'm already doing in method 2 and 3 above), you can easily send an offset to the shader to select a particular map in an atlas:
v_fragmentTexCoord0 = u_texOffset + a_vertexTexCoord0 * u_texScale;
This offers a lot of flexibility and reduces the number of texture bindings. It's basically how I'm doing it in my game now. But IS IT faster to access a small portion of a larger texture and have the above math in the vertex shader? Or is it faster to repeatedly bind smaller textures over and over? Especially if you're not sorting objects by texture.
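For reference, the CPU side of that atlas selection is just a couple of uniform updates per draw call. A small sketch, assuming a square grid atlas of equally sized tiles and assuming u_texOffset is declared as a vec2 and u_texScale as a float (the function name and grid layout are my own illustration):

import OpenGLES

func setAtlasUniforms(tileIndex: Int, tilesPerRow: Int,
                      offsetLocation: GLint, scaleLocation: GLint) {
    let scale = 1.0 / Float(tilesPerRow)
    let col = tileIndex % tilesPerRow
    let row = tileIndex / tilesPerRow
    // u_texScale shrinks the model's 0..1 texcoords onto one tile;
    // u_texOffset moves them to that tile's corner of the atlas.
    glUniform1f(scaleLocation, scale)
    glUniform2f(offsetLocation, Float(col) * scale, Float(row) * scale)
}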
I know this is a lot, but the main question here is: what's the most efficient method considering speed + convenience? Will method 4 be faster for multiple textures, or would multiple rebinds be faster? Or is there some other way that I'm overlooking? I see all these 3D games with a lot of graphics and area coverage. How do they keep frame rates up, especially on older devices like the iPhone 4?
**** UPDATE ****
Since I've suddenly had 2 answers in the last few days I'll say this. Basically I did find the answer, or AN answer. The question is which method is more efficient, meaning which method will result in the best frame rates. I've tried the various methods above and on the iPhone 5 they're all just about as fast. The iPhone 5/5S has an extremely fast GPU. Where it matters is on older devices like the iPhone 4/4S, or on larger devices like a Retina iPad. My tests were not scientific and I don't have ms timings to report. But 4 texture2D() calls to 4 RGBA textures was actually just as fast or maybe even faster than 4 texture2D() calls to a single texture with offsets. And of course I do those offset calculations in the vertex shader and not the fragment shader (never in the fragment shader).
So maybe someday I'll do the tests and make a grid with some numbers to report. But I don't have time to do that right now and write a proper answer myself. And I can't really checkmark any other answer that isn't answering the question, because that's not how SO works.
But thanks to the people who have answered. And check out this other question of mine that also answered some of this one: Load an RGBA image from two jpegs on iOS - OpenGL ES 2.0
Have a post-process step in your content pipeline where you merge your RGB texture with the alpha texture and store the result in a .ktx file, either when you package the game or as a post-build event when you compile.
It's a fairly trivial format, and it would be simple to write a command-line tool that loads two PNGs and merges them into one .ktx, RGB + alpha.
Some benefits of doing that are:
- Less CPU overhead when loading the file at game start-up, so the game starts quicker.
- Some GPUs do not natively support 24-bit RGB formats, which would force the driver to internally convert the data to 32-bit RGBA. This adds more time to the loading stage and temporary memory usage.
Now, once you have the data in a texture object, you do want to minimize texture sampling, as it means a lot of GPU operations and memory accesses depending on the filtering mode.
I would recommend having 2 textures with 2 layers each. If you pack all of them into a single texture, you risk artifacts when sampling with bilinear filtering or mipmaps, since the sampler may pull in neighbouring pixels from across the edge where one texture layer ends and the next begins, especially once mipmaps are generated.
As an extra improvement I would recommend not keeping raw 32-bit RGBA data in the .ktx, but actually compressing it into a DXT or PVRTC format. This would use much less memory, which means faster loading times and fewer memory transfers for the GPU, since memory bandwidth is limited.
Of course, adding the compressor to the post-process tool is slightly more complex.
Do note that compressed textures do lose a bit of quality, depending on the algorithm and implementation.
Silly question, but are you sure you are sampler limited? It just seems to me that, with your "two 2-way textures", you are potentially pulling in a lot of texture data, and you might instead be bandwidth limited.
What if you were to use 3 textures [ BaseRGB, NormalRBG, and combined Emission+Specular] and use PVRTC compression? Depending on the detail, you might even be able to use 2bpp (rather than 4bpp) for the BaseRGB and/or Emission+Specular.
For the Normals I'd probably stick to 4bpp. Further, if you can afford the shader instructions, only store the R&G channels (putting 0 in the blue channel) and re-derive the blue channel with a bit of maths. This should give better quality.
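The blue-channel trick works because a tangent-space normal has unit length, so z can be recovered as sqrt(1 - x² - y²). A quick sketch of the math (in the real thing this would be a couple of instructions in the fragment shader; the 0..1 to -1..1 remapping assumes the usual normal-map encoding):

import simd

func reconstructNormal(fromRG sample: SIMD2<Float>) -> SIMD3<Float> {
    let xy = sample * 2.0 - 1.0                              // 0..1 texture value -> -1..1
    let z = max(0, 1 - xy.x * xy.x - xy.y * xy.y).squareRoot()
    return SIMD3<Float>(xy.x, xy.y, z)
}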

How to add a colour palette to b/w image?

I have noticed that in some tile-based game engines, tiles are saved as grayscale or sometimes even black and white, and the colour is then added by storing a 'palette' along with them to apply to certain pixels. However, I've never seen how it knows which pixels.
Just to name a few engines I've seen use this: Notch's Minicraft and the old Pokemon games for the Gameboy. This is what informed me of how a colour palette is used in old games: deconstructulator
From the little I've seen of people using this technique in tutorials, it uses a form of bit-shifting. I'd like to know how that was so efficient that it was next to mandatory on old 8-bit consoles - how it is possible to apply red, green and blue to specific pixels of an image every frame instead of saving the whole coloured image (some pseudo-code would be nice).
The efficient thing about it is that it saves memory. Storing the RGB values directly usually requires 24 bits (8 bits per channel). With a palette of 256 colors (requiring 256 × 24 bits = 768 bytes), each pixel requires just 8 bits (2^8 = 256 colors). So three times as many pixels can be stored in the same amount of memory (if you don't count the palette itself), but with a limited set of colors, obviously. This used to be supported in hardware so the graphics memory could be used more efficiently too (this is actually still supported in modern PC hardware, but almost never used since graphics memory isn't that limited anymore).
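A minimal sketch of the lookup itself, with illustrative type and function names: the image stores one 8-bit index per pixel, and the palette maps each index to a full RGB color, so swapping the palette recolors the image without touching the pixel data:

struct RGB { var r, g, b: UInt8 }

// Each byte of the indexed image is just a position in the palette table.
func decodeIndexed(indices: [UInt8], palette: [RGB]) -> [RGB] {
    return indices.map { palette[Int($0)] }
}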
Some hardware (including the Super Gameboy supported by the first Pokemon games) used more than one hardware palette at a time. Different sets of tiles are mapped to different palettes. But the way the tiles are mapped to different palettes is very hardware dependent and often not very straightforward.
The way the bits of the image are stored isn't always so straightforward either. If the pixels are 8 bits it can be an array of bytes where every byte is simply one pixel (as in the classic VGA mode used in many old DOS games). But the Gameboy, for example, uses 2 bits per pixel, and others used 4, that is 2^4 = 16 colors per palette. A common way to arrange the bits in memory is by using bitplanes, that is (in the case of 16-color graphics) storing 4 separate b/w images. But the bitplanes can in some cases also be interleaved in different ways. So there is no simple and generic answer to how to decode/encode graphics this way; I guess you have to be more specific about what you want pseudo-code for.
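For what it's worth, here is a sketch of decoding plain (non-interleaved) bitplanes as described above: with 4 planes, each plane is a 1-bit-per-pixel image, and bit n of the final 4-bit palette index comes from plane n. The storage assumptions and the function name are my own:

func decodeBitplanes(planes: [[UInt8]], width: Int, height: Int) -> [UInt8] {
    let pixelCount = width * height
    var indices = [UInt8](repeating: 0, count: pixelCount)
    for p in 0 ..< planes.count {
        for i in 0 ..< pixelCount {
            let byte = planes[p][i / 8]                 // 8 pixels per byte in a plane
            let bit = (byte >> (7 - i % 8)) & 1         // most significant bit first
            indices[i] |= bit << p                      // plane p supplies bit p
        }
    }
    return indices
}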
And I'm not sure how any of this applies to Minicraft. Maybe just how the graphics is stored on disk. But it has no significance once the graphics is loaded into graphics memory. (Maybe you have some other feature of Minicraft in mind?)

Understanding just what is an image

I suppose the simplest understanding of what a (bitmap) image is would be an array of pixels. After that, it gets pretty technical.
I've been trying to understand the sort of information that an image may provide and have come across a large collection of technical terms like "mipmap", "pitch", "stride", "linear", "depth", as well as other format-specific things.
These seem to pop up across a lot of different formats, so it'd probably be useful to understand what purpose they serve in an image. Looking at the DDS, BMP, PNG, TGA, and JPG documentation has only made it clear that an image is pretty confusing.
Despite searching around for some hours, I couldn't find any nice tutorial-like breakdown of just what an image is and what all of the different properties mean.
The eventual goal would be to take proprietary image formats and convert them to more common formats like DDS or BMP. Or to make up some image format.
Any good readings?
Even your simplified explanation of an image doesn't encompass all the possibilities. For example an image can be divided by planes, where the red pixel values are all together followed by the green pixel values, followed by the blue pixel values. Such layouts are uncommon but still possible.
Assuming a simple layout of pixels you must still determine the pixel format. You might have a paletted image where some number of bits (1, 4, or 8) will be an index into a palette or color table which will define the RGB color of the pixel along with the transparency of the pixel (one index will typically be reserved as a transparent pixel). Otherwise the pixel will be 3 or 4 bytes depending on whether a transparency or alpha value is included. The order of the values (R,G,B) or (B,G,R) will depend on the format - Windows bitmaps are B,G,R while everything else will most likely be R,G,B.
The stride is the number of bytes between rows of the image. Windows bitmaps for example will take the width of the image times the number of bytes per pixel and round it up to the next multiple of 4 bytes.
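A small sketch of that stride rule, with an illustrative function name: the row size in bytes is rounded up to the next multiple of 4.

func bmpStride(width: Int, bytesPerPixel: Int) -> Int {
    return ((width * bytesPerPixel + 3) / 4) * 4
}

// e.g. a 10-pixel-wide, 24-bit image has 10 * 3 = 30 bytes of pixel data per row,
// padded to a 32-byte stride.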
I've never heard of DDS, and BMP is only common in the Windows world (and there's a lot more computing in the non-Windows world than you might think). Rather than worry about all of the technical details of this, why not just use an existing toolkit such as ImageMagick, which can already batch convert from dozens of formats to your one common format?
Unless you're doing specialized work where you would need something fancy like HDR (which most image formats don't even support, so most of your sources wouldn't have it in the first place), you're probably best off picking something standard like PNG or JPG. They both have plusses and minuses, and you might want to support both of those depending on the image.

How can I recognize slightly modified images?

I have a very large database of jpeg images, about 2 million. I would like to do a fuzzy search for duplicates among those images. Duplicate images are two images that have many (around half) of their pixels with identical values and the rest are off by about +/- 3 in their R/G/B values. The images are identical to the naked eye. It's the kind of difference you'd get from re-compressing a jpeg.
I already have a foolproof way to detect if two images are identical: I sum the delta-brightness over all the pixels and compare to a threshold. This method has proven 100% accurate but doing 1 photo against 2 million is incredibly slow (hours per photo).
I would like to fingerprint the images in a way that I could just compare the fingerprints in a hash table. Even if I can reliably whittle down the number of images that I need to compare to just 100, I would be in great shape to compare 1 to 100. What would be a good algorithm for this?
Have a look at O. Chum, J. Philbin, and A. Zisserman, Near duplicate image detection: min-hash and tf-idf weighting, in Proceedings of the British Machine Vision Conference, 2008. They solve the problem you have and demonstrate the results for 146k images. However, I have no first-hand experience with their approach.
Naive idea: create a small thumbnail (50x50 pixels) to find "probably identical" images, then increase thumbnail size to discard more images.
Building on the idea of minHash...
My idea is to make 100 look-up tables using all the images currently in the database. Each look-up table maps from the brightness of a particular pixel to a list of images that have that same brightness at that same pixel. To search for an image, just feed it into the hash tables, get 100 lists, and score a point for each image every time it shows up in a list. Each image will have a score from 0 to 100. The image with the most points wins.
There are many issues with how to do this within reasonable memory constraints and how to do it quickly. Proper data structures are needed for storage on disk. Tweaking of the hashing value, number of tables, etc, is possible, too. If more information is needed, I can expand on this.
My results have been very good. I'm able to index one million images in about 24 hours on one computer and I can lookup 20 images per second. Accuracy is astounding as far as I can tell.
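A minimal sketch of that voting scheme, assuming each image has already been reduced to a fixed-length list of per-position brightness values (e.g. sampled from a small thumbnail); the table count, the quantization, and all the type and method names here are illustrative:

struct FingerprintIndex {
    // One table per sampled position: brightness value -> image IDs.
    private var tables: [[UInt8: [Int]]]

    init(positions: Int = 100) {
        tables = Array(repeating: [:], count: positions)
    }

    mutating func add(imageID: Int, brightness: [UInt8]) {
        for (position, value) in brightness.enumerated() {
            tables[position][value, default: []].append(imageID)
        }
    }

    // Candidate image IDs, sorted by how many of the tables they matched in.
    func lookup(brightness: [UInt8]) -> [(imageID: Int, score: Int)] {
        var scores: [Int: Int] = [:]
        for (position, value) in brightness.enumerated() {
            for id in tables[position][value] ?? [] {
                scores[id, default: 0] += 1
            }
        }
        return scores.map { (imageID: $0.key, score: $0.value) }
                     .sorted { $0.score > $1.score }
    }
}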
I don't think this problem can be solved by hashing. Here's the difficulty: suppose you have a red pixel value, and you want 3 and 5 to hash to the same value. Well, then you also want 5 and 7 to hash to the same value, and 7 and 9, and so on... you can construct a chain that says you want all values to hash to the same value.
Here's what I would try instead:
Build a huge B-tree, with 32-way fanout at each node, containing all of the images.
All images in the tree are the same size, or they're not duplicates.
Give each colored pixel a unique number starting at zero. Upper left might be numbered 0, 1, 2 for the R, G, B components, or you might be better off with a random permutation, because you're going to compare images in order of that numbering.
An internal node at depth n discriminates 32 ways on the value of pixel n divided by 8 (dividing by 8 absorbs some of the noise between nearby pixel values).
A leaf node contains some small number of images, let's say 10 to 100. Or maybe the number of images is an increasing function of depth, so that if you have 500 duplicates of one image, after a certain depth you stop trying to distinguish them.
Once all two million images are inserted in the tree, two images are duplicates only if they're at the same node. Right? Wrong! If the pixel values in two images are 127 and 128, one goes into outedge 15 and the other goes into outedge 16. So actually when you discriminate on a pixel, you may insert that image into one or two children:
For brightness B, insert at B/8, (B-3)/8, and (B+3)/8. Sometimes all 3 will be equal, and always 2 of 3 will be equal. But with probability 3/8, you double the number of outedges on which the image appears. Depending on how deep things go you could have lots of extra nodes.
Someone else will have to do the math and see if you have to divide by something larger than 8 to keep images from being duplicated too much. The good news is that even if the true fanout is only around 4 instead of 32, you only need a tree of depth 10. Four doublings over those 10 levels takes you from 2 million up to 32 million images at the leaves. I hope you have plenty of RAM at your disposal! If not, you can put the tree in the filesystem.
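A tiny sketch of that insertion rule, plus the 127/128 example from above (the function name is mine):

func childBuckets(forBrightness b: Int) -> Set<Int> {
    // For brightness B, the image is inserted at B/8, (B-3)/8 and (B+3)/8.
    return Set([b / 8, (b - 3) / 8, (b + 3) / 8])
}

// childBuckets(forBrightness: 127) -> {15, 16}
// childBuckets(forBrightness: 128) -> {15, 16}
// so the 127/128 pair that a single B/8 split would separate now shares buckets.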
Let me know how this goes!
Another nice thing about hashing thumbnails: scaled duplicates are recognized (with little modification).
