What is a plane in a CVPixelBuffer? - ios

A CVPixelBuffer object can have one or more planes. (reference)
There are methods to get the number of planes, and each plane's height and base address.
So what exactly is a plane, and how is it laid out inside a CVPixelBuffer?
Sample:
<CVPixelBuffer 0x1465f8b30 width=1280 height=720 pixelFormat=420v iosurface=0x14a000008 planes=2>
<Plane 0 width=1280 height=720 bytesPerRow=1280>
<Plane 1 width=640 height=360 bytesPerRow=1280>

Video formats are an incredibly complex subject.
Some video streams store the pixels as interleaved bytes: RGBA, ARGB, ABGR, or several other variants (with or without an alpha channel).
(In RGBA format, you'd have the red, green, blue, and alpha values of a pixel one right after another in memory, followed by another set of 4 bytes with the color values of the next pixel, etc.) This is interleaved color information.
Some video streams separate out the color channels, so the red, green, blue, and alpha channels are sent as separate "planes". You'd get a buffer with all the red information, then all the blue data, then all the green, and then alpha, if alpha is included. (Think of color negatives, where there are separate layers of emulsion to capture the different colors. The layers of emulsion are planes of color information. It's the same idea with digital.)
There are formats where the color data is in one or two planes, and the luminance is in a separate plane. That's how old analog color TV works: it started out as black and white (luminance), and then broadcasters added side-band signals to convey the color information (chroma).
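As a small illustration (a sketch, assuming you already have a CVPixelBuffer called pixelBuffer from a capture or decode callback), you can ask the buffer itself how many planes it has and how each one is laid out:
import CoreVideo

func dumpPlaneLayout(of pixelBuffer: CVPixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly) }

    // 0 means the buffer is not planar (e.g. BGRA); 2 for 420v as in the question.
    let planeCount = CVPixelBufferGetPlaneCount(pixelBuffer)
    print("planes=\(planeCount)")

    for plane in 0..<planeCount {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, plane)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, plane)
        let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, plane)
        print("Plane \(plane): \(width)x\(height), bytesPerRow=\(bytesPerRow)")
    }
}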
I don't muck around with CVPixelBuffers often enough to know the gory details of what you are asking, and have to invest large amounts of time and copious amounts of coffee before I can "spin up" my brain enough to grasp those gory details.
Edit:
Since your debug information shows 2 planes, it seems likely that this pixel buffer has a luminance channel and a chroma channel, as mentioned in #zeh's answer.

Although the existing and accepted answer is rich in important information when dealing with CVPixelBuffers, in this particular case the answer is wrong. The two planes that the question refers to are the luminance and chrominance planes.
"Luminance refers to brightness and chrominance refers to color" - from Quora
The following code snippet from Apple makes it more clear:
let lumaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let lumaWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 0)
let lumaHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 0)
let lumaRowBytes = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)

var sourceLumaBuffer = vImage_Buffer(data: lumaBaseAddress,
                                     height: vImagePixelCount(lumaHeight),
                                     width: vImagePixelCount(lumaWidth),
                                     rowBytes: lumaRowBytes)

let chromaBaseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1)
let chromaWidth = CVPixelBufferGetWidthOfPlane(pixelBuffer, 1)
let chromaHeight = CVPixelBufferGetHeightOfPlane(pixelBuffer, 1)
let chromaRowBytes = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1)

var sourceChromaBuffer = vImage_Buffer(data: chromaBaseAddress,
                                       height: vImagePixelCount(chromaHeight),
                                       width: vImagePixelCount(chromaWidth),
                                       rowBytes: chromaRowBytes)
See full reference here.

Related

Unexpected behaviour with CIKernel

I made this example to show the problem. It takes one pixel from the texture at a hardcoded coordinate and uses it as the result for every pixel in the shader. I expect the whole image to end up the same color. With small images it works perfectly, but with big images the result is strange. For example, here the image is 7680x8580 and you can see 4 squares:
Here is my code:
kernel vec4 colorKernel(sampler source)
{
    vec4 key = sample(source, samplerTransform(source, vec2(100., 200.)));
    return key;
}
Here is how I apply the kernel:
override var outputImage: CIImage? {
    return colorFillKernel.apply(
        extent: CGRect(origin: CGPoint.zero, size: inputImage!.extent.size),
        roiCallback: { (index, rect) in
            return rect
        },
        arguments: [inputImage])
}
Also, this code shows the image properly, without changes or squares:
vec2 dc = destCoord();
return sample(source, samplerTransform(source, dc));
The public documentation says "Core Image automatically splits large images into smaller tiles for rendering, so your callback may be called multiple times.", but I can't find any guidance on how to handle this situation. I have kaleidoscopic effects, and from any given tile I need to be able to read pixels from other tiles as well...
I think the problem occurs due to a wrongly defined region of interest in combination with tiling.
In the roiCallback, Core Image is asking you which area of the input image (at index, in case you have multiple inputs) your kernel needs to look at in order to produce the given region (rect) of the output image. The reason why this is a closure is tiling:
If the processed image is too large, Core Image breaks it down into multiple tiles, renders those tiles separately, and stitches them together again afterward. For each tile, Core Image asks you what part of the input image your kernel needs to read to produce that tile.
So for your input image, the roiCallback might be called something like four times (or even more) during rendering, for example with the following rectangles:
CGRect(x: 0, y: 0, width: 4096, height: 4096) // top left
CGRect(x: 4096, y: 0, width: 3584, height: 4096) // top right
CGRect(x: 0, y: 4096, width: 4096, height: 4484) // bottom left
CGRect(x: 4096, y: 4096, width: 3584, height: 4484) // bottom right
This is an optimization mechanism of Core Image. It wants to only read and process the pixels that are needed to produce a given region of the output. So it's best to adapt the ROI as best as possible to your use case.
Now the ROI depends on the kernel. There are basically four scenarios:
Your kernel has a 1:1 mapping between input pixel and output pixel. So in order to produce an output color value, it needs to read the pixel at the same position from the input image. In this case, you just return the input rect in your roiCallback. (Or even better, you use a CIColorKernel that is made for this use case.)
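If you do stick with a general CIKernel for this 1:1 case, the identity ROI callback is just a one-liner:
let roiCallback: CIKernelROICallback = { _, rect in rect }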
Your kernel performs some kind of convolution and not only requires the input pixel at the same coordinate as the output but also some region around it (for instance for a blur operation). Your roiCallback could look like this then:
let inset = self.radius // like radius of CIGaussianBlur
let roiCallback: CIKernelROICallback = { _, rect in
    return rect.insetBy(dx: -inset, dy: -inset)
}
Your kernel always needs to read a specific region of the input, regardless of which part of the output is rendered. Then you can just return that specific region in the callback:
let roiCallback: CIKernelROICallback = { _, _ in CGRect(x: 100, y: 200, width: 1, height: 1) }
The kernel always needs access to the whole input image. This is for example the case when you use some kind of lookup table to derive colors. In this case, you can just return the extent of the input and ignore the parameters:
let roiCallback: CIKernelROICallback = { _, _ in inputImage.extent }
For your example, scenario 3 should be the right choice. For your kaleidoscopic effects, I assume that you need a certain region or source pixels around the destination coordinate in order to produce an output pixel. So it would be best if you'd calculate the size of that region and use a roiCallback like in scenario 2.
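Putting that together for the hard-coded coordinate in the question, a sketch of the outputImage property with a fixed 1x1 ROI (scenario 3) could look like this, reusing the colorFillKernel and inputImage names from the question's code:
override var outputImage: CIImage? {
    guard let input = inputImage else { return nil }
    // The kernel only ever reads the single source pixel at (100, 200),
    // regardless of which output tile Core Image is currently rendering.
    let roiCallback: CIKernelROICallback = { _, _ in
        CGRect(x: 100, y: 200, width: 1, height: 1)
    }
    return colorFillKernel.apply(
        extent: input.extent,
        roiCallback: roiCallback,
        arguments: [input])
}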
P.S.: Using the Core Image Kernel Language (CIKernel(source: "<code>")) is super duper deprecated now. You should consider writing your kernels in the Metal Shading Language instead. Check out this year's WWDC talk to learn more. 🙂

MTKView Displaying Wide Gamut P3 Colorspace

I'm building a real-time photo editor based on CIFilters and MetalKit, but I'm running into an issue with displaying wide-gamut images in an MTKView.
Standard sRGB images display just fine, but Display P3 images are washed out.
I've tried setting the CIContext.render color space to the image's color space, and I still see the issue.
Here are snippets of the code:
guard let inputImage = CIImage(mtlTexture: sourceTexture!) else { return }
let outputImage = imageEditor.processImage(inputImage)

print(colorSpace)

context.render(outputImage,
               to: currentDrawable.texture,
               commandBuffer: commandBuffer,
               bounds: inputImage.extent,
               colorSpace: colorSpace)
commandBuffer?.present(currentDrawable)

let pickedImage = info[UIImagePickerControllerOriginalImage] as! UIImage
print(pickedImage.cgImage?.colorSpace)
if let cspace = pickedImage.cgImage?.colorSpace {
    colorSpace = cspace
}
I have found a similar issue on the Apple developer forums, but without any answers: https://forums.developer.apple.com/thread/66166
In order to support the wide color gamut, you need to set the colorPixelFormat of your MTKView to either bgra10_XR or bgra10_XR_sRGB. I suspect the colorSpace property of macOS MTKViews won't be supported on iOS because color management in iOS is not active but targeted (read Best practices for color management).
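As a rough sketch of that setup (assuming this runs somewhere with access to a view; note that recent SDKs spell the Swift cases .bgra10_xr / .bgra10_xr_srgb):
import MetalKit

let metalView = MTKView(frame: view.bounds, device: MTLCreateSystemDefaultDevice())
metalView.colorPixelFormat = .bgra10_xr_srgb   // or .bgra10_xr if your shader outputs gamma-encoded values
metalView.framebufferOnly = false              // needed when a CIContext renders into the drawable's texture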
Without seeing your images and their actual values, it is hard to diagnose, but I'll explain my findings & experiments. I suggest you start like I did, by debugging a single color.
For instance, what's the reddest point in P3 color space? It can be defined through a UIColor like this:
UIColor(displayP3Red: 1, green: 0, blue: 0, alpha: 1)
Add a UIButton to your view with the background set to that color for debugging purposes. You can either get the components in code to see what those values become in sRGB,
var fRed : CGFloat = 0
var fGreen : CGFloat = 0
var fBlue : CGFloat = 0
var fAlpha : CGFloat = 0
let c = UIColor(displayP3Red: 1, green: 0, blue: 0, alpha: 1)
c.getRed(&fRed, green: &fGreen, blue: &fBlue, alpha: &fAlpha)
or you can use the Calculator in the macOS ColorSync Utility,
Make sure you select Extended Range, otherwise the values will be clamped to 0 and 1.
So, as you can see, your P3(1, 0, 0) corresponds to (1.0930, -0.2267, -0.1501) in extended sRGB.
Now, back to your MTKView,
If you set the colorPixelFormat of your MTKView to .bgra10_XR, then you obtain the brightest red if the output of your shader is,
(1.0930, -0.2267, -0.1501)
If you set the colorPixelFormat of your MTKView to .bgra10_XR_sRGB, then you obtain the brightest red if the output of your shader is,
(1.22486, -0.0420312, -0.0196301)
because you have to write a linear RGB value, since this texture format will apply the gamma correction for you. Be careful when applying the inverse gamma, since there are negative values. I use this function,
let f = { (c: Float) -> Float in
    if fabs(c) <= 0.04045 {
        return c / 12.92
    }
    return sign(c) * powf((fabs(c) + 0.055) / 1.055, 2.4)
}
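As a quick sanity check, mapping f over the extended-sRGB red from above reproduces the linear values quoted earlier:
let extendedSRGBRed: [Float] = [1.0930, -0.2267, -0.1501]
let linearRed = extendedSRGBRed.map(f)   // ≈ [1.2249, -0.0420, -0.0196]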
The last missing piece is creating a wide gamut UIImage. Set the color space to CGColorSpace.displayP3 and copy the data over. But what data, right? The brightest red in this image will be
(1, 0, 0)
or (65535, 0, 0) in 16-bit ints.
What I do in my code is use .rgba16Unorm textures to manipulate images in the displayP3 color space, where (1, 0, 0) is the brightest red in P3. This way, I can directly copy their contents over to a UIImage. Then, for display, I pass a color transform to the shader to convert from P3 to extended sRGB (so, without saturating colors) before presenting. I use linear color, so my transform is just a 3x3 matrix. I set my view to .bgra10_XR_sRGB, so the gamma will be applied automatically for me.
That (column-major) matrix is,
1.2249 -0.2247 0
-0.0420 1.0419 0
-0.0197 -0.0786 1.0979
You can read about how I generated it here: Exploring the display-P3 color space
Here's an example I built using UIButtons and an MTKView, screen-captured on an iPhoneX,
The button on the left is the brightest red on sRGB, while the button on the right is using a displayP3 color. At the center, I placed an MTKView that outputs the transformed linear color as described above.
Same experiment for green,
Now, if you view this on a recent iPhone or iPad, you should see both the square in the center and the button on the right in the same bright colors. If you view it on a Mac that can't display them, the left button will appear to be the same color. If you view it on a Windows machine or in a browser without proper color management, the left button may also appear to be a different color, but that's only because the whole image is being interpreted as sRGB and obviously those pixels do have different values... but the appearance won't be correct.
If you want more references, check the testP3UIColor unit test I added here: ColorTests.swift,
my functions to initialize the UIImage: Image.swift,
and a sample app to try out the conversions: SampleColorPalette
I haven't experimented with CIImages, but I guess the same principles apply.
I hope this information is of some help. It also took me long to figure out how to display colors properly because I couldn't find any explicit reference to displayP3 support in the Metal SDK documentation.

How to crop/resize texture array in Metal

Say I have an N-channel MPSImage or texture array that is backed by an MTLTexture.
How do I crop a region from it, copying all N channels but changing the "pixel size"?
I'll just address the crop case, since the resize case involves resampling and is marginally more complicated. Let me know if you really need that.
Let's assume your source MPSImage is a 12 feature channel (3 slice) image that is 128x128 pixels, that your destination image is an 8 feature channel image (2 slices) that is 64x64 pixels, and that you want to copy the bottom-right 64x64 region of the last two slices of the source into the destination.
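Quick aside on where those slice counts come from: an MPSImage packs up to 4 feature channels into each slice of its underlying texture array, so (using featureChannels as a stand-in variable) the slice count is simply:
let sliceCount = (featureChannels + 3) / 4   // 12 channels -> 3 slices, 8 channels -> 2 slices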
There is no API that I'm aware of that allows you to copy from/to multiple slices of an array texture at once, so you'll need to issue multiple blit commands to cover all the slices:
let sourceRegion = MTLRegionMake3D(64, 64, 0, 64, 64, 1)
let destOrigin = MTLOrigin(x: 0, y: 0, z: 0)
let firstSlice = 1
let lastSlice = 2 // inclusive

// makeCommandBuffer() and makeBlitCommandEncoder() return optionals in current SDKs
let commandBuffer = commandQueue.makeCommandBuffer()!
let blitEncoder = commandBuffer.makeBlitCommandEncoder()!

for slice in firstSlice...lastSlice {
    blitEncoder.copy(from: sourceImage.texture,
                     sourceSlice: slice,
                     sourceLevel: 0,
                     sourceOrigin: sourceRegion.origin,
                     sourceSize: sourceRegion.size,
                     to: destImage.texture,
                     destinationSlice: slice - firstSlice,
                     destinationLevel: 0,
                     destinationOrigin: destOrigin)
}

blitEncoder.endEncoding()
commandBuffer.commit()
I'm not sure why you want to crop, but keep in mind that the MPSCNN layers can work on a smaller portion of your MPSImage. Just set the offset and clipRect properties and the layer will only work on that region of the source image.
In fact, you could do your crops this way using an MPSCNNNeuronLinear. Not sure if that is any faster or slower than using a blit encoder but it's definitely simpler.
Edit: added a code example. This is typed from memory so it may have small errors, but this is the general idea:
// Declare this somewhere (device is your MTLDevice):
let linearNeuron = MPSCNNNeuronLinear(device: device, a: 1, b: 0)
Then when you run your neural network, add the following:
let yourImage: MPSImage = ...
let commandBuffer = ...

// This describes the size of the cropped image.
let imgDesc = MPSImageDescriptor(...)

// If you're going to use the cropped image in other layers
// then it's a good idea to make it a temporary image.
let tempImg = MPSTemporaryImage(commandBuffer: commandBuffer, imageDescriptor: imgDesc)

// Set the cropping offset:
linearNeuron.offset = MPSOffset(x: ..., y: ..., z: 0)

// The clip rect is the size of the output image.
linearNeuron.clipRect = MTLRegionMake2D(0, 0, imgDesc.width, imgDesc.height)

linearNeuron.encode(commandBuffer: commandBuffer, sourceImage: yourImage, destinationImage: tempImg)

// Here do your other layers, taking tempImg as input.
. . .

commandBuffer.commit()

OpenCV HSV is so noisy

Good day,
I am trying to filter video by subtracting some colors in a specified range.
But while the recorded image is still and unchanged, the HSV-filtered image looks shaky and unstable.
This shaking or instability causes a lot of problems in my processing.
Is there any way I can filter the image in a stable way?
This is a sample of my filter code ... part of the code:
while (1)
{
    // read the first frame
    cap.read(origonal1);
    morphOps(origonal1);
    cvtColor(origonal1, HSV1, COLOR_BGR2HSV);
    inRange(HSV1, Scalar(0, 129, 173), Scalar(26, 212, 255), thresholdImage1);
    waitKey(36);

    // read the second frame and convert it to HSV
    cap.read(origonal2);
    morphOps(origonal2);
    cvtColor(origonal2, HSV2, COLOR_BGR2HSV);
    inRange(HSV2, Scalar(28, 89, 87), Scalar(93, 255, 255), thresholdImage2);

    morphOps(thresholdImage1);
    morphOps(thresholdImage2);

    // create a mask so that I only detect motion in the chosen color ranges
    // and don't care about motion in other colors
    maskImage = thresholdImage1 | thresholdImage2;

    // take the difference between the two thresholded images
    absdiff(thresholdImage1, thresholdImage2, imageDifference);
    imageDifference = imageDifference & maskImage;
    morphOps(imageDifference);
    imshow("threshold Image", imageDifference);

    // search for movement, then update the original image
    searchForMovement(thresholdImage1, origonal1);
    imshow("origonal", origonal1);
    imshow("HSV", HSV1);
    imshow("threshold1", thresholdImage1);
    imshow("threshold2", thresholdImage2);

    // wait for a while to give the processor a break
    //waitKey(1000);
}
Thanks in advance.
Try this function:
fastNlMeansDenoisingColored( frame, frame_result, 3, 3, 7, 21 );
It's quite slow, but good for trying things out.

How to optimize this image processing: replace all pixels in an image with the closest available RGB?

I'm trying to replace all pixels of an input image with the closest available RGB from a palette. I have an array containing the colors, and the input image. Here is my code; it gives me the expected output image, BUT it takes a very LONG time (about a minute) to process one image. Can anybody help me improve the code? Or if you have any other suggestions, please help.
UIGraphicsBeginImageContextWithOptions(CGSizeMake(CGImageGetWidth(sourceImage), CGImageGetHeight(sourceImage)), NO, 0.0f);
// Context size is kept the same as the original input image size,
// otherwise the output will be only a partial image
CGContextRef context;
context = UIGraphicsGetCurrentContext();

// This is for flipping upside down
CGContextTranslateCTM(context, 0, self.imageViewArea.image.size.height);
CGContextScaleCTM(context, 1.0, -1.0);

// init vars
float d = 0;          // squared error
int idx = 0;          // index of palette color
int min = 1000000;    // min difference
UIColor *oneRGB;      // color at a pixel
UIColor *paletteRGB;  // palette color

// visit each output color and determine closest color from palette
for (int y = 0; y < sizeY; y++) {
    for (int x = 0; x < sizeX; x++) {
        // desired (avg) color is one pixel of scaled image
        oneRGB = [inputImgAvg colorAtPixel:CGPointMake(x, y)];
        // find closest color match in palette: init idx with index
        // of closest match; keep track of min to find idx
        min = 1000000;
        idx = 0;
        CGContextDrawImage(context, CGRectMake(xx, yy, 1, 1), img);
    }
}

UIImage *output = UIGraphicsGetImageFromCurrentImageContext();
UIGraphicsEndImageContext();
self.imageViewArea.image = output;
This is a similar question (with no definitive answer), but the answer there has the code for directly accessing pixels from an image.
Quantize Image, Save List of Remaining Colors
You should do that rather than use CG functions for each get and set pixel. Drawing 1 pixel of an image onto another image is a lot slower than changing 3 bytes in an array.
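Here's a rough sketch of that approach (Swift purely for illustration, with a hypothetical nearest closure standing in for your palette search): draw the image once into a bitmap context, rewrite the raw bytes in place, and build the output image at the end.
import CoreGraphics

func remapPixels(of image: CGImage,
                 nearest: (UInt8, UInt8, UInt8) -> (UInt8, UInt8, UInt8)) -> CGImage? {
    let width = image.width
    let height = image.height
    guard let ctx = CGContext(data: nil, width: width, height: height,
                              bitsPerComponent: 8, bytesPerRow: 0,
                              space: CGColorSpaceCreateDeviceRGB(),
                              bitmapInfo: CGImageAlphaInfo.noneSkipLast.rawValue),
          let buffer = ctx.data?.assumingMemoryBound(to: UInt8.self) else { return nil }

    // Draw the source once so we can read and rewrite raw RGBX bytes in place.
    ctx.draw(image, in: CGRect(x: 0, y: 0, width: width, height: height))

    let rowBytes = ctx.bytesPerRow
    for y in 0..<height {
        for x in 0..<width {
            let i = y * rowBytes + x * 4
            let (r, g, b) = nearest(buffer[i], buffer[i + 1], buffer[i + 2])
            buffer[i] = r
            buffer[i + 1] = g
            buffer[i + 2] = b
        }
    }
    return ctx.makeImage()
}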
Also, what's in ColorDiff? You don't need a perfect difference metric as long as the closest palette entry has the smallest diff. There may be room for pre-processing the palette so that for each entry you store the smallest diff to its nearest other palette entry. Then, while looping through pixels, you can quickly check whether the current pixel is within half that distance of the color just found (because photos tend to have similar colors near each other).
If that's not a match, then while looping through the palette, if you are within half this distance of any entry, there is no need to check further.
Basically, this puts a zone around each palette entry where you know for sure that it is the closest.
The usual answer is to use a k-d tree or an octree to reduce the number of computations and comparisons that have to be done for each pixel.
I've also had success with partitioning the color space into a regular grid and keeping a list of possible closest matches for each part of the grid. For example you can divide the (0-255) values of R,G,B by 16 and end up with a grid of (16,16,16) or 4096 elements altogether. Best case is that there's only one member of the list for a particular grid element and no need to traverse the list at all.
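Here's a minimal sketch of that grid idea (hypothetical names, Swift used for illustration even though the question's code is Objective-C). This is a coarser variant that stores a single candidate per cell (the palette entry nearest the cell center) rather than a full candidate list, so it trades some accuracy near cell boundaries for a constant-time lookup per pixel:
struct RGB { var r: UInt8; var g: UInt8; var b: UInt8 }

func squaredDistance(_ a: RGB, _ b: RGB) -> Int {
    let dr = Int(a.r) - Int(b.r)
    let dg = Int(a.g) - Int(b.g)
    let db = Int(a.b) - Int(b.b)
    return dr * dr + dg * dg + db * db
}

// Build a 16x16x16 grid where each cell stores the palette index closest to the cell's center.
func buildGrid(palette: [RGB]) -> [Int] {
    var grid = [Int](repeating: 0, count: 16 * 16 * 16)
    for rCell in 0..<16 {
        for gCell in 0..<16 {
            for bCell in 0..<16 {
                let center = RGB(r: UInt8(rCell * 16 + 8),
                                 g: UInt8(gCell * 16 + 8),
                                 b: UInt8(bCell * 16 + 8))
                var bestIndex = 0
                var bestDist = Int.max
                for (i, entry) in palette.enumerated() {
                    let d = squaredDistance(center, entry)
                    if d < bestDist {
                        bestDist = d
                        bestIndex = i
                    }
                }
                grid[(rCell * 16 + gCell) * 16 + bCell] = bestIndex
            }
        }
    }
    return grid
}

// Per-pixel lookup becomes a single array read instead of a full palette scan.
func nearestPaletteIndex(for pixel: RGB, grid: [Int]) -> Int {
    let cell = (Int(pixel.r) / 16 * 16 + Int(pixel.g) / 16) * 16 + Int(pixel.b) / 16
    return grid[cell]
}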
