Detecting if an MTLTexture is blank (iOS)

I am trying to determine if an MTLTexture (in bgra8Unorm format) is blank by calculating the sum of all the R, G, B and A components of each of its pixels.
This function intends to do this by adding adjacent bytes in memory after the texture has been copied to a pointer. However, I have determined that this function ends up returning false no matter which MTLTexture is given.
What is wrong with this function?
func anythingHere(_ texture: MTLTexture) -> Bool {
    let width = texture.width
    let height = texture.height
    let bytesPerRow = width * 4
    let data = UnsafeMutableRawPointer.allocate(bytes: bytesPerRow * height, alignedTo: 4)
    defer {
        data.deallocate(bytes: bytesPerRow * height, alignedTo: 4)
    }
    let region = MTLRegionMake2D(0, 0, width, height)
    texture.getBytes(data, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
    var bind = data.assumingMemoryBound(to: UInt8.self)
    var sum:UInt8 = 0;
    for i in 0..<width*height {
        sum += bind.pointee
        bind.advanced(by: 1)
    }
    return sum != 0
}

Matthijs' change is necessary, but there are also a couple of other issues with the correctness of this method.
You're actually only iterating over 1/4 of the pixels, since you're stepping byte-wise and the upper bound of your loop is width * height rather than bytesPerRow * height.
Additionally, computing the sum of the pixels doesn't really seem like what you want. You can save some work by returning true as soon as you encounter a non-zero value (if bind.pointee != 0).
(Incidentally, Swift's integer overflow checking will trap, crashing at runtime, if you accumulate a value greater than 255 into a UInt8. I suppose you could use a wider integer type, or disable overflow checking with sum = sum &+ bind.pointee, but again, exiting the loop at the first non-zero byte will save some time and prevent false negatives when the accumulator "rolls over" to exactly 0.)
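To make the rollover concrete, here is a tiny sketch using the &+ operator mentioned above. The byte values 128 and 128 are made up for illustration: both are non-zero, yet the wrapping sum ends up at exactly 0, so a `sum != 0` test would report a non-blank texture as blank.

```swift
// Hypothetical pixel bytes: both are non-zero, so the "texture" is not blank
let bytes: [UInt8] = [128, 128]

// Wrapping addition (&+) never traps, but 128 &+ 128 overflows to 0
var sum: UInt8 = 0
for byte in bytes {
    sum = sum &+ byte
}

// The sum-based check wrongly reports "blank"; scanning for a non-zero byte does not
let blankBySum = (sum == 0)
let blankByScan = !bytes.contains { $0 != 0 }
```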
Here's a version of your function that worked for me:
func anythingHere(_ texture: MTLTexture) -> Bool {
    let width = texture.width
    let height = texture.height
    let bytesPerRow = width * 4
    let data = UnsafeMutableRawPointer.allocate(byteCount: bytesPerRow * height, alignment: 4)
    defer {
        data.deallocate()
    }
    let region = MTLRegionMake2D(0, 0, width, height)
    texture.getBytes(data, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
    var bind = data.assumingMemoryBound(to: UInt8.self)
    for _ in 0..<bytesPerRow * height {
        if bind.pointee != 0 {
            return true
        }
        bind = bind.advanced(by: 1)
    }
    return false
}
Keep in mind that on macOS, the default storageMode for textures is managed, which means their contents aren't automatically synchronized back to main memory when they're modified on the GPU. You must explicitly use a blit command encoder to sync the contents yourself:
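// `buffer` is the MTLCommandBuffer that rendered into `texture`
let syncEncoder = buffer.makeBlitCommandEncoder()!
syncEncoder.synchronize(resource: texture)
syncEncoder.endEncoding()
// The synchronization only completes with the command buffer, so wait before calling getBytes
buffer.commit()
buffer.waitUntilCompleted()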
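The early-exit loop in the function above can also be written with the standard library's contains(where:), which likewise stops at the first match. This is a sketch that assumes the bytes have already been copied out (for example by getBytes into a buffer of bytesPerRow * height bytes); the helper name anyNonZeroByte is hypothetical.

```swift
/// Returns true as soon as any of the `byteCount` bytes starting at `base` is non-zero.
/// Hypothetical helper; `base` would be the memory filled by `getBytes`.
func anyNonZeroByte(_ base: UnsafeRawPointer, byteCount: Int) -> Bool {
    // View the raw memory as a collection of UInt8 without copying it
    let bytes = UnsafeRawBufferPointer(start: base, count: byteCount)
    // contains(where:) returns at the first non-zero byte, like the early return
    return bytes.contains { $0 != 0 }
}
```

In the function above, the loop and the final `return false` could then be replaced by `return anyNonZeroByte(data, byteCount: bytesPerRow * height)`; Swift converts the UnsafeMutableRawPointer argument to UnsafeRawPointer automatically.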

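I didn't look at the rest of the code in detail, but I think this line,
bind.advanced(by: 1)
should be:
bind = bind.advanced(by: 1)
advanced(by:) returns a new pointer rather than modifying bind, so as written the loop keeps reading the first byte.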
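A minimal stand-alone sketch of the difference (the function name is made up for illustration): calling advanced(by:) without using its result leaves the pointer where it was, and the compiler flags such a call with an "unused result" warning; assigning the result is what actually moves the pointer.

```swift
/// Hypothetical demo: returns the pointee before and after moving a pointer.
func pointeeBeforeAndAfterMove(_ values: [UInt8]) -> (before: UInt8, after: UInt8) {
    return values.withUnsafeBufferPointer { buffer in
        var p = buffer.baseAddress!
        // The new pointer is discarded, so p still points at values[0]
        _ = p.advanced(by: 1)
        let before = p.pointee
        // Assigning the result actually moves p to values[1]
        p = p.advanced(by: 1)
        return (before: before, after: p.pointee)
    }
}
```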

Related

How to interpret the pixel array derived from CMSampleBuffer in Swift

Maybe this is a very stupid question.
I am using AVFoundation in my app and I am able to get the frames (32BGRA format).
The width of the frame is 1504, the height is 1128 and the bytes-per-row value is 6016.
When I create a UInt8 pixel array from this sample buffer, the length (array.count) of the array is 1696512, which happens to be equal to width * height.
What I don't understand is why the array length is width * height. Should it not be width * height * 4?
What am I missing here?
Edit - 1: Code
func BufferToArray(sampleBuffer: CMSampleBuffer) -> ([UInt8], Int, Int, Int) {
    var rgbBufferArray = [UInt8]()
    //Get pixel Buffer from CMSampleBuffer
    let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
    //Lock the base Address
    CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    //get pixel count
    let pixelCount = CVPixelBufferGetWidth(pixelBuffer) * CVPixelBufferGetHeight(pixelBuffer)
    //Get base address
    let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
    //Get bytes per row of the image
    let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
    //Cast the base address to UInt8. This is like an array now
    let frameBuffer = baseAddress?.assumingMemoryBound(to: UInt8.self)
    rgbBufferArray = Array(UnsafeMutableBufferPointer(start: frameBuffer, count: pixelCount))
    //Unlock the base address with the same flags used to lock it
    CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
    return (rgbBufferArray, bytesPerRow, width, height)
}
The culprit is the data type (UInt8) in combination with the count:
You are assuming the memory contains UInt8 values (assumingMemoryBound(to: UInt8.self)) of pixelCount count. But as you concluded correctly it should be four times that number.
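I'd recommend you import simd and use simd_uchar4 as the data type. That's a struct type containing 4 UInt8 values. Then your array will contain pixelCount 4-component pixel values (this works here because bytesPerRow, 6016, is exactly width * 4, so the rows have no padding). You can access the channels with array[index].x, .y, .z and .w; for a BGRA buffer these are blue, green, red and alpha respectively.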
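If you would rather keep UInt8, the other fix is to copy all bytesPerRow * height bytes (6016 × 1128 = 6,786,048, which is four times 1,696,512) and index each pixel through the row stride. A sketch of the indexing, assuming the bytes are laid out as B, G, R, A as in the 32BGRA format; the helper name bgraPixel is hypothetical:

```swift
/// Hypothetical helper: reads one pixel from a packed 32BGRA byte array.
/// `bytesPerRow` may be larger than width * 4 if the rows are padded.
func bgraPixel(in bytes: [UInt8], bytesPerRow: Int, x: Int, y: Int)
    -> (blue: UInt8, green: UInt8, red: UInt8, alpha: UInt8) {
    // Start of row y, then 4 bytes per pixel within the row
    let i = y * bytesPerRow + x * 4
    return (blue: bytes[i], green: bytes[i + 1], red: bytes[i + 2], alpha: bytes[i + 3])
}
```

In the question's function, that corresponds to passing count: bytesPerRow * height to UnsafeMutableBufferPointer instead of pixelCount.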

Traversing UnsafeMutableRawPointer image data

I seem to be unable to wrap my head around the methodology behind manually accessing image pixel data in Swift. I am attempting to create an image mask from a CGImage that can later be used on a separate image. I want to identify all pixels of a specific value and convert everything else in the image to black/white or maybe alpha (not really important at the moment however). The code I'm playing with looks like this:
let colorSpace: CGColorSpace = CGColorSpaceCreateDeviceRGB()
let contextWidth: Int = Int(snapshot.size.width)
let contextHeight: Int = Int(snapshot.size.height)
let bytesPerPixel: Int = 24
let bitsPerComponent: Int = 8
let bytesPerRow: Int = bytesPerPixel * contextWidth
let bitmapInfo: CGBitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.noneSkipLast.rawValue)
guard let context: CGContext = CGContext(data: nil, width: contextWidth, height: contextHeight, bitsPerComponent: bitsPerComponent, bytesPerRow: bytesPerRow, space: colorSpace, bitmapInfo: bitmapInfo.rawValue) else {
    print("Could not create CGContext")
    return
}
context.draw(maskCGImage, in: CGRect(x: 0, y: 0, width: contextWidth, height: contextHeight))
guard let contextDataRaw: UnsafeMutableRawPointer = context.data else {
    print("Could not get UnsafeMutableRawPointer from CGContext")
    return
}
let contextData: UnsafeMutablePointer<UInt8> = contextDataRaw.bindMemory(to: UInt8.self, capacity: contextWidth * contextHeight)
for row in 0..<contextHeight {
    for col in 0..<contextWidth {
        let offset = (col * contextHeight) + row
        let pixelArray = [contextData[offset], contextData[offset + 1], contextData[offset + 2]]
        if pixelArray == [120, 120, 120] {
            contextData[offset] = 0
            contextData[offset + 1] = 0
            contextData[offset + 2] = 0
        }
    }
}
I have tried various arrangements of the rows and columns trying to identify the correct order, i.e. let offset = (row * contextWidth) + col, let offset = (col * contextHeight) + row, let offset = ((row * contextWidth) + col) * 3, let offset = ((row * contextWidth) + col) * 4.
The output I get looks something like this (Keep in mind that this image IS supposed to look like a blob of random colors):
As my fancy little arrow shows, the black swatch across the top is my edited pixels, and those pixels are indeed supposed to be turned black; however, so are all the other gray pixels (the ones under the arrow, for example). They definitely have the same RGB value of 120, 120, 120.
I know the issue is in the order that I'm moving across the array, I just can't seem to figure out what the pattern is. Also, as a note, using copy(maskingColorComponents:) won't do because I want to remove a few specific colors, not a range of them.
Any help is greatly appreciated as always. Thanks in advance!
You're obviously on the right track because you've correctly hit all the pixels in the top left corner. But you don't keep going the rest of the way down the image; clearly you are not surveying enough rows. So the problem might be merely that you are slightly off in your idea of what a row is.
You are saying
for row in 0..<contextHeight {
for col in 0..<contextWidth {
let offset = (col * contextHeight) + row
as if adding row would in fact get you to that row. But row is just the index of the desired row, not the byte offset at which that row starts. It seems to me that a jump of one row needs to be the size of all the bytes in one row (bytesPerRow), and each column step needs to be the size of one pixel.
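Putting that together, here is a sketch of the loop with the offset computed from a byte stride. It assumes 8 bits per component with noneSkipLast, which makes each pixel 4 bytes (32 bits); the question's bytesPerPixel = 24 looks like a bits/bytes mix-up. In the real code, bytesPerRow can be read back from context.bytesPerRow, and the bindMemory capacity should then be bytesPerRow * contextHeight. The function name blackenGray is hypothetical.

```swift
/// Hypothetical helper: sets every pixel whose RGB is exactly (120, 120, 120) to black.
/// Assumes 4 bytes per pixel (R, G, B, skipped byte) and a row stride of `bytesPerRow` bytes.
func blackenGray(_ data: UnsafeMutablePointer<UInt8>, width: Int, height: Int, bytesPerRow: Int) {
    let bytesPerPixel = 4
    for row in 0..<height {
        for col in 0..<width {
            // Byte where the row starts, plus the byte offset of the column within that row
            let offset = row * bytesPerRow + col * bytesPerPixel
            if data[offset] == 120 && data[offset + 1] == 120 && data[offset + 2] == 120 {
                data[offset] = 0
                data[offset + 1] = 0
                data[offset + 2] = 0
            }
        }
    }
}
```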

How to normalize disparity data in iOS?

In the WWDC session "Image Editing with Depth" they mentioned normalizedDisparity and normalizedDisparityImage a few times:
"The basic idea is that we're going to map our normalized disparity
values into values between 0 and 1"
"So once you know the min and max you can normalize the depth or disparity between 0 and 1."
I tried to first get the disparity image like this:
let disparityImage = depthImage.applyingFilter(
"CIDepthToDisparity", withInputParameters: nil)
Then I tried to get the depthDataMap and normalize it, but it didn't work. Am I on the right track? I would appreciate some hints on what to do.
Edit:
This is my test code, sorry for the quality. I get the min and max, then I try to loop over the data to normalize it (let normalizedPoint = (point - min) / (max - min)).
let depthDataMap = depthData!.depthDataMap
let width = CVPixelBufferGetWidth(depthDataMap) //768 on an iPhone 7+
let height = CVPixelBufferGetHeight(depthDataMap) //576 on an iPhone 7+
CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
// Convert the base address to a safe pointer of the appropriate type
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap),
                                to: UnsafeMutablePointer<Float32>.self)
var min = floatBuffer[0]
var max = floatBuffer[0]
for x in 0..<width{
    for y in 0..<height{
        let distanceAtXYPoint = floatBuffer[Int(x * y)]
        if(distanceAtXYPoint < min){
            min = distanceAtXYPoint
        }
        if(distanceAtXYPoint > max){
            max = distanceAtXYPoint
        }
    }
}
What I expected is that the data would reflect the disparity where the user clicked on the image, but it didn't match. The code to find the disparity where the user clicked is here:
// Apply the filter with the sampleRect from the user’s tap. Don’t forget to clamp!
let minMaxImage = normalized?.clampingToExtent().applyingFilter(
"CIAreaMinMaxRed", withInputParameters:
[kCIInputExtentKey : CIVector(cgRect:rect2)])
// A four-byte buffer to store a single pixel value
var pixel = [UInt8](repeating: 0, count: 4)
// Render the image to a 1x1 rect. Be sure to use a nil color space.
context.render(minMaxImage!, toBitmap: &pixel, rowBytes: 4,
bounds: CGRect(x:0, y:0, width:1, height:1),
format: kCIFormatRGBA8, colorSpace: nil)
// The max is stored in the green channel. Min is in the red.
let disparity = Float(pixel[1]) / 255.0
There's a new blog post on raywenderlich.com called "Image Depth Maps Tutorial for iOS" that contains a sample app and details related to working with depth. The sample code shows how to normalize the depth data using a CVPixelBuffer extension:
import CoreVideo

extension CVPixelBuffer {
    func normalize() {
        let width = CVPixelBufferGetWidth(self)
        let height = CVPixelBufferGetHeight(self)
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(self), to: UnsafeMutablePointer<Float>.self)
        // Start from the extremes so that values outside 0...1 are still found
        var minPixel = Float.greatestFiniteMagnitude
        var maxPixel = -Float.greatestFiniteMagnitude
        for y in 0 ..< height {
            for x in 0 ..< width {
                let pixel = floatBuffer[y * width + x]
                minPixel = min(pixel, minPixel)
                maxPixel = max(pixel, maxPixel)
            }
        }
        let range = maxPixel - minPixel
        for y in 0 ..< height {
            for x in 0 ..< width {
                let pixel = floatBuffer[y * width + x]
                floatBuffer[y * width + x] = (pixel - minPixel) / range
            }
        }
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    }
}
Something to keep in mind when working with depth data is that depth maps have a lower resolution than the actual image, so you need to scale them up (more info in the blog post and in the WWDC video).
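Here is a variant of the same min-max normalization that also honors row padding and guards against a constant map (range 0). It is a sketch that takes a plain Float pointer so it can be tested on its own; with a CVPixelBuffer, `buffer` would be the locked base address and `floatsPerRow` would be CVPixelBufferGetBytesPerRow divided by MemoryLayout<Float>.stride. The function name normalizeInPlace is hypothetical.

```swift
/// Hypothetical helper: min-max normalizes a single-channel Float image to 0...1 in place.
/// `floatsPerRow` is the row stride in elements, which may exceed `width` when rows are padded.
func normalizeInPlace(_ buffer: UnsafeMutablePointer<Float>, width: Int, height: Int, floatsPerRow: Int) {
    // Start from the extremes so any real value replaces them
    var minPixel = Float.greatestFiniteMagnitude
    var maxPixel = -Float.greatestFiniteMagnitude
    for y in 0..<height {
        for x in 0..<width {
            let pixel = buffer[y * floatsPerRow + x]
            minPixel = min(pixel, minPixel)
            maxPixel = max(pixel, maxPixel)
        }
    }
    let range = maxPixel - minPixel
    for y in 0..<height {
        for x in 0..<width {
            let i = y * floatsPerRow + x
            // A constant image has range 0; map it to 0 instead of dividing by zero
            buffer[i] = range > 0 ? (buffer[i] - minPixel) / range : 0
        }
    }
}
```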
Will's vDSP-based answer is very good, but it can be improved as follows, by working on the pixel buffer directly instead of first copying it. I'm using it with depth data from a photo. It assumes the buffer holds 32-bit floats, so if the depth data uses a different format (such as 16-bit floats), it won't work; I haven't found such a photo yet. I'm surprised there isn't a filter to handle this in Core Image.
import Accelerate
import CoreVideo

extension CVPixelBuffer {
    func normalize() {
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        let width = CVPixelBufferGetWidthOfPlane(self, 0)
        let height = CVPixelBufferGetHeightOfPlane(self, 0)
        let count = width * height
        let pixelBufferBase = unsafeBitCast(CVPixelBufferGetBaseAddressOfPlane(self, 0), to: UnsafeMutablePointer<Float>.self)
        let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: pixelBufferBase, count: count)
        let maxValue = vDSP.maximum(depthCopyBuffer)
        let minValue = vDSP.minimum(depthCopyBuffer)
        let range = maxValue - minValue
        let negMinValue = -minValue
        let subtractVector = vDSP.add(negMinValue, depthCopyBuffer)
        let normalizedDisparity = vDSP.divide(subtractVector, range)
        pixelBufferBase.initialize(from: normalizedDisparity, count: count)
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
    }
}
Try using the Accelerate framework's vDSP vector functions. Here is a normalization split into two functions.
To change the CVPixelBuffer to a 0..1 normalized range, call:
myCVPixelBuffer.setUpNormalize()
import Accelerate
import CoreVideo

extension CVPixelBuffer {
    func vectorNormalize(targetVector: UnsafeMutableBufferPointer<Float>) -> [Float] {
        // range = max - min
        // normalized to 0..1 is (pixel - minPixel) / range
        // see Documentation "Using vDSP for Vector-based Arithmetic" in vDSP under system "Accelerate" documentation
        // see also the Accelerate documentation section 'Vector extrema calculation'
        // Maximum: static func maximum<U>(U) -> Float
        // Returns the maximum element of a single-precision vector.
        // Minimum: static func minimum<U>(U) -> Float
        // Returns the minimum element of a single-precision vector.
        let maxValue = vDSP.maximum(targetVector)
        let minValue = vDSP.minimum(targetVector)
        let range = maxValue - minValue
        let negMinValue = -minValue
        // adding the negated minimum subtracts minValue from every element
        let subtractVector = vDSP.add(negMinValue, targetVector)
        let result = vDSP.divide(subtractVector, range)
        return result
    }

    @discardableResult
    func setUpNormalize() -> CVPixelBuffer {
        // grayscale buffer of Float32, i.e. Float
        // returns the normalized CVPixelBuffer
        CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        let width = CVPixelBufferGetWidthOfPlane(self, 0)
        let height = CVPixelBufferGetHeightOfPlane(self, 0)
        let count = width * height
        // UnsafeMutableRawPointer
        let bufferBaseAddress = CVPixelBufferGetBaseAddressOfPlane(self, 0)
        let pixelBufferBase = unsafeBitCast(bufferBaseAddress, to: UnsafeMutablePointer<Float>.self)
        let depthCopy = UnsafeMutablePointer<Float>.allocate(capacity: count)
        depthCopy.initialize(from: pixelBufferBase, count: count)
        let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: depthCopy, count: count)
        let normalizedDisparity = vectorNormalize(targetVector: depthCopyBuffer)
        // copy the normalized map back into the CVPixelBuffer
        pixelBufferBase.initialize(from: normalizedDisparity, count: count)
        depthCopy.deallocate()
        CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
        return self
    }
}
You can see it in action in a modified version of the Apple sample 'PhotoBrowse' app at
https://github.com/racewalkWill/PhotoBrowseModified

Optimize retrieval of all rendered pixel's data in a UIView

I need to perform some statistics and pixel-by-pixel analysis of a UIView containing subviews, sublayers and a mask in a small iOS Swift 3 project.
For the moment I came up with the following:
private func computeStatistics() {
    // constants
    let width: Int = Int(self.bounds.size.width)
    let height: Int = Int(self.bounds.size.height)
    // color extractor
    let pixel = UnsafeMutablePointer<CUnsignedChar>.allocate(capacity: 4)
    let colorSpace = CGColorSpaceCreateDeviceRGB()
    let bitmapInfo = CGBitmapInfo(rawValue: CGImageAlphaInfo.premultipliedLast.rawValue)
    for x in 0..<width {
        for y in 0..<height {
            let context = CGContext(data: pixel, width: 1, height: 1, bitsPerComponent: 8, bytesPerRow: 4, space: colorSpace, bitmapInfo: bitmapInfo.rawValue)
            context!.translateBy(x: -CGFloat(x), y: -CGFloat(y))
            layer.render(in: context!)
            // analyse the pixel here
            // eg: let totalRed += pixel[0]
        }
    }
    pixel.deallocate(capacity: 4)
}
It works, but the problem is that on a fullscreen view, even on an iPhone 4, this means about 150,000 instantiations of the context and as many expensive renders, which besides being very slow also seem to have a deallocation issue that saturates my memory (even in the simulator).
I tried analyzing only a fraction of the pixels:
let definition: Int = width / 10
for x in 0..<width where x%definition == 0 {
...
}
But besides still taking up to 10 seconds, even on a simulated iPhone 7, this is a very poor solution.
Is it possible to avoid re-rendering and translating the context every time?
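One way to avoid the per-pixel renders is to create a single context the size of the view, call layer.render(in:) on it once, and then read every pixel from that one buffer. The sketch below covers only that second step, assuming an 8-bit RGBA premultipliedLast buffer like the one in the question, with a row stride of bytesPerRow; the function name channelTotals is hypothetical. It accumulates into Int so the totals do not overflow the way UInt8 sums would.

```swift
/// Hypothetical helper: sums each channel of an RGBA8 (premultipliedLast) buffer.
/// Assumes the view was rendered once into this buffer through a single CGContext.
func channelTotals(_ bytes: UnsafePointer<UInt8>, width: Int, height: Int, bytesPerRow: Int)
    -> (red: Int, green: Int, blue: Int, alpha: Int) {
    var red = 0, green = 0, blue = 0, alpha = 0
    for y in 0..<height {
        let rowStart = y * bytesPerRow
        for x in 0..<width {
            // premultipliedLast with 8 bits per component: R, G, B, A in memory
            let i = rowStart + x * 4
            red += Int(bytes[i])
            green += Int(bytes[i + 1])
            blue += Int(bytes[i + 2])
            alpha += Int(bytes[i + 3])
        }
    }
    return (red: red, green: green, blue: blue, alpha: alpha)
}
```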

Get pixel value from CVPixelBufferRef in Swift

How can I get the RGB (or any other format) pixel value from a CVPixelBufferRef? I've tried many approaches but had no success yet.
func captureOutput(captureOutput: AVCaptureOutput!,
                   didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
                   fromConnection connection: AVCaptureConnection!) {
    let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
    CVPixelBufferLockBaseAddress(pixelBuffer, 0)
    let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
    //Get individual pixel values here
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
}
baseAddress is an unsafe mutable pointer: an UnsafeMutablePointer<Void> in Swift 2, and an optional UnsafeMutableRawPointer in Swift 3 and later. You can easily access the memory once you have converted the pointer to a more specific type:
// Convert the base address to a typed pointer of the appropriate type
let byteBuffer = baseAddress!.assumingMemoryBound(to: UInt8.self)
// read the data (returns value of type UInt8)
let firstByte = byteBuffer[0]
// write data
byteBuffer[3] = 90
Make sure you use the correct type (8, 16 or 32 bit unsigned int). It depends on the video format. Most likely it's 8 bit.
Update on buffer formats:
You can specify the format when you initialize the AVCaptureVideoDataOutput instance. You basically have the choice of:
BGRA: a single plane in which the blue, green, red and alpha values of each pixel are packed into one 32-bit word (8 bits per component)
420YpCbCr8BiPlanarFullRange: Two planes, the first containing one byte per pixel with the Y (luma) value, the second containing interleaved Cb and Cr (chroma) values, one pair per 2 × 2 block of pixels
420YpCbCr8BiPlanarVideoRange: The same as 420YpCbCr8BiPlanarFullRange but the Y values are restricted to the range 16 – 235 (for historical reasons)
If you're interested in the color values and speed (or rather maximum frame rate) is not an issue, then go for the simpler BGRA format. Otherwise take one of the more efficient native video formats.
If you have two planes, you must get the base address of the desired plane (see video format example):
Video format example
let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)!
let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
let byteBuffer = baseAddress.assumingMemoryBound(to: UInt8.self)
// Get luma value for pixel (43, 17)
let luma = byteBuffer[17 * bytesPerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
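For the biplanar 4:2:0 formats, plane 1 holds interleaved Cb, Cr byte pairs at half the width and half the height of the luma plane, so one pair covers a 2 × 2 block of pixels. A sketch of the matching offset calculation; the helper name chromaOffset is hypothetical, and chromaBytesPerRow would come from CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1):

```swift
/// Hypothetical helper: byte offset of the (Cb, Cr) pair for pixel (x, y)
/// in the chroma plane (plane 1) of a 420YpCbCr8BiPlanar buffer.
func chromaOffset(x: Int, y: Int, chromaBytesPerRow: Int) -> Int {
    // Each chroma row covers two luma rows; each 2-byte pair covers two luma columns
    return (y / 2) * chromaBytesPerRow + (x / 2) * 2
}
```

Cb is at that offset and Cr at the following byte; for pixel (43, 17) the pair starts at 8 * chromaBytesPerRow + 42.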
BGRA example
let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, .readOnly)
let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)!
// bytesPerRow is in bytes; divide by 4 to get the row stride in UInt32 elements
let int32PerRow = CVPixelBufferGetBytesPerRow(pixelBuffer) / 4
let int32Buffer = baseAddress.assumingMemoryBound(to: UInt32.self)
// Get BGRA value for pixel (43, 17)
let bgra = int32Buffer[17 * int32PerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, .readOnly)
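The UInt32 read in the BGRA example packs all four components. On little-endian hardware such as the ARM processors in iOS devices, the B, G, R, A bytes in memory become a value whose low byte is blue and whose high byte is alpha, so they can be split out with shifts. A sketch; the function name is hypothetical:

```swift
/// Hypothetical helper: splits a 32BGRA pixel that was read as a UInt32
/// on a little-endian device (bytes in memory: B, G, R, A).
func components(ofBGRA value: UInt32) -> (blue: UInt8, green: UInt8, red: UInt8, alpha: UInt8) {
    return (blue: UInt8(value & 0xFF),
            green: UInt8((value >> 8) & 0xFF),
            red: UInt8((value >> 16) & 0xFF),
            alpha: UInt8(value >> 24))
}
```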
Here is a method for getting the individual rgb values from a BGRA pixel buffer. Note: Your buffer must be locked before calling this.
func pixelFrom(x: Int, y: Int, movieFrame: CVPixelBuffer) -> (UInt8, UInt8, UInt8) {
    let baseAddress = CVPixelBufferGetBaseAddress(movieFrame)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(movieFrame)
    let buffer = baseAddress!.assumingMemoryBound(to: UInt8.self)
    let index = x*4 + y*bytesPerRow
    let b = buffer[index]
    let g = buffer[index+1]
    let r = buffer[index+2]
    return (r, g, b)
}
Update for Swift3:
let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
let int32Buffer = unsafeBitCast(CVPixelBufferGetBaseAddress(pixelBuffer), to: UnsafeMutablePointer<UInt32>.self)
// bytesPerRow is in bytes; divide by 4 to get the row stride in UInt32 elements
let int32PerRow = CVPixelBufferGetBytesPerRow(pixelBuffer) / 4
// Get BGRA value for pixel (43, 17)
let bgra = int32Buffer[17 * int32PerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
Swift 5
I had the same problem and ended up with the following solution. My CVPixelBuffer had dimensionality 68 x 68, which can be inspected by
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
print(CVPixelBufferGetWidth(pixelBuffer))
print(CVPixelBufferGetHeight(pixelBuffer))
You also have to know the bytes per row:
print(CVPixelBufferGetBytesPerRow(pixelBuffer))
which in my case was 320.
Furthermore, you need to know the data type of your pixel buffer, which was Float32 for me.
I then constructed a byte buffer and read the bytes consecutively as follows (remember to lock the base address as shown above):
var byteBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(pixelBuffer), to: UnsafeMutablePointer<Float32>.self)
var pixelArray: Array<Array<Float>> = Array(repeating: Array(repeating: 0, count: 68), count: 68)
for row in 0...67 {
    for col in 0...67 {
        pixelArray[row][col] = byteBuffer.pointee
        byteBuffer = byteBuffer.successor()
    }
    byteBuffer = byteBuffer.advanced(by: 12)
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
You might wonder about the part byteBuffer = byteBuffer.advanced(by: 12). The reason why we have to do this is as follows.
We know that we have 320 bytes per row. However, our buffer has width 68 and the data type is Float32, i.e. 4 bytes per value. That means that only 272 bytes per row hold data, followed by padding, which is most likely there for memory alignment.
We therefore have to skip the last 48 bytes of each row, which is done by byteBuffer = byteBuffer.advanced(by: 12): advancing a Float32 pointer by 12 elements moves it 12 * 4 = 48 bytes.
This approach is somewhat different from other solutions as we use pointers to the next byteBuffer. However, I find this easier and more intuitive.
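The hard-coded 12 can also be derived from the buffer itself, which avoids surprises if another device or resolution pads its rows differently. A sketch, assuming the same single-channel Float32 layout; the function name rows(from:width:height:bytesPerRow:) is hypothetical:

```swift
/// Hypothetical helper: copies a padded single-channel Float32 image into rows.
/// `bytesPerRow` would come from CVPixelBufferGetBytesPerRow and may exceed width * 4.
func rows(from base: UnsafePointer<Float32>, width: Int, height: Int, bytesPerRow: Int) -> [[Float32]] {
    // Row stride in elements: 320 / 4 = 80 in the answer, of which 68 hold data
    let floatsPerRow = bytesPerRow / MemoryLayout<Float32>.stride
    return (0..<height).map { row in
        // Take only the first `width` elements of each row, skipping the padding
        Array(UnsafeBufferPointer(start: base + row * floatsPerRow, count: width))
    }
}
```

With a CVPixelBuffer you would lock the base address, bind it to Float32 as in the answer, call the helper, and unlock afterwards.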
