How can I get the RGB (or any other format) pixel value from a CVPixelBufferRef? I've tried many approaches but have had no success yet.
func captureOutput(captureOutput: AVCaptureOutput!,
didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
fromConnection connection: AVCaptureConnection!) {
let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
//Get individual pixel values here
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
}
baseAddress is an unsafe mutable pointer, or more precisely an UnsafeMutablePointer<Void>. You can easily access the memory once you have converted the pointer from Void to a more specific type:
// Convert the base address to a typed pointer of the appropriate type
let byteBuffer = UnsafeMutablePointer<UInt8>(baseAddress)
// read the data (returns value of type UInt8)
let firstByte = byteBuffer[0]
// write data
byteBuffer[3] = 90
Make sure you use the correct type (8, 16 or 32 bit unsigned int). It depends on the video format. Most likely it's 8 bit.
Update on buffer formats:
You can specify the format when you initialize the AVCaptureVideoDataOutput instance. You basically have the choice of:
BGRA: a single plane where each pixel is stored as one 32 bit value, with 8 bits each for blue, green, red and alpha
420YpCbCr8BiPlanarFullRange: Two planes, the first containing a byte for each pixel with the Y (luma) value, the second containing the Cb and Cr (chroma) values for groups of pixels
420YpCbCr8BiPlanarVideoRange: The same as 420YpCbCr8BiPlanarFullRange but the Y values are restricted to the range 16 – 235 (for historical reasons)
If you're interested in the color values and speed (or rather maximum frame rate) is not an issue, then go for the simpler BGRA format. Otherwise take one of the more efficient native video formats.
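If you pick the video-range format but want luma values on the usual 0 to 255 scale, the 16 to 235 range has to be stretched. The following is only a rough sketch of that rescaling; the function name is made up for illustration, and it clamps out-of-range bytes rather than preserving them:

```swift
// Rescale a video-range luma byte (nominally 16...235) to full range (0...255).
// Bytes outside the nominal range are clamped first.
func fullRangeLuma(fromVideoRange y: UInt8) -> UInt8 {
    let clamped = min(max(Int(y), 16), 235)
    // 219 = 235 - 16 is the width of the video range; adding 109 rounds to nearest.
    return UInt8(((clamped - 16) * 255 + 109) / 219)
}

let black = fullRangeLuma(fromVideoRange: 16)
let white = fullRangeLuma(fromVideoRange: 235)
```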
If you have two planes, you must get the base address of the desired plane (see video format example):
Video format example
let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0)
let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
let byteBuffer = UnsafeMutablePointer<UInt8>(baseAddress)
// Get luma value for pixel (43, 17)
let luma = byteBuffer[17 * bytesPerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
BGRA example
let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, 0)
let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
// bytesPerRow is in bytes; divide by 4 to get the row length in UInt32 pixels
let int32PerRow = CVPixelBufferGetBytesPerRow(pixelBuffer) / 4
let int32Buffer = UnsafeMutablePointer<UInt32>(baseAddress)
// Get BGRA value for pixel (43, 17)
let bgra = int32Buffer[17 * int32PerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, 0)
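A 32 bit value read from a BGRA buffer still has to be split into its channels. Here is a minimal sketch of that step, using a plain byte array in place of the locked buffer. The function name and the sample bytes are made up for illustration, and it assumes the bytes sit in memory in B, G, R, A order:

```swift
// Split one 32-bit BGRA pixel word into its channels.
// The stored bytes are interpreted as little-endian, so the first byte in
// memory (blue) ends up in the least significant 8 bits.
func channels(ofBGRAWord stored: UInt32) -> (r: UInt8, g: UInt8, b: UInt8, a: UInt8) {
    let word = UInt32(littleEndian: stored)   // no-op on little-endian CPUs
    let b = UInt8(truncatingIfNeeded: word)
    let g = UInt8(truncatingIfNeeded: word >> 8)
    let r = UInt8(truncatingIfNeeded: word >> 16)
    let a = UInt8(truncatingIfNeeded: word >> 24)
    return (r, g, b, a)
}

// Hypothetical pixel: bytes in memory are B = 0x10, G = 0x20, R = 0x30, A = 0xFF.
let pixelBytes: [UInt8] = [0x10, 0x20, 0x30, 0xFF]
let stored = pixelBytes.withUnsafeBytes { $0.load(as: UInt32.self) }
let pixel = channels(ofBGRAWord: stored)
```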
Here is a method for getting the individual RGB values from a BGRA pixel buffer. Note: your buffer must be locked before calling this.
func pixelFrom(x: Int, y: Int, movieFrame: CVPixelBuffer) -> (UInt8, UInt8, UInt8) {
let baseAddress = CVPixelBufferGetBaseAddress(movieFrame)
let bytesPerRow = CVPixelBufferGetBytesPerRow(movieFrame)
let buffer = baseAddress!.assumingMemoryBound(to: UInt8.self)
let index = x*4 + y*bytesPerRow
let b = buffer[index]
let g = buffer[index+1]
let r = buffer[index+2]
return (r, g, b)
}
Update for Swift 3:
let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
let int32Buffer = unsafeBitCast(CVPixelBufferGetBaseAddress(pixelBuffer), to: UnsafeMutablePointer<UInt32>.self)
// bytesPerRow is in bytes; divide by 4 to get the row length in UInt32 pixels
let int32PerRow = CVPixelBufferGetBytesPerRow(pixelBuffer) / 4
// Get BGRA value for pixel (43, 17)
let bgra = int32Buffer[17 * int32PerRow + 43]
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
Swift 5
I had the same problem and ended up with the following solution. My CVPixelBuffer had dimensions 68 x 68, which can be inspected with:
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
print(CVPixelBufferGetWidth(pixelBuffer))
print(CVPixelBufferGetHeight(pixelBuffer))
You also have to know the bytes per row:
print(CVPixelBufferGetBytesPerRow(pixelBuffer))
which in my case was 320.
Furthermore, you need to know the data type of your pixel buffer, which was Float32 for me.
I then constructed a byte buffer and read the bytes consecutively as follows (remember to lock the base address as shown above):
var byteBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(pixelBuffer), to: UnsafeMutablePointer<Float32>.self)
var pixelArray: Array<Array<Float>> = Array(repeating: Array(repeating: 0, count: 68), count: 68)
for row in 0...67{
for col in 0...67{
pixelArray[row][col] = byteBuffer.pointee
byteBuffer = byteBuffer.successor()
}
byteBuffer = byteBuffer.advanced(by: 12)
}
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags(rawValue: 0))
You might wonder about the part byteBuffer = byteBuffer.advanced(by: 12). The reason why we have to do this is as follows.
We know that we have 320 bytes per row. However, our buffer has width 68 and the data type is Float32, i.e. 4 bytes per value. That means that only the first 272 bytes of each row hold pixel data; the rest is padding, presumably added for memory alignment.
We therefore have to skip the last 48 bytes in each row, which is done by byteBuffer = byteBuffer.advanced(by: 12) (12 * 4 = 48).
This approach is somewhat different from the other solutions, as it walks a pointer through the buffer instead of computing an index. However, I find this easier and more intuitive.
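The hard-coded 12 can also be derived from bytesPerRow, so the same loop works for other buffer sizes. Below is a rough sketch of that idea. It uses a plain array as a stand-in for the locked pixel buffer memory, and the function name and the fake data are assumptions made for illustration:

```swift
// Copy the meaningful values out of a padded, row-major Float32 image.
// `storage` stands in for the locked pixel buffer memory (hypothetical data).
func unpadRows(_ storage: [Float], width: Int, height: Int, bytesPerRow: Int) -> [[Float]] {
    let elementsPerRow = bytesPerRow / MemoryLayout<Float>.stride
    precondition(elementsPerRow >= width, "row stride is shorter than the visible width")
    precondition(storage.count >= elementsPerRow * height, "storage too small")
    return (0 ..< height).map { (row: Int) -> [Float] in
        let start = row * elementsPerRow
        return Array(storage[start ..< start + width])
    }
}

let width = 68, height = 68, bytesPerRow = 320
let elementsPerRow = bytesPerRow / MemoryLayout<Float>.stride   // Floats per row, padding included
let paddingPerRow = elementsPerRow - width                      // Floats to skip at the end of each row
// Fill a fake buffer in which every element records its own row index.
let storage = (0 ..< elementsPerRow * height).map { Float($0 / elementsPerRow) }
let rows = unpadRows(storage, width: width, height: height, bytesPerRow: bytesPerRow)
```

With the numbers from this answer, paddingPerRow comes out as the 12 Floats (48 bytes) skipped above.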
Related
Maybe this is a very stupid question.
I am using AVFoundation in my app and I am able to get the frames(32BGRA Format).
The width of the frame is 1504, Height is 1128 and the bytes-Per-Row value is 6016.
When I create a UInt8 pixel array from this samplebuffer the length (array.count) of this array is 1696512 which happens to be equal to width * height.
What I am not getting is why the array length is width * height. Should it not be width * height * 4?
What am I missing here?
Edit - 1: Code
func BufferToArray(sampleBuffer: CMSampleBuffer) -> ([UInt8], Int, Int, Int) {
var rgbBufferArray = [UInt8]()
//Get pixel buffer from CMSampleBuffer
let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
//Lock the base Address
CVPixelBufferLockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
let width = CVPixelBufferGetWidth(pixelBuffer)
let height = CVPixelBufferGetHeight(pixelBuffer)
//get pixel count
let pixelCount = CVPixelBufferGetWidth(pixelBuffer) * CVPixelBufferGetHeight(pixelBuffer)
//Get base address
let baseAddress = CVPixelBufferGetBaseAddress(pixelBuffer)
//Get bytes per row of the image
let bytesPerRow = CVPixelBufferGetBytesPerRow(pixelBuffer)
//Cast the base address to UInt8. This is like an array now
let frameBuffer = baseAddress?.assumingMemoryBound(to: UInt8.self)
rgbBufferArray = Array(UnsafeMutableBufferPointer(start: frameBuffer, count: pixelCount))
//Unlock the base address (the flags must match the lock call)
CVPixelBufferUnlockBaseAddress(pixelBuffer, CVPixelBufferLockFlags.readOnly)
return (rgbBufferArray, bytesPerRow, width, height)
}
The culprit is the data type (UInt8) in combination with the count:
You are assuming the memory contains UInt8 values (assumingMemoryBound(to: UInt8.self)) of pixelCount count. But as you concluded correctly it should be four times that number.
I'd recommend you import simd and use simd_uchar4 as data type. That's a struct type containing 4 UInt8. Then your array will contain pixelCount values of 4-tuple pixel values. You can access the channels with array[index].x , .y, .z, and .w respectively.
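To illustrate the 4-tuple idea without depending on a camera frame, here is a small sketch that groups a flat byte array into one value per pixel. It uses SIMD4<UInt8> from the standard library, which as far as I know is the type simd_uchar4 maps to in Swift. The function name and sample bytes are made up, and it assumes there is no row padding (bytesPerRow equals width * 4, as in the question):

```swift
// Group a flat BGRA byte array into one 4-channel value per pixel.
func pixels(fromBGRABytes bytes: [UInt8]) -> [SIMD4<UInt8>] {
    precondition(bytes.count % 4 == 0, "expected 4 bytes per pixel")
    return stride(from: 0, to: bytes.count, by: 4).map { (i: Int) -> SIMD4<UInt8> in
        SIMD4<UInt8>(bytes[i], bytes[i + 1], bytes[i + 2], bytes[i + 3])
    }
}

// Hypothetical 2 x 1 frame: a blue pixel followed by a red pixel (B, G, R, A order).
let frameBytes: [UInt8] = [255, 0, 0, 255,   0, 0, 255, 255]
let framePixels = pixels(fromBGRABytes: frameBytes)
let firstBlue = framePixels[0].x   // .x is blue, .y green, .z red, .w alpha
let secondRed = framePixels[1].z
```

The byte count is four times the pixel count, which is the factor missing from the array in the question.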
I am trying to determine if a MTLTexture (in bgra8Unorm format) is blank by calculating the sum of all the R G B and A components of each of its pixels.
This function intends to do this by adding adjacent floats in memory after a texture has been copied to a pointer. However, I have determined that this function ends up returning false no matter which MTLTexture is given.
What is wrong with this function?
func anythingHere(_ texture: MTLTexture) -> Bool {
let width = texture.width
let height = texture.height
let bytesPerRow = width * 4
let data = UnsafeMutableRawPointer.allocate(bytes: bytesPerRow * height, alignedTo: 4)
defer {
data.deallocate(bytes: bytesPerRow * height, alignedTo: 4)
}
let region = MTLRegionMake2D(0, 0, width, height)
texture.getBytes(data, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
var bind = data.assumingMemoryBound(to: UInt8.self)
var sum:UInt8 = 0;
for i in 0..<width*height {
sum += bind.pointee
bind.advanced(by: 1)
}
return sum != 0
}
Matthijs' change is necessary, but there are also a couple of other issues with the correctness of this method.
You're actually only iterating over 1/4 of the pixels, since you're stepping byte-wise and the upper bound of your loop is width * height rather than bytesPerRow * height.
Additionally, computing the sum of the pixels doesn't really seem like what you want. You can save some work by returning true as soon as you encounter a non-zero value (if bind.pointee != 0).
(Incidentally, Swift's integer overflow checking will trap at runtime if you accumulate a value greater than 255 into a UInt8. I suppose you could use a bigger integer, or disable overflow checking with sum = sum &+ bind.pointee, but again, breaking the loop on the first non-clear pixel will save some time and prevent false negatives when the accumulator "rolls over" to exactly 0.)
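The rollover is easy to reproduce on a plain array. This is just a sketch with made-up byte values, chosen so that they sum to exactly 256:

```swift
// Hypothetical texture bytes: two non-zero values that happen to sum to 256.
let textureBytes: [UInt8] = [128, 0, 128, 0]

// Wrapping addition never traps, but the 8-bit accumulator rolls over to 0 here.
let wrappedSum = textureBytes.reduce(UInt8(0)) { $0 &+ $1 }

// A wider accumulator avoids the rollover for any realistic buffer size.
let wideSum = textureBytes.reduce(0) { $0 + Int($1) }

// Early exit: stops at the first non-zero byte and cannot overflow at all.
let hasContent = textureBytes.contains { $0 != 0 }
```

Here the wrapped sum would report the texture as blank even though it is not, while the early-exit check gets it right.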
Here's a version of your function that worked for me:
func anythingHere(_ texture: MTLTexture) -> Bool {
let width = texture.width
let height = texture.height
let bytesPerRow = width * 4
let data = UnsafeMutableRawPointer.allocate(byteCount: bytesPerRow * height, alignment: 4)
defer {
data.deallocate()
}
let region = MTLRegionMake2D(0, 0, width, height)
texture.getBytes(data, bytesPerRow: bytesPerRow, from: region, mipmapLevel: 0)
var bind = data.assumingMemoryBound(to: UInt8.self)
for _ in 0..<bytesPerRow * height {
if bind.pointee != 0 {
return true
}
bind = bind.advanced(by: 1)
}
return false
}
Keep in mind that on macOS, the default storageMode for textures is managed, which means their contents aren't automatically synchronized back to main memory when they're modified on the GPU. You must explicitly use a blit command encoder to sync the contents yourself:
let syncEncoder = buffer.makeBlitCommandEncoder()!
syncEncoder.synchronize(resource: texture)
syncEncoder.endEncoding()
Didn't look in detail at the rest of the code, but I think this,
bind.advanced(by: 1)
should be:
bind = bind.advanced(by: 1)
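The reason is that advanced(by:) returns a new pointer and leaves the receiver untouched. A small sketch on a plain array (the values are made up) shows the difference:

```swift
let values: [UInt8] = [10, 20, 30]
let (unchanged, moved) = values.withUnsafeBufferPointer { buffer -> (UInt8, UInt8) in
    var cursor = buffer.baseAddress!
    // advanced(by:) returns a new pointer; discarding the result leaves `cursor` where it was.
    _ = cursor.advanced(by: 1)
    let before = cursor.pointee
    // Assigning the result (or writing `cursor += 1`) actually moves the cursor.
    cursor = cursor.advanced(by: 1)
    return (before, cursor.pointee)
}
```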
In the WWDC session "Image Editing with Depth" they mention normalizedDisparity and normalizedDisparityImage a few times:
"The basic idea is that we're going to map our normalized disparity
values into values between 0 and 1"
"So once you know the min and max you can normalize the depth or disparity between 0 and 1."
I first tried to get the disparity image like this:
let disparityImage = depthImage.applyingFilter(
"CIDepthToDisparity", withInputParameters: nil)
Then I tried to get the depthDataMap and normalize it, but it didn't work. Am I on the right track? I would appreciate some hints on what to do.
Edit:
This is my test code, sorry for the quality. I get the min and max then I try to loop over the data to normalize it (let normalizedPoint = (point - min) / (max - min))
let depthDataMap = depthData!.depthDataMap
let width = CVPixelBufferGetWidth(depthDataMap) //768 on an iPhone 7+
let height = CVPixelBufferGetHeight(depthDataMap) //576 on an iPhone 7+
CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
// Convert the base address to a safe pointer of the appropriate type
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap),
to: UnsafeMutablePointer<Float32>.self)
var min = floatBuffer[0]
var max = floatBuffer[0]
for x in 0..<width{
for y in 0..<height{
let distanceAtXYPoint = floatBuffer[Int(x * y)]
if(distanceAtXYPoint < min){
min = distanceAtXYPoint
}
if(distanceAtXYPoint > max){
max = distanceAtXYPoint
}
}
}
What I expected is that the data would reflect the disparity where the user clicked on the image, but it didn't match. The code to find the disparity where the user clicked is here:
// Apply the filter with the sampleRect from the user’s tap. Don’t forget to clamp!
let minMaxImage = normalized?.clampingToExtent().applyingFilter(
"CIAreaMinMaxRed", withInputParameters:
[kCIInputExtentKey : CIVector(cgRect:rect2)])
// A four-byte buffer to store a single pixel value
var pixel = [UInt8](repeating: 0, count: 4)
// Render the image to a 1x1 rect. Be sure to use a nil color space.
context.render(minMaxImage!, toBitmap: &pixel, rowBytes: 4,
bounds: CGRect(x:0, y:0, width:1, height:1),
format: kCIFormatRGBA8, colorSpace: nil)
// The max is stored in the green channel. Min is in the red.
let disparity = Float(pixel[1]) / 255.0
There's a new blog post on raywenderlich.com called "Image Depth Maps Tutorial for iOS" that contains a sample app and details on working with depth. The sample code shows how to normalize the depth data using a CVPixelBuffer extension:
extension CVPixelBuffer {
func normalize() {
let width = CVPixelBufferGetWidth(self)
let height = CVPixelBufferGetHeight(self)
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
let floatBuffer = unsafeBitCast(CVPixelBufferGetBaseAddress(self), to: UnsafeMutablePointer<Float>.self)
var minPixel: Float = 1.0
var maxPixel: Float = 0.0
for y in 0 ..< height {
for x in 0 ..< width {
let pixel = floatBuffer[y * width + x]
minPixel = min(pixel, minPixel)
maxPixel = max(pixel, maxPixel)
}
}
let range = maxPixel - minPixel
for y in 0 ..< height {
for x in 0 ..< width {
let pixel = floatBuffer[y * width + x]
floatBuffer[y * width + x] = (pixel - minPixel) / range
}
}
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
}
}
Something to keep in mind when working with depth data is that it has a lower resolution than the actual image, so you need to scale it up (more info in the blog post and in the WWDC video).
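For reference, the same min/max normalization can be tried out on a plain array. This is only a sketch under a few assumptions of my own: the min and max start from the extreme Float values rather than 1.0 and 0.0, a flat image is left untouched to avoid dividing by zero, and a separate rowLength parameter allows for padded rows. Note the row-major index y * rowLength + x, as opposed to the x * y used in the question:

```swift
// Normalize a row-major Float image to 0...1 in place.
// `pixels` stands in for the depth map contents; `rowLength` is the number of
// Floats per row (bytesPerRow / 4), which may exceed `width` if rows are padded.
func normalize(_ pixels: inout [Float], width: Int, height: Int, rowLength: Int) {
    var minPixel = Float.greatestFiniteMagnitude
    var maxPixel = -Float.greatestFiniteMagnitude
    for y in 0 ..< height {
        for x in 0 ..< width {
            let pixel = pixels[y * rowLength + x]   // row-major index
            minPixel = min(minPixel, pixel)
            maxPixel = max(maxPixel, pixel)
        }
    }
    let range = maxPixel - minPixel
    guard range > 0 else { return }   // flat image: nothing to scale
    for y in 0 ..< height {
        for x in 0 ..< width {
            let index = y * rowLength + x
            pixels[index] = (pixels[index] - minPixel) / range
        }
    }
}

// Hypothetical 2 x 2 disparity map with one padding element per row.
var disparity: [Float] = [2, 4, 0,
                          6, 10, 0]
normalize(&disparity, width: 2, height: 2, rowLength: 3)
```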
Will's answer above is very good, but it can be improved as follows. I'm using it with depth data from a photo; it's possible that it won't work if the depth data doesn't follow 16-bits, as mentioned above. I haven't found such a photo yet. I'm surprised there isn't a filter to handle this in Core Image.
import Accelerate
extension CVPixelBuffer {
func normalize() {
CVPixelBufferLockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
let width = CVPixelBufferGetWidthOfPlane(self, 0)
let height = CVPixelBufferGetHeightOfPlane(self, 0)
let count = width * height
let pixelBufferBase = unsafeBitCast(CVPixelBufferGetBaseAddressOfPlane(self, 0), to: UnsafeMutablePointer<Float>.self)
let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: pixelBufferBase, count: count)
let maxValue = vDSP.maximum(depthCopyBuffer)
let minValue = vDSP.minimum(depthCopyBuffer)
let range = maxValue - minValue
let negMinValue = -minValue
let subtractVector = vDSP.add(negMinValue, depthCopyBuffer)
let normalizedDisparity = vDSP.divide(subtractVector, range)
pixelBufferBase.initialize(from: normalizedDisparity, count: count)
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
}
}
Try using the Accelerate framework's vDSP vector functions. Here is a normalize implementation in two functions.
To change the CVPixelBuffer to a 0..1 normalized range, call:
myCVPixelBuffer.setUpNormalize()
import Accelerate
extension CVPixelBuffer {
func vectorNormalize( targetVector: UnsafeMutableBufferPointer<Float>) -> [Float] {
// range = max - min
// normalized to 0..1 is (pixel - minPixel) / range
// see Documentation "Using vDSP for Vector-based Arithmetic" in vDSP under system "Accelerate" documentation
// see also the Accelerate documentation section 'Vector extrema calculation'
// static func maximum<U>(U) -> Float
// Returns the maximum element of a single-precision vector.
// static func minimum<U>(U) -> Float
// Returns the minimum element of a single-precision vector.
let maxValue = vDSP.maximum(targetVector)
let minValue = vDSP.minimum(targetVector)
let range = maxValue - minValue
let negMinValue = -minValue
let subtractVector = vDSP.add(negMinValue, targetVector)
// adding negative value is subtracting
let result = vDSP.divide(subtractVector, range)
return result
}
func setUpNormalize() -> CVPixelBuffer {
// grayscale buffer float32 ie Float
// return normalized CVPixelBuffer
CVPixelBufferLockBaseAddress(self,
CVPixelBufferLockFlags(rawValue: 0))
let width = CVPixelBufferGetWidthOfPlane(self, 0)
let height = CVPixelBufferGetHeightOfPlane(self, 0)
let count = width * height
let bufferBaseAddress = CVPixelBufferGetBaseAddressOfPlane(self, 0)
// UnsafeMutableRawPointer
let pixelBufferBase = unsafeBitCast(bufferBaseAddress, to: UnsafeMutablePointer<Float>.self)
let depthCopy = UnsafeMutablePointer<Float>.allocate(capacity: count)
depthCopy.initialize(from: pixelBufferBase, count: count)
let depthCopyBuffer = UnsafeMutableBufferPointer<Float>(start: depthCopy, count: count)
let normalizedDisparity = vectorNormalize(targetVector: depthCopyBuffer)
pixelBufferBase.initialize(from: normalizedDisparity, count: count)
// copy back the normalized map into the CVPixelBuffer
depthCopy.deallocate()
// depthCopyBuffer.deallocate()
CVPixelBufferUnlockBaseAddress(self, CVPixelBufferLockFlags(rawValue: 0))
return self
}
}
You can see it in action in a modified version of the Apple sample 'PhotoBrowse' app at
https://github.com/racewalkWill/PhotoBrowseModified
I am attempting to calculate a histogram for the Y channel in a kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange image buffer. When I use vImageHistogramCalculation_Planar8 I pass in a reference to only a single histogram.
How do I know which channel is being used to create the histogram? What would I do if I wanted to read all channels?
Also open to critiques of the code sample.
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
func captureOutput(_ captureOutput: AVCaptureOutput!,
didOutputSampleBuffer sampleBuffer: CMSampleBuffer!,
from connection: AVCaptureConnection!) {
let imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer)!
CVPixelBufferLockBaseAddress(imageBuffer, CVPixelBufferLockFlags(rawValue: 0))
let height = CVPixelBufferGetHeight(imageBuffer)
let width = CVPixelBufferGetWidth(imageBuffer)
let bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer)
let pixelBuffer = CVPixelBufferGetBaseAddress(imageBuffer)
// let format = CVPixelBufferGetPixelFormatType(imageBuffer)
// print("format: \(format)")
///kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange = '420v'
var vBuffer = vImage_Buffer()
vBuffer.data = pixelBuffer
vBuffer.rowBytes = bytesPerRow
vBuffer.width = vImagePixelCount(width)
vBuffer.height = vImagePixelCount(height)
let luma = [UInt](repeating: 0, count: 256)
let lumaHist = UnsafeMutablePointer<vImagePixelCount>(mutating: luma)
vImageHistogramCalculation_Planar8(&vBuffer, lumaHist, UInt32(kvImageNoFlags))
CVPixelBufferUnlockBaseAddress(imageBuffer, CVPixelBufferLockFlags(rawValue: 0))
}
}
The kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange is a planar format with all the planes encoded into the buffer. And vImage planar functions only work on one plane at a time.
The above code computes a histogram over all the planes, treated as one big plane, which is probably not what you want.
It is possible to access the base address and the number of bytes per row for the Y plane with these functions:
let bytesPerRow = CVPixelBufferGetBytesPerRowOfPlane(imageBuffer, 0)
let pixelBuffer = CVPixelBufferGetBaseAddressOfPlane(imageBuffer, 0)
The plane index depends on the buffer format. The name usually gives you a hint. Here it's YpCbCr, so the Y plane should be the first one, at index 0.
According to the header, CVPixelBufferGetBaseAddress will return:
For chunky buffers, this will return a pointer to the pixel at
0,0 in the buffer.
For planar buffers this will return a pointer to a PlanarComponentInfo struct
(defined in QuickTime).
So, if true, it is not computing the histogram of all three channels at once. It is computing the even less useful histogram of the PlanarComponentInfo struct and possibly crashing.
To read all the channels, you can get the second plane out using the interfaces described in Sparga's answer above (CVPixelBufferGetBytesPerRowOfPlane(imageBuffer,1) and CVPixelBufferGetBaseAddressOfPlane(imageBuffer,1)), and do an ARGB histogram of the half-width chroma image, adding the even histograms together and the odd histograms together. Note that because this is 420, the height and width of the chroma plane are not the same as those of the luminance plane.
I would also file a bug report with Apple asking for vImageHistogramCalculation_RG88 to deal with biplanar chroma data.
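If vImage is not a requirement, the per-channel histograms can also be computed by hand. The following is a rough sketch that uses plain arrays in place of the locked plane memory. The function name, its parameters and the tiny sample frame are made up for illustration; in real code the plane bytes, bytesPerRow and row counts would come from the CVPixelBufferGet...OfPlane calls:

```swift
// Histogram over one component of a plane.
// `offset` and `step` select bytes within a row: (0, 1) for the Y plane,
// (0, 2) for Cb and (1, 2) for Cr in the interleaved CbCr plane.
func histogram(plane: [UInt8], bytesPerRow: Int, rowBytesUsed: Int, rows: Int,
               offset: Int, step: Int) -> [Int] {
    var bins = [Int](repeating: 0, count: 256)
    for row in 0 ..< rows {
        for column in stride(from: offset, to: rowBytesUsed, by: step) {
            bins[Int(plane[row * bytesPerRow + column])] += 1
        }
    }
    return bins
}

// Hypothetical 4 x 2 frame with no row padding.
// Plane 0: one Y byte per pixel. Plane 1: Cb, Cr pairs at half width and half height.
let lumaPlane: [UInt8] = [16, 16, 235, 235,
                          16, 16, 235, 235]
let chromaPlane: [UInt8] = [128, 64, 128, 200]   // Cb, Cr, Cb, Cr (2 x 1 chroma samples)

let yHist  = histogram(plane: lumaPlane, bytesPerRow: 4, rowBytesUsed: 4, rows: 2, offset: 0, step: 1)
let cbHist = histogram(plane: chromaPlane, bytesPerRow: 4, rowBytesUsed: 4, rows: 1, offset: 0, step: 2)
let crHist = histogram(plane: chromaPlane, bytesPerRow: 4, rowBytesUsed: 4, rows: 1, offset: 1, step: 2)
```

This will be slower than vImage on full-size frames, but it makes the plane layout explicit.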
I'm trying to get the per-pixel RGBA values for a CIImage in floating point.
I expect the following to work, using CIContext and rendering as kCIFormatRGBAh, but the output is all zeroes. Otherwise my next step would be converting from half floats to full.
What am I doing wrong? I've also tried this in Objective-C and get the same result.
let image = UIImage(named: "test")!
let sourceImage = CIImage(CGImage: image.CGImage)
let context = CIContext(options: [kCIContextWorkingColorSpace: NSNull()])
let colorSpace = CGColorSpaceCreateDeviceRGB()
let bounds = sourceImage.extent()
let bytesPerPixel: UInt = 8
let format = kCIFormatRGBAh
let rowBytes = Int(bytesPerPixel * UInt(bounds.size.width))
let totalBytes = UInt(rowBytes * Int(bounds.size.height))
var bitmap = calloc(totalBytes, UInt(sizeof(UInt8)))
context.render(sourceImage, toBitmap: bitmap, rowBytes: rowBytes, bounds: bounds, format: format, colorSpace: colorSpace)
let bytes = UnsafeBufferPointer<UInt8>(start: UnsafePointer<UInt8>(bitmap), count: Int(totalBytes))
for (var i = 0; i < Int(totalBytes); i += 2) {
println("half float :: left: \(bytes[i]) / right: \(bytes[i + 1])")
// prints all zeroes!
}
free(bitmap)
Here's a related question about getting the output of CIAreaHistogram, which is why I want floating point values rather than integer, but I can't seem to make kCIFormatRGBAh work on any CIImage regardless of its origin, filter output or otherwise.
There are two constraints on using RGBAh with [CIContext render:toBitmap:rowBytes:bounds:format:colorSpace:] on iOS:
the rowBytes must be a multiple of 8 bytes
calling it under simulator is not supported
These constraints come from the behavior of OpenGLES with RGBAh on iOS.
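Once the render does produce data, the half floats still need converting to full floats, as the question mentions. Here is a minimal sketch of that conversion done by hand on the raw bytes. The function names and the sample pixel are made up for illustration, and it assumes the two bytes of each half are stored in little-endian order:

```swift
// Convert the bit pattern of an IEEE 754 half-precision value to Float.
func float(fromHalfBits bits: UInt16) -> Float {
    let isNegative = (bits & 0x8000) != 0
    let exponent = Int((bits >> 10) & 0x1F)
    let mantissa = Float(bits & 0x03FF)
    let magnitude: Float
    switch exponent {
    case 0:
        // Zero or subnormal: mantissa * 2^-24
        magnitude = Float(sign: .plus, exponent: -24, significand: mantissa)
    case 0x1F:
        // All exponent bits set: infinity or NaN
        magnitude = mantissa == 0 ? .infinity : .nan
    default:
        // Normal: (1 + mantissa / 1024) * 2^(exponent - 15)
        magnitude = Float(sign: .plus, exponent: exponent - 15, significand: 1 + mantissa / 1024)
    }
    return isNegative ? -magnitude : magnitude
}

// Read one half from two consecutive bytes of a bitmap (little-endian order assumed).
func half(at byteOffset: Int, in bytes: [UInt8]) -> Float {
    let bits = UInt16(bytes[byteOffset]) | (UInt16(bytes[byteOffset + 1]) << 8)
    return float(fromHalfBits: bits)
}

// Hypothetical RGBAh pixel: R = 1.0, G = 0.5, B = 0.0, A = 1.0 (8 bytes).
let rgbaBytes: [UInt8] = [0x00, 0x3C,  0x00, 0x38,  0x00, 0x00,  0x00, 0x3C]
let rgba = stride(from: 0, to: rgbaBytes.count, by: 2).map { half(at: $0, in: rgbaBytes) }
```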