OPEN3D: How to convert CVPixelBuffer to o3d.geometry.Image - ios

I'd like to use the o3d.geometry.PointCloud.create_from_depth_image function to convert a depth image into a point cloud.
Open3D docs say the following: An Open3D Image can be directly converted to/from a numpy array.
I have a CVPixelBuffer coming from camera.
How to create an o3d.geometry.Image from pixel array without saving it to disk first?
Here's my code:
guard let cameraCalibrationData = frame.cameraCalibrationData else { return }
let frameIntrinsics = cameraCalibrationData.intrinsicMatrix
let referenceDimensions = cameraCalibrationData.intrinsicMatrixReferenceDimensions
let width = Int(referenceDimensions.width)  // set_intrinsics expects integer pixel dimensions
let height = Int(referenceDimensions.height)
let fx = frameIntrinsics.columns.0[0]
let fy = frameIntrinsics.columns.1[1] // fy lives in the second column of the intrinsic matrix
let cx = frameIntrinsics.columns.2[0]
let cy = frameIntrinsics.columns.2[1]
let intrinsics = self.o3d.camera.PinholeCameraIntrinsic()
intrinsics.set_intrinsics(width, height, fx, fy, cx, cy)
//QUESTION HERE:
//how to convert CVPixelBuffer depth to o3d geometry IMAGE ?
let depth : CVPixelBuffer = frame.depthDataMap
let depthImage = self.o3d.geometry.Image()
let cloud = self.o3d.geometry.PointCloud.create_from_depth_image(depthImage, intrinsics)
print(cloud)
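Since the docs say an Open3D Image converts directly to/from a numpy array, one route is to copy the depth buffer into a contiguous Swift array, bridge it to numpy, and build the image from that, with no disk round trip. A minimal sketch, assuming self.o3d and self.np are Python.import("open3d") / Python.import("numpy") via PythonKit and that frame.depthDataMap is kCVPixelFormatType_DepthFloat32:

let depth: CVPixelBuffer = frame.depthDataMap

// Copy the pixel buffer into a contiguous Float32 array, row by row,
// honoring bytesPerRow (rows may be padded past width * 4 bytes).
CVPixelBufferLockBaseAddress(depth, .readOnly)
let w = CVPixelBufferGetWidth(depth)
let h = CVPixelBufferGetHeight(depth)
let bytesPerRow = CVPixelBufferGetBytesPerRow(depth)
let base = CVPixelBufferGetBaseAddress(depth)!
var pixels = [Float32]()
pixels.reserveCapacity(w * h)
for row in 0..<h {
    let rowPtr = (base + row * bytesPerRow).assumingMemoryBound(to: Float32.self)
    pixels.append(contentsOf: UnsafeBufferPointer(start: rowPtr, count: w))
}
CVPixelBufferUnlockBaseAddress(depth, .readOnly)

// Bridge to numpy and build the Open3D image from the array.
let npDepth = self.np.array(pixels, dtype: self.np.float32).reshape(h, w)
let depthImage = self.o3d.geometry.Image(npDepth)
// ARKit depth is in meters; Open3D defaults to depth_scale = 1000 (millimeters).
let cloud = self.o3d.geometry.PointCloud.create_from_depth_image(
    depthImage, intrinsics, depth_scale: 1.0)
print(cloud)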

Related

Not able to detect Aruco markers in iOS Swift

I am working on an OpenCV project where I need to detect an ArUco marker. I am not using the default OpenCV camera view; instead I have created a native camera that sends data as CMSampleBuffer.
// converts CMSampleBuffer to Mat object
private func processBuffer(buffer: CMSampleBuffer) {
    guard let imgBuf = CMSampleBufferGetImageBuffer(buffer) else { return }
    // lock the buffer
    CVPixelBufferLockBaseAddress(imgBuf, [])
    // get image properties
    let width = CVPixelBufferGetWidth(imgBuf)
    let height = CVPixelBufferGetHeight(imgBuf)
    // create the Mat
    let imageMat: Mat = Mat(rows: Int32(height), cols: Int32(width), type: CvType.CV_8UC4)
    // unlock again
    CVPixelBufferUnlockBaseAddress(imgBuf, [])
    self.processImage(imageMat)
}
private func processImage(_ image: Mat) {
    let img = Mat()
    let dst = Mat()
    image.convert(to: img, rtype: CvType.CV_8UC3)
    Imgproc.cvtColor(src: image, dst: dst, code: ColorConversionCodes.COLOR_RGB2GRAY)
    let parameters = DetectorParameters.create()
    // a method that sets basic details on the parameters, like
    // adaptiveThreshWinSizeMin, adaptiveThreshWinSizeMax, adaptiveThreshConstant, etc.
    updateDetectorParameter(parameters)
    var corners: [Mat] = []
    var ids = MatOfInt()
    let dictionary = Aruco.getPredefinedDictionary(dict: 8)
    Aruco.detectMarkers(
        image: dst,
        dictionary: dictionary,
        corners: &corners,
        ids: ids,
        parameters: parameters
    )
}
Now, after calling Aruco.detectMarkers, no markers are detected: ids and corners are still empty.
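One thing stands out in processBuffer: the Mat is allocated with the right size and type, but the pixels from the locked buffer are never copied into it, so detectMarkers runs on an empty image. A possible fix is sketched below; it assumes the opencv2 Swift bindings' Mat(rows:cols:type:data:step:) initializer and a BGRA camera buffer:

// Sketch: populate the Mat from the locked pixel buffer.
private func processBuffer(buffer: CMSampleBuffer) {
    guard let imgBuf = CMSampleBufferGetImageBuffer(buffer) else { return }
    CVPixelBufferLockBaseAddress(imgBuf, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(imgBuf, .readOnly) }
    let width = CVPixelBufferGetWidth(imgBuf)
    let height = CVPixelBufferGetHeight(imgBuf)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(imgBuf)
    guard let base = CVPixelBufferGetBaseAddress(imgBuf) else { return }
    // Wrap the raw pixels; the Mat initializer copies them out.
    let data = Data(bytes: base, count: bytesPerRow * height)
    let imageMat = Mat(rows: Int32(height), cols: Int32(width),
                       type: CvType.CV_8UC4, data: data, step: bytesPerRow)
    self.processImage(imageMat)
}

Note also that cvtColor is given the 4-channel source with COLOR_RGB2GRAY; for a typical kCVPixelFormatType_32BGRA camera buffer, COLOR_BGRA2GRAY would match the actual channel layout.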

Core image filter with custom metal kernel doesn't work

I've made a custom CIFilter based on a custom kernel, but I can't make it work: the output image is filled with black and I can't understand why.
Here is the shader:
// MARK: Custom kernels
float4 eight_bit(sampler image, sampler palette_image, float paletteSize) {
    float4 color = image.sample(image.coord());
    float dist = distance(color, palette_image.sample(float2(0,0)));
    float4 returnColor = palette_image.sample(float2(0,0));
    for (int i = 1; i < floor(paletteSize); ++i) {
        float tempDist = distance(color, palette_image.sample(float2(i,0)));
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = palette_image.sample(float2(i,0));
        }
    }
    return returnColor;
}
The first sampler is the image to be processed; the second is an image containing the colors of the specific palette that must be used on it.
The palette image is created from an array of RGBA values, copied into a Data buffer and passed to the CIImage initializer init(bitmapData data: Data, bytesPerRow: Int, size: CGSize, format: CIFormat, colorSpace: CGColorSpace?). The image is 1 px high and as wide as the number of colors. The palette image itself is generated correctly.
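For reference, a minimal sketch of what EightBitColorFilter.image(from:) might look like, assuming each palette entry is an RGBA SIMD4<UInt8> (the element type is an assumption):

// Build a CIImage that is 1 px high and `colors.count` px wide,
// one RGBA8 pixel per palette entry.
static func image(from colors: [SIMD4<UInt8>]) -> CIImage {
    var bytes: [UInt8] = []
    for c in colors { bytes.append(contentsOf: [c.x, c.y, c.z, c.w]) }
    return CIImage(bitmapData: Data(bytes),
                   bytesPerRow: colors.count * 4,
                   size: CGSize(width: colors.count, height: 1),
                   format: .RGBA8,
                   colorSpace: CGColorSpaceCreateDeviceRGB())
}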
Inspecting the shader, I've found:
If I return color, I get the original image, which means the sampler image is passed correctly.
If I try to return a color from any pixel in palette_image, the resulting image from the filter is black.
I'm starting to think that the palette_image is somehow not passed correctly. Here is how the image is passed through the filter:
override var outputImage: CIImage? {
    guard let inputImage = inputImage else {
        return nil
    }
    let palette = EightBitColorFilter.palettes[Int(0)]
    let paletteImage = EightBitColorFilter.image(from: palette)
    let extent = inputImage.extent
    let pixellateImage = inputImage.applyingFilter("CIPixellate", parameters: [kCIInputScaleKey: inputScale])
    // let sampler = CISampler(image: paletteImage)
    let arguments = [pixellateImage, paletteImage, Float(palette.count)] as [Any]
    let final = kernel.apply(extent: extent, roiCallback: { (index, rect) in
        return rect
    }, arguments: arguments)
    return final
}
Your sampling coordinates are off.
Samplers use relative coordinates in Core Image, i.e. (0,0) corresponds to the upper left corner, (1,1) the lower right corner of the whole input image.
So try something like this:
float4 eight_bit(sampler image, sampler palette_image, float paletteSize) {
    float4 color = image.sample(image.coord());
    // initial offset to land in the middle of the first pixel
    float2 firstPaletteCoord = float2(1.0 / (2.0 * paletteSize), 0.5);
    float dist = distance(color, palette_image.sample(firstPaletteCoord));
    float4 returnColor = palette_image.sample(firstPaletteCoord);
    for (int i = 1; i < floor(paletteSize); ++i) {
        // step i pixels further from the first pixel's center
        float2 paletteCoord = firstPaletteCoord + float2(float(i) / paletteSize, 0.0);
        float4 paletteColor = palette_image.sample(paletteCoord);
        float tempDist = distance(color, paletteColor);
        if (tempDist < dist) {
            dist = tempDist;
            returnColor = paletteColor;
        }
    }
    return returnColor;
}

How to read depth data at a CGPoint from AVDepthData buffer

I am attempting to find the depth data at a certain point in the captured image and return the distance in meters.
I have enabled depth data and am capturing it alongside the image. I take the point from the X,Y coordinates of the center of the image (or of a touch, when pressed) and convert it to the buffer's index using
Int((width - touchPoint.x) * (height - touchPoint.y))
with width and height being the dimensions of the captured image. I am not sure this is the correct method to achieve this, though.
I handle the depth data as such:
func handlePhotoDepthCalculation(point: Int) {
    guard let depth = self.photo else {
        return
    }

    // Convert Disparity to Depth
    let depthData = (depth.depthData as AVDepthData!).converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
    let depthDataMap = depthData.depthDataMap // AVDepthData -> CVPixelBuffer

    // Set Accuracy feedback
    let accuracy = depthData.depthDataAccuracy
    switch (accuracy) {
    case .absolute:
        /*
         NOTE - Values within the depth map are absolutely
         accurate within the physical world.
         */
        self.accuracyLbl.text = "Absolute"
        break
    case .relative:
        /*
         NOTE - Values within the depth data map are usable for
         foreground/background separation, but are not absolutely
         accurate in the physical world. iPhone always produces this.
         */
        self.accuracyLbl.text = "Relative"
    }

    // We convert the data
    CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
    let depthPointer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap), to: UnsafeMutablePointer<Float32>.self)

    // Get depth value for image center
    let distanceAtXYPoint = depthPointer[point]

    // Set UI
    self.distanceLbl.text = "\(distanceAtXYPoint) m" // Returns distance in meters?
    self.filteredLbl.text = "\(depthData.isDepthDataFiltered)"
}
I am not convinced I am getting the correct position. From my research it also looks like accuracy is only reported as .relative or .absolute, not as a float/integer?
To access the depth data at a CGPoint do:
let point = CGPoint(x: 35, y: 26)
let width = CVPixelBufferGetWidth(depthDataMap)
let distanceAtXYPoint = depthPointer[Int(point.y * CGFloat(width) + point.x)]
I hope it works.
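As a quick sanity check of the row-major indexing: with, say, a 640-pixel-wide depth map, CGPoint(x: 35, y: 26) lands at index 26 * 640 + 35 = 16675.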
Access depth data at a pixel position:
let depthDataMap: CVPixelBuffer = ...
let pixelX: Int = ...
let pixelY: Int = ...

CVPixelBufferLockBaseAddress(depthDataMap, .readOnly)
let bytesPerRow = CVPixelBufferGetBytesPerRow(depthDataMap)
let baseAddress = CVPixelBufferGetBaseAddress(depthDataMap)!
assert(kCVPixelFormatType_DepthFloat32 == CVPixelBufferGetPixelFormatType(depthDataMap))
let rowData = baseAddress + pixelY * bytesPerRow
let distance = rowData.assumingMemoryBound(to: Float32.self)[pixelX]
CVPixelBufferUnlockBaseAddress(depthDataMap, .readOnly)
For me the values were incorrect and inconsistent when accessing the depth via
let depthPointer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap), to: UnsafeMutablePointer<Float32>.self)
A likely reason: CVPixelBuffer rows can be padded, so treating the base address as a flat Float32 array breaks whenever bytesPerRow is larger than width * 4; indexing through bytesPerRow, as above, avoids that.
Values indicating the general accuracy of a depth data map.
The accuracy of a depth data map is highly dependent on the camera calibration data used to generate it. If the camera's focal length cannot be precisely determined at the time of capture, scaling error in the z (depth) plane will be introduced. If the camera's optical center can't be precisely determined at capture time, principal point error will be introduced, leading to an offset error in the disparity estimate.
These values report the accuracy of a map's values with respect to its reported units.
case relative
Values within the depth data map are usable for foreground/background separation, but are not absolutely accurate in the physical world.
case absolute
Values within the depth map are absolutely accurate within the physical world.
You can get the width and height of the AVDepthData buffer with the following code.
// Useful data
let width = CVPixelBufferGetWidth(depthDataMap)
let height = CVPixelBufferGetHeight(depthDataMap)
In Apple's sample project they use the code below.
texturePoint is the touch point projected into the Metal view used in the sample project.
// scale
let scale = CGFloat(CVPixelBufferGetWidth(depthFrame)) / CGFloat(CVPixelBufferGetWidth(videoFrame))
let depthPoint = CGPoint(x: CGFloat(CVPixelBufferGetWidth(depthFrame)) - 1.0 - texturePoint.x * scale, y: texturePoint.y * scale)
assert(kCVPixelFormatType_DepthFloat16 == CVPixelBufferGetPixelFormatType(depthFrame))
CVPixelBufferLockBaseAddress(depthFrame, .readOnly)
let rowData = CVPixelBufferGetBaseAddress(depthFrame)! + Int(depthPoint.y) * CVPixelBufferGetBytesPerRow(depthFrame)
// Swift does not have a Float16 data type. Use UInt16 instead, and then convert.
var f16Pixel = rowData.assumingMemoryBound(to: UInt16.self)[Int(depthPoint.x)]
CVPixelBufferUnlockBaseAddress(depthFrame, .readOnly)
var f32Pixel = Float(0.0)
var src = vImage_Buffer(data: &f16Pixel, height: 1, width: 1, rowBytes: 2)
var dst = vImage_Buffer(data: &f32Pixel, height: 1, width: 1, rowBytes: 4)
vImageConvert_Planar16FtoPlanarF(&src, &dst, 0)
// Convert the depth frame format to cm
let depthString = String(format: "%.2f cm", f32Pixel * 100)

How to convert CGImage to OTVideoFrame

What is the best way to convert CGImage to OTVideoFrame?
I tried to get the underlying CGImage pixel buffer and feed it into an OTVideoBuffer, but got a distorted image.
Here is what I have done:
created a new OTVideoFormat object with ARGB pixel format
Set the bytesPerRow of the OTVideoFormat to width*4. Taking the value of CGImageGetBytesPerRow(...) did not work; I got no error messages, but also no frames on the other end of the line.
Copied the rows, truncating them from CGImageGetBytesPerRow(...) to width*4 bytes per row.
Got a distorted image with rows slightly shifted.
Here is the code:
func toOTVideoFrame() throws -> OTVideoFrame {
    let width: UInt32 = UInt32(CGImageGetWidth(self)) // self is a CGImage
    let height: UInt32 = UInt32(CGImageGetHeight(self))
    assert(CGImageGetBitsPerPixel(self) == 32)
    assert(CGImageGetBitsPerComponent(self) == 8)

    let bitmapInfo = CGImageGetBitmapInfo(self)
    assert(bitmapInfo.contains(CGBitmapInfo.FloatComponents) == false)
    assert(bitmapInfo.contains(CGBitmapInfo.ByteOrderDefault))
    assert(CGImageGetAlphaInfo(self) == .NoneSkipFirst)

    let bytesPerPixel: UInt32 = 4
    let cgImageBytesPerRow: UInt32 = UInt32(CGImageGetBytesPerRow(self))
    let otFrameBytesPerRow: UInt32 = bytesPerPixel * width

    let videoFormat = OTVideoFormat()
    videoFormat.pixelFormat = .ARGB
    videoFormat.bytesPerRow.addObject(NSNumber(unsignedInt: otFrameBytesPerRow))
    videoFormat.imageWidth = width
    videoFormat.imageHeight = height
    videoFormat.estimatedFramesPerSecond = 15
    videoFormat.estimatedCaptureDelay = 100

    let videoFrame = OTVideoFrame(format: videoFormat)
    videoFrame.timestamp = CMTimeMake(0, 1)        // This is temporary
    videoFrame.orientation = OTVideoOrientation.Up // This is temporary

    let dataProvider = CGImageGetDataProvider(self)
    let imageData: NSData = CGDataProviderCopyData(dataProvider)!
    let buffer = UnsafeMutablePointer<UInt8>.alloc(Int(otFrameBytesPerRow * height))
    for currentRow in 0..<height {
        let currentRowStartOffsetCGImage = currentRow * cgImageBytesPerRow
        let currentRowStartOffsetOTVideoFrame = currentRow * otFrameBytesPerRow
        let cgImageRange = NSRange(location: Int(currentRowStartOffsetCGImage), length: Int(otFrameBytesPerRow))
        imageData.getBytes(buffer.advancedBy(Int(currentRowStartOffsetOTVideoFrame)), range: cgImageRange)
    }

    do {
        let planes = UnsafeMutablePointer<UnsafeMutablePointer<UInt8>>.alloc(1)
        planes.initialize(buffer)
        videoFrame.setPlanesWithPointers(planes, numPlanes: 1)
        planes.dealloc(1)
    }
    return videoFrame
}
Solved this issue on my own.
It appears to be a bug in the OpenTok SDK. The SDK does not seem to be able to handle images whose dimensions are not a multiple of 16. When I changed all image sizes to be multiples of 16, everything started to work fine.
TokBox did not bother to state this limitation in the API documentation, nor does the SDK throw an exception when the input image size is not a multiple of 16.
This is the second critical bug I have found in the OpenTok SDK. I strongly suggest you do not use this product. It is of very low quality.
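If resizing is needed, a small helper like the following (illustrative, not part of the SDK) can round dimensions up to the next multiple of 16 before building the frame:

// Round a pixel dimension up to the next multiple of 16,
// e.g. 1014 -> 1024, 1024 -> 1024.
func roundedUpToMultipleOf16(_ value: Int) -> Int {
    return (value + 15) / 16 * 16
}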

CIPerspectiveCorrection filter returns image flipped and inverted

I'm using the CIPerspectiveCorrection Filter and my problem is that my returned image results are mirrored, upside down, and the points used for the perspective correction seem to be referencing the wrong axis, or axis direction.
In order to isolate the issue I have been working with a test image that is 1024 x 1024 and I am passing in a perfectly rectangular area. I'm still ending up with images flipped vertically and horizontally.
Here is my function that returns a cropped CIImage instance given an image and set of points:
private func _getCroppedImageWithImage(image: CIImage, topLeft: CGPoint, topRight: CGPoint, botLeft: CGPoint, botRight: CGPoint) -> CIImage {
    var rectCoords = NSMutableDictionary(capacity: 4)
    rectCoords["inputTopLeft"] = CIVector(CGPoint: topLeft)
    rectCoords["inputTopRight"] = CIVector(CGPoint: topRight)
    rectCoords["inputBottomLeft"] = CIVector(CGPoint: botLeft)
    rectCoords["inputBottomRight"] = CIVector(CGPoint: botRight)
    return image.imageByApplyingFilter("CIPerspectiveCorrection", withInputParameters: rectCoords)
}
And here is where I am calling this function:
func testCrop() {
    let ciInputImage = CIImage(image: UIImage(named: "test-pattern.jpg")!)
    println("source image is \(ciInputImage)") // <CIImage: 0x170212290 extent [0 0 1024 1024]>
    let ptBotLeft = CGPointMake(32.0, 992.0)
    let ptBotRight = CGPointMake(992.0, 992.0)
    let ptTopRight = CGPointMake(992.0, 32.0)
    let ptTopLeft = CGPointMake(32.0, 32.0)
    let croppedImage = _getCroppedImageWithImage(ciInputImage, topLeft: ptTopLeft, topRight: ptTopRight, botLeft: ptBotLeft, botRight: ptBotRight)
    println("cropped image \(croppedImage)") // <CIImage: 0x174204a60 extent [0 0 960 960]>
    let croppedImageCG = CIContext(options: nil).createCGImage(croppedImage, fromRect: croppedImage.extent())
    let imageVC = ImageViewController(image: UIImage(CGImage: croppedImageCG))
    presentViewController(imageVC, animated: true, completion: nil)
}
Has anyone encountered problems like this before?
Here is the source image
And here is the final image displayed in a UIImageView with contentMode set to scaleAspectFit
OK, my issue, I am pretty sure, is that Core Image uses the Cartesian coordinate system: Y is up, and (0, 0) is at the bottom left.
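So points coming from UIKit need their Y axis flipped before being handed to the filter. A minimal sketch of the conversion (the helper name is illustrative):

// Convert a point from UIKit's top-left-origin coordinates to
// Core Image's bottom-left-origin coordinates.
func ciPoint(from uiPoint: CGPoint, imageHeight: CGFloat) -> CGPoint {
    return CGPoint(x: uiPoint.x, y: imageHeight - uiPoint.y)
}

// e.g. for the 1024 x 1024 test image above,
// the UIKit top-left corner point (32, 32) becomes Core Image (32, 992).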
