Screen scale factor in XCTest - ios

I am running a UI test in which it's crucial to get the screen scale factor. Usually I use UIScreen.main.scale, but XCUIScreen.main does not seem to have a scale property.
Is it possible to access the device scale factor in UI tests?

I've managed it by adding an accessibilityIdentifier to the window in AppDelegate that contains the needed info:
int scaleFactor = [[NSNumber numberWithDouble:[UIScreen mainScreen].scale] intValue];
self.window.accessibilityIdentifier = [NSString stringWithFormat:@"windowScale:%d", scaleFactor];
And later accessing and parsing it in my tests:
var windowsScaleFactor: Int {
    let scaleFactorString =
    guard let scaleFactor = Int(scaleFactorString.components(separatedBy: CharacterSet.decimalDigits.inverted).joined(separator: "")) else {
        fatalError("Could not determine window scale factor")
    }
    return scaleFactor
}
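The parsing step can be tried in isolation. The sketch below assumes the identifier has the form "windowScale:<n>", as set in the AppDelegate snippet; parseScaleFactor is a hypothetical helper name, not part of XCTest, and it returns nil instead of crashing when no digits are found.

```swift
import Foundation

/// Hypothetical helper: extracts the integer scale from an identifier
/// such as "windowScale:3" by stripping every non-digit character.
func parseScaleFactor(from identifier: String) -> Int? {
    let digits = identifier
        .components(separatedBy: CharacterSet.decimalDigits.inverted)
        .joined()
    // Int("") is nil, so an identifier without digits yields nil.
    return Int(digits)
}
```

In a test, the result could be unwrapped with XCTUnwrap so that a missing identifier fails the test rather than aborting the whole run.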


How to convert VNRectangleObservation item to UIImage in SwiftUI

I was able to identify squares in an image using VNDetectRectanglesRequest. Now I want to store those rectangles as separate images (UIImage or CGImage). Below is what I tried.
let rectanglesDetection = VNDetectRectanglesRequest { request, error in
    rectangles = request.results as! [VNRectangleObservation]
    rectangles.sort { $0.boundingBox.origin.y > $1.boundingBox.origin.y }
    for rectangle in rectangles {
        let rect = rectangle.boundingBox
        let imageRef = cgImage.cropping(to: rect)
        let image = UIImage(cgImage: imageRef!, scale: image!.scale, orientation: image!.imageOrientation)
    }
}
Can anybody point out what's wrong or what should be the best approach?
Update 1
At this stage, I'm testing with an image that I added to the assets.
With this image I get 7 rectangles as observations: one for each cell and one for the table margin.
My task is to identify the text inside each rectangle, and my approach is to send a VNRecognizeTextRequest for each rectangle that has been identified. My real scenario is a little more complicated than this, but I want to at least achieve this before going forward.
Update 2
for rectangle in rectangles {
    let trueX = rectangle.boundingBox.minX * image!.size.width
    let trueY = rectangle.boundingBox.minY * image!.size.height
    let width = rectangle.boundingBox.width * image!.size.width
    let height = rectangle.boundingBox.height * image!.size.height
    print("x = " , trueX , " y = " , trueY , " width = " , width , " height = " , height)
    let cropZone = CGRect(x: trueX, y: trueY, width: width, height: height)
    guard let cutImageRef: CGImage = image?.cgImage?.cropping(to: cropZone)
    else {
        continue
    }
    let croppedImage: UIImage = UIImage(cgImage: cutImageRef)
}
My image width and height is
width = 406.0 height = 368.0
I've captured my debug view so you can get a proper understanding.
As @Lasse mentioned, this is my actual issue, with screenshots.
This is just a guess since you didn't state what the actual problem is, but probably you're getting a zero-sized image for each VNRectangleObservation.
The reason is: Vision uses a normalized coordinate space from 0.0 to 1.0 with lower left origin.
So in order to get the correct rectangle in your original image, you need to convert the rect from normalized space to image space. Luckily there is VNImageRectForNormalizedRect(_:_:_:) to do just that.
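The conversion can be sketched without importing Vision. The helper below is hypothetical (cropRect is not a Vision API). It applies the same scaling that VNImageRectForNormalizedRect performs and additionally flips the y axis, because Vision's origin is at the lower left while CGImage.cropping(to:) works in pixel coordinates with the origin at the top left. It assumes the width and height passed in are the pixel dimensions of the CGImage, not the point size of a UIImage.

```swift
import Foundation

/// Hypothetical helper: maps a Vision-style normalized rect (lower-left
/// origin, values in 0...1) to a pixel rect with a top-left origin.
func cropRect(forNormalized r: CGRect, imageWidth: Int, imageHeight: Int) -> CGRect {
    let w = CGFloat(imageWidth)
    let h = CGFloat(imageHeight)
    let x = r.origin.x * w
    // Flip vertically: the top edge of the rect becomes its y origin.
    let y = (1 - r.origin.y - r.size.height) * h
    return CGRect(x: x, y: y, width: r.size.width * w, height: r.size.height * h)
}
```

With such a rect, cgImage.cropping(to:) should return the region that the observation's boundingBox describes, provided the image orientation matches the one passed to the request handler.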

CoreML Memory Leak in iOS 14.5

In my application, I used VNImageRequestHandler with a custom MLModel for object detection.
The app works fine with iOS versions before 14.5.
When iOS 14.5 came, it broke everything.
Whenever try handler.perform([visionRequest]) throws an error (Error Code=11 "encountered unknown exception" UserInfo={NSLocalizedDescription=encountered unknown exception}), the pixelBuffer memory is held and never released. This fills up the AVCaptureOutput buffers, so no new frames arrive.
I had to change the code as shown below. Copying the pixelBuffer to another variable solved the problem of new frames not arriving, but the memory leak still happens.
Because of the memory leak, the app crashes after some time.
Note that before iOS 14.5, detection worked perfectly and try handler.perform([visionRequest]) never threw an error.
Here is my code:
private func predictWithPixelBuffer(sampleBuffer: CMSampleBuffer) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
        return
    }
    // Get additional info from the camera.
    var options: [VNImageOption : Any] = [:]
    if let cameraIntrinsicMatrix = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
        options[.cameraIntrinsics] = cameraIntrinsicMatrix
    }
    autoreleasepool {
        // On iOS 14.5, when a Vision request fails, the pixel buffer memory leaks,
        // the AVCaptureOutput buffers fill up, and no new frames are delivered.
        // Temporary workaround: copy the pixel buffer to a new buffer. This also
        // increases memory use a lot, so a better way is still needed.
        var clonePixelBuffer: CVPixelBuffer? = pixelBuffer.copy()
        let handler = VNImageRequestHandler(cvPixelBuffer: clonePixelBuffer!, orientation: orientation, options: options)
        print("[DEBUG] detecting...")
        do {
            try handler.perform([visionRequest])
        } catch {
            delegate?.detector(didOutputBoundingBox: [])
            failedCount += 1
            print("[DEBUG] detect failed \(failedCount)")
            print("Failed to perform Vision request: \(error)")
        }
        clonePixelBuffer = nil
    }
}
Has anyone experienced the same problem? If so, how did you fix it?
iOS 14.7 Beta available on the developer portal seems to have fixed this issue.
I have a partial fix for this using @Matthijs Hollemans's CoreMLHelpers library.
The model I use has 300 classes and 2363 anchors. I used a lot of the code Matthijs provided here to convert the model to MLModel.
In the last step a pipeline is built using the 3 sub models: raw_ssd_output, decoder, and nms. For this workaround you need to remove the nms model from the pipeline, and output raw_confidence and raw_coordinates.
In your app you need to add the code from CoreMLHelpers.
Then add this function to decode the output from your MLModel:
func decodeResults(results:[VNCoreMLFeatureValueObservation]) -> [BoundingBox] {
    let raw_confidence: MLMultiArray = results[0].featureValue.multiArrayValue!
    let raw_coordinates: MLMultiArray = results[1].featureValue.multiArrayValue!
    print(raw_confidence.shape, raw_coordinates.shape)
    var boxes = [BoundingBox]()
    let startDecoding = Date()
    for anchor in 0..<raw_confidence.shape[0].int32Value {
        // Find the best-scoring class for this anchor.
        var maxInd:Int = 0
        var maxConf:Float = 0
        for score in 0..<raw_confidence.shape[1].int32Value {
            let key = [anchor, score] as [NSNumber]
            let prob = raw_confidence[key].floatValue
            if prob > maxConf {
                maxInd = Int(score)
                maxConf = prob
            }
        }
        let y0 = raw_coordinates[[anchor, 0] as [NSNumber]].doubleValue
        let x0 = raw_coordinates[[anchor, 1] as [NSNumber]].doubleValue
        let y1 = raw_coordinates[[anchor, 2] as [NSNumber]].doubleValue
        let x1 = raw_coordinates[[anchor, 3] as [NSNumber]].doubleValue
        let width = x1-x0
        let height = y1-y0
        let x = x0 + width/2
        let y = y0 + height/2
        let rect = CGRect(x: x, y: y, width: width, height: height)
        let box = BoundingBox(classIndex: maxInd, score: maxConf, rect: rect)
        boxes.append(box)
    }
    let finishDecoding = Date()
    let keepIndices = nonMaxSuppressionMultiClass(numClasses: raw_confidence.shape[1].intValue, boundingBoxes: boxes, scoreThreshold: 0.5, iouThreshold: 0.6, maxPerClass: 5, maxTotal: 10)
    let finishNMS = Date()
    var keepBoxes = [BoundingBox]()
    for index in keepIndices {
        keepBoxes.append(boxes[index])
    }
    print("Time Decoding", finishDecoding.timeIntervalSince(startDecoding))
    print("Time Performing NMS", finishNMS.timeIntervalSince(finishDecoding))
    return keepBoxes
}
Then when you receive the results from Vision, you call the function like this:
if let rawResults = vnRequest.results as? [VNCoreMLFeatureValueObservation] {
    let boxes = self.decodeResults(results: rawResults)
}
This solution is slow because of the way I move the data around and formulate my list of BoundingBox types. It would be much more efficient to process the MLMultiArray data using underlying pointers, and maybe use Accelerate to find the maximum score and best class for each anchor box.
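As a rough illustration of the pointer-based idea, here is a minimal sketch of the per-anchor "best class" search over a flat buffer. It assumes the confidences are Float32 values laid out contiguously in row-major [anchors x classes] order; with a real MLMultiArray you would need to check its dataType and strides before reading it this way. bestClassPerAnchor is a hypothetical helper, not part of CoreMLHelpers, and it uses a plain loop rather than Accelerate.

```swift
/// Hypothetical helper: for each anchor, finds the best class and its score
/// in a flat, row-major [anchors x classes] score buffer.
func bestClassPerAnchor(scores: [Float], anchorCount: Int, classCount: Int) -> [(classIndex: Int, score: Float)] {
    precondition(scores.count == anchorCount * classCount, "unexpected buffer size")
    var result: [(classIndex: Int, score: Float)] = []
    result.reserveCapacity(anchorCount)
    scores.withUnsafeBufferPointer { buffer in
        for anchor in 0..<anchorCount {
            let rowStart = anchor * classCount
            var bestIndex = 0
            var bestScore: Float = 0
            // Linear scan over this anchor's class scores, no NSNumber boxing.
            for c in 0..<classCount {
                let s = buffer[rowStart + c]
                if s > bestScore {
                    bestScore = s
                    bestIndex = c
                }
            }
            result.append((classIndex: bestIndex, score: bestScore))
        }
    }
    return result
}
```

The gain comes from avoiding an [NSNumber] key and an NSNumber result for every element, which is what the subscript-based loop pays for on each of the anchors x classes lookups.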
In my case it helped to disable the Neural Engine by forcing Core ML to run on CPU and GPU only. This is often slower but doesn't throw the exception (at least in our case). In the end we implemented a policy to force some of our models not to run on the Neural Engine on certain iOS devices.
See MLModelConfiguration.computeUnits to constrain the hardware a Core ML model can use.
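A minimal sketch of that configuration, assuming an Apple platform where Core ML is available (the canImport guard only keeps the snippet compiling elsewhere). makeCPUAndGPUConfiguration is a hypothetical helper name.

```swift
#if canImport(CoreML)
import CoreML

/// Builds a configuration that keeps Core ML off the Neural Engine.
func makeCPUAndGPUConfiguration() -> MLModelConfiguration {
    let configuration = MLModelConfiguration()
    // Other options include .all (the default) and .cpuOnly.
    configuration.computeUnits = .cpuAndGPU
    return configuration
}
#endif
```

The configuration is then passed when loading the model, for example through the init(configuration:) initializer of the Xcode-generated model class, before wrapping the model in a VNCoreMLModel.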

How to read depth data at a CGPoint from AVDepthData buffer

I am attempting to find the depth data at a certain point in the captured image and return the distance in meters.
I have enabled depth data and am capturing the data alongside the image. I get the point from the X,Y coordinates of the center of the image (and when pressed) and convert it to the buffer's index using
Int((width - touchPoint.x) * (height - touchPoint.y))
with width and height being the dimensions of the captured image. I am not sure if this is the correct method to achieve this, though.
I handle the depth data as such:
func handlePhotoDepthCalculation(point : Int) {
    guard let depth = else {
        return
    }
    // Convert Disparity to Depth
    let depthData = (depth.depthData as AVDepthData!).converting(toDepthDataType: kCVPixelFormatType_DepthFloat32)
    let depthDataMap = depthData.depthDataMap //AVDepthData -> CVPixelBuffer
    // Set Accuracy feedback
    let accuracy = depthData.depthDataAccuracy
    switch (accuracy) {
    case .absolute:
        // NOTE - Values within the depth map are absolutely
        // accurate within the physical world.
        self.accuracyLbl.text = "Absolute"
    case .relative:
        // NOTE - Values within the depth data map are usable for
        // foreground/background separation, but are not absolutely
        // accurate in the physical world. iPhone always produces this.
        self.accuracyLbl.text = "Relative"
    }
    // We convert the data
    CVPixelBufferLockBaseAddress(depthDataMap, CVPixelBufferLockFlags(rawValue: 0))
    let depthPointer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap), to: UnsafeMutablePointer<Float32>.self)
    // Get depth value for image center
    let distanceAtXYPoint = depthPointer[point]
    // Set UI
    self.distanceLbl.text = "\(distanceAtXYPoint) m" //Returns distance in meters?
    self.filteredLbl.text = "\(depthData.isDepthDataFiltered)"
}
I am not convinced I am getting the correct position. From my research, it also looks like accuracy is only reported as .relative or .absolute and not as a float/integer?
To access the depth data at a CGPoint do:
let point = CGPoint(x: 35, y: 26)
let width = CVPixelBufferGetWidth(depthDataMap)
let distanceAtXYPoint = depthPointer[Int(point.y * CGFloat(width) + point.x)]
I hope it works.
Access depth data at pixel position:
let depthDataMap: CVPixelBuffer = ...
let pixelX: Int = ...
let pixelY: Int = ...
CVPixelBufferLockBaseAddress(depthDataMap, .readOnly)
let bytesPerRow = CVPixelBufferGetBytesPerRow(depthDataMap)
let baseAddress = CVPixelBufferGetBaseAddress(depthDataMap)!
assert(kCVPixelFormatType_DepthFloat32 == CVPixelBufferGetPixelFormatType(depthDataMap))
let rowData = baseAddress + pixelY * bytesPerRow
let distance = rowData.assumingMemoryBound(to: Float32.self)[pixelX]
CVPixelBufferUnlockBaseAddress(depthDataMap, .readOnly)
For me the values were incorrect and inconsistent when accessing the depth by
let depthPointer = unsafeBitCast(CVPixelBufferGetBaseAddress(depthDataMap), to: UnsafeMutablePointer<Float32>.self)
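One likely reason for such inconsistent values is row padding: when bytesPerRow is larger than width * 4, a flat index of y * width + x lands on the wrong pixel. The sketch below simulates this with a tiny hand-built buffer instead of a real CVPixelBuffer; readFloat32 and demoPaddedRead are hypothetical names, and the 16-byte stride is an assumed value chosen for illustration.

```swift
/// Reads one Float32 from a row-padded buffer, honoring the row stride.
func readFloat32(base: UnsafeRawPointer, bytesPerRow: Int, x: Int, y: Int) -> Float32 {
    let offset = y * bytesPerRow + x * MemoryLayout<Float32>.stride
    return base.load(fromByteOffset: offset, as: Float32.self)
}

/// Builds a fake 3x2 "depth map" with 16 bytes per row (12 bytes of pixels
/// plus 4 bytes of padding) and reads pixel (x: 1, y: 1) in two ways.
func demoPaddedRead() -> (strided: Float32, flat: Float32) {
    let width = 3, height = 2, bytesPerRow = 16
    let storage = UnsafeMutableRawPointer.allocate(byteCount: bytesPerRow * height, alignment: 16)
    defer { storage.deallocate() }
    storage.initializeMemory(as: UInt8.self, repeating: 0, count: bytesPerRow * height)
    for y in 0..<height {
        for x in 0..<width {
            // Pixel value encodes its own position: row * 10 + column.
            storage.storeBytes(of: Float32(y * 10 + x), toByteOffset: y * bytesPerRow + x * 4, as: Float32.self)
        }
    }
    let strided = readFloat32(base: UnsafeRawPointer(storage), bytesPerRow: bytesPerRow, x: 1, y: 1)
    // Flat indexing ignores the padding and reads the neighboring pixel instead.
    let flat = storage.load(fromByteOffset: (1 * width + 1) * 4, as: Float32.self)
    return (strided, flat)
}
```

Here the strided read returns the value stored at (1, 1), while the flat read returns the value stored at (0, 1), which is the kind of offset error that grows with every row in a real depth map.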
Values indicating the general accuracy of a depth data map.
The accuracy of a depth data map is highly dependent on the camera calibration data used to generate it. If the camera's focal length cannot be precisely determined at the time of capture, scaling error in the z (depth) plane will be introduced. If the camera's optical center can't be precisely determined at capture time, principal point error will be introduced, leading to an offset error in the disparity estimate.
These values report the accuracy of a map's values with respect to its reported units.
case relative
Values within the depth data map are usable for foreground/background separation, but are not absolutely accurate in the physical world.
case absolute
Values within the depth map are absolutely accurate within the physical world.
You can get the width and height of the AVDepthData buffer with the following code, and use them to map a CGPoint into the buffer.
// Useful data
let width = CVPixelBufferGetWidth(depthDataMap)
let height = CVPixelBufferGetHeight(depthDataMap)
In Apple's sample project they use the code below.
texturePoint is the touch point projected onto the Metal view used in the sample project.
// scale
let scale = CGFloat(CVPixelBufferGetWidth(depthFrame)) / CGFloat(CVPixelBufferGetWidth(videoFrame))
let depthPoint = CGPoint(x: CGFloat(CVPixelBufferGetWidth(depthFrame)) - 1.0 - texturePoint.x * scale, y: texturePoint.y * scale)
assert(kCVPixelFormatType_DepthFloat16 == CVPixelBufferGetPixelFormatType(depthFrame))
CVPixelBufferLockBaseAddress(depthFrame, .readOnly)
let rowData = CVPixelBufferGetBaseAddress(depthFrame)! + Int(depthPoint.y) * CVPixelBufferGetBytesPerRow(depthFrame)
// swift does not have an Float16 data type. Use UInt16 instead, and then translate
var f16Pixel = rowData.assumingMemoryBound(to: UInt16.self)[Int(depthPoint.x)]
CVPixelBufferUnlockBaseAddress(depthFrame, .readOnly)
var f32Pixel = Float(0.0)
var src = vImage_Buffer(data: &f16Pixel, height: 1, width: 1, rowBytes: 2)
var dst = vImage_Buffer(data: &f32Pixel, height: 1, width: 1, rowBytes: 4)
vImageConvert_Planar16FtoPlanarF(&src, &dst, 0)
// Convert the depth frame format to cm
let depthString = String(format: "%.2f cm", f32Pixel * 100)
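For readers who want to see what the 16-bit to 32-bit float conversion does, here is a minimal pure-Swift sketch of the same bit-level step. It is not the code from Apple's sample, which relies on vImageConvert_Planar16FtoPlanarF; float32(fromHalfBits:) is a hypothetical helper that decodes an IEEE 754 half-precision bit pattern by hand.

```swift
/// Hypothetical helper: converts an IEEE 754 half-precision bit pattern
/// (as read from a DepthFloat16 buffer via UInt16) to a Float.
func float32(fromHalfBits h: UInt16) -> Float {
    let sign = UInt32(h & 0x8000) << 16
    let exponent = UInt32((h >> 10) & 0x1F)
    let mantissa = UInt32(h & 0x03FF)
    let bits: UInt32
    switch exponent {
    case 0:
        if mantissa == 0 {
            bits = sign                      // signed zero
        } else {
            // Subnormal half: value = mantissa * 2^-24.
            let magnitude = Float(mantissa) / 16_777_216
            return sign == 0 ? magnitude : -magnitude
        }
    case 0x1F:
        bits = sign | 0x7F80_0000 | (mantissa << 13)   // infinity or NaN
    default:
        // Re-bias the exponent from 15 (half) to 127 (single).
        bits = sign | ((exponent + 112) << 23) | (mantissa << 13)
    }
    return Float(bitPattern: bits)
}
```

For a single pixel this avoids setting up two vImage_Buffer values; for whole frames the vImage call is the better fit.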

Getting device aspect ratio in Xcode programmatically

I'm making a universal game for all Apple platforms. The problem is that there are lots of aspect ratios, and with the growing number of devices, handling them becomes a hassle. I've tried the following:
var deviceAspectRatio: CGFloat? {
    #if os(iOS)
    if UIDevice.current.model.contains("iPhone") {
        return 16/9
    } else if UIDevice.current.model.contains("iPad") {
        return 4/3
    }
    #elseif os(tvOS)
    return 16/9 //There might be other aspect ratios also
    #elseif os(watchOS)
    return 1
    #elseif os(macOS)
    //figure out aspect ratio
    return nil
    #endif
}
But even with this, Xcode gives me an error:
Missing return in a function expected to return 'CGFloat?'
The trick on macOS is that there might be more than one screen, so if this is the case, you'll have to decide which one you're interested in. However, if you settle on a screen, you can just get the frame from NSScreen.frame and divide the width by the height.
This code will get the aspect ratio for the screen a given window is on:
guard let frame = someWindow.screen?.frame else { return nil }
let aspectRatio = NSWidth(frame) / NSHeight(frame)
Also, you should probably be doing something similar with UIScreen on iOS instead of hard-coding the values there. Apple may someday release a new device with some other aspect ratio your app doesn't anticipate.
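A minimal sketch of that approach: compute the ratio from whatever size the platform reports instead of hard-coding it. aspectRatio(of:) is a hypothetical helper; on iOS you might pass UIScreen.main.bounds.size, and on macOS the size of the chosen NSScreen's frame.

```swift
import Foundation

/// Hypothetical helper: width-to-height ratio of a screen size,
/// or nil when the size is degenerate.
func aspectRatio(of size: CGSize) -> CGFloat? {
    guard size.width > 0, size.height > 0 else { return nil }
    return size.width / size.height
}
```

Because every branch now goes through one function that always returns a value, this shape also avoids the "Missing return" error from the question.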
To make the code universal for all devices, you can use the DeviceKit dependency and then
import DeviceKit
let device = Device.current
let deviceAspectRatio = device.screenRatio.height / device.screenRatio.width
let deviceWidth = UIScreen.main.bounds.width * UIScreen.main.scale
let deviceHeight = UIScreen.main.bounds.height * UIScreen.main.scale
let testDeviceAspectRatio = deviceHeight / deviceWidth

How to convert CGImage to OTVideoFrame

What is the best way to convert CGImage to OTVideoFrame?
I tried to get the underlying CGImage pixel buffer and feed it into an OTVideoBuffer, but got a distorted image.
Here is what I have done:
Created a new OTVideoFormat object with the ARGB pixel format.
Set the bytesPerRow of the OTVideoFormat to width*4. Using the value of CGImageGetBytesPerRow(...) did not work: I got no error messages, but also no frames on the other end of the line.
Copied the rows, truncating them to convert from CGImageGetBytesPerRow(...) to width*4 bytes per row.
Got a distorted image with rows slightly shifted.
Here is the code:
func toOTVideoFrame() throws -> OTVideoFrame {
    let width : UInt32 = UInt32(CGImageGetWidth(self)) // self is a CGImage
    let height : UInt32 = UInt32(CGImageGetHeight(self))
    assert(CGImageGetBitsPerPixel(self) == 32)
    assert(CGImageGetBitsPerComponent(self) == 8)
    let bitmapInfo = CGImageGetBitmapInfo(self)
    assert(bitmapInfo.contains(CGBitmapInfo.FloatComponents) == false)
    assert(CGImageGetAlphaInfo(self) == .NoneSkipFirst)
    let bytesPerPixel : UInt32 = 4
    let cgImageBytesPerRow : UInt32 = UInt32(CGImageGetBytesPerRow(self))
    let otFrameBytesPerRow : UInt32 = bytesPerPixel * width
    let videoFormat = OTVideoFormat()
    videoFormat.pixelFormat = .ARGB
    videoFormat.bytesPerRow.addObject(NSNumber(unsignedInt: otFrameBytesPerRow))
    videoFormat.imageWidth = width
    videoFormat.imageHeight = height
    videoFormat.estimatedFramesPerSecond = 15
    videoFormat.estimatedCaptureDelay = 100
    let videoFrame = OTVideoFrame(format: videoFormat)
    videoFrame.timestamp = CMTimeMake(0, 1) // This is temporary
    videoFrame.orientation = OTVideoOrientation.Up // This is temporary
    let dataProvider = CGImageGetDataProvider(self)
    let imageData : NSData = CGDataProviderCopyData(dataProvider)!
    let buffer = UnsafeMutablePointer<UInt8>.alloc(Int(otFrameBytesPerRow * height))
    for currentRow in 0..<height {
        let currentRowStartOffsetCGImage = currentRow * cgImageBytesPerRow
        let currentRowStartOffsetOTVideoFrame = currentRow * otFrameBytesPerRow
        let cgImageRange = NSRange(location: Int(currentRowStartOffsetCGImage), length: Int(otFrameBytesPerRow))
        range: cgImageRange)
    }
    do {
        let planes = UnsafeMutablePointer<UnsafeMutablePointer<UInt8>>.alloc(1)
        videoFrame.setPlanesWithPointers(planes, numPlanes: 1)
    }
    return videoFrame
}
The result image:
I solved this issue on my own.
It appears to be a bug in the OpenTok SDK. The SDK does not seem to be able to handle images whose size is not a multiple of 16. When I changed all image sizes to be multiple of 16, everything started to work fine.
TokBox did not bother to state this limitation in the API documentation, nor throw an exception when the input image size is not a multiple of 16.
This is the second critical bug I have found in the OpenTok SDK. I strongly suggest you do not use this product. It is of very low quality.
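If the multiple-of-16 observation holds for your SDK version, the target dimensions can be computed with a small helper. This is only a sketch: roundedUpToMultipleOf16 is a hypothetical name, and whether the image should then be scaled or padded to the new size is not stated in the answer above.

```swift
/// Hypothetical helper: rounds a non-negative dimension up to the next multiple of 16.
func roundedUpToMultipleOf16(_ value: Int) -> Int {
    return (value + 15) / 16 * 16
}
```

For example, a 406-pixel-wide image would be drawn into a 416-pixel-wide context before being handed to the video frame, while a 368-pixel dimension is already a multiple of 16 and stays unchanged.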
