VNRectangleObservation corners compressed in x-axis on iPhone - ios

I'm capturing video via my device's camera, and feeding it to the Vision framework to perform rectangle detection. The code looks something like this (compressed for brevity ... hidden lines not relevant to this question):
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {

    // Get a CIImage from the buffer
    guard let buffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let image = CIImage(cvImageBuffer: buffer)

    // Set up corner detector
    let handler = VNImageRequestHandler(ciImage: image, orientation: .up, options: [:])
    let request = VNDetectRectanglesRequest()

    // Perform corner detection
    do {
        try handler.perform([request])
        guard let observation = request.results?.first as? VNRectangleObservation else {
            print("error at \(#line)")
            return
        }
        handleCorners(observation)
    } catch {
        print("Error: \(error)")
        return
    }
}
This works just fine on an iPad Air 2, and I can use the corners in the observation object to draw a nice overlay. But on an iPhone X the corners in the x-axis are "compressed".
For example, if I capture an image with a business card that occupies almost the entire width of the screen, I would expect observation.topLeft to have an x value close to zero. Instead it's nearly 0.15. The same is true for the right-hand corners (expected: ~1.0, actual: ~0.85).
Any idea why this might be the case? The CIImage extent property is the same on both devices. It's just that Vision's corners are compressed in the x-axis.

I had a pretty similar problem detecting rectangles in real time with ARKit. After some investigation I found this answer and figured out that: "The problem is that ARKit provides the image buffer (frame.capturedImage) with a camera resolution of 1920 x 1440. The screen of the iPhone X is 375 x 812 points. It seems like ARKit can see more than it can display on the phone screen." So I corrected the capturedImage size using the screen proportion, and this "solution" fixed my problem.
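As a rough illustration of that correction (not the original code; the 1440 x 1920 buffer size and the 375 x 812 point view size are assumptions), normalized Vision coordinates can be remapped to the on-screen portion of an aspect-fill preview like this:

import UIKit

// Hypothetical helper: remaps a normalized rect from the full captured-image
// space into the portion of the image that is visible on screen when the
// camera feed is shown aspect-fill.
func remapToVisibleRegion(_ normalizedRect: CGRect,
                          imageSize: CGSize,
                          viewSize: CGSize) -> CGRect {
    let imageAspect = imageSize.width / imageSize.height
    let viewAspect = viewSize.width / viewSize.height

    // Fraction of the image that survives the aspect-fill crop, per axis.
    var visibleX: CGFloat = 1
    var visibleY: CGFloat = 1
    if viewAspect < imageAspect {
        visibleX = viewAspect / imageAspect   // left/right edges are cropped
    } else {
        visibleY = imageAspect / viewAspect   // top/bottom edges are cropped
    }
    let insetX = (1 - visibleX) / 2
    let insetY = (1 - visibleY) / 2

    // Shift and rescale so 0...1 spans only the visible region.
    return CGRect(x: (normalizedRect.origin.x - insetX) / visibleX,
                  y: (normalizedRect.origin.y - insetY) / visibleY,
                  width: normalizedRect.width / visibleX,
                  height: normalizedRect.height / visibleY)
}

// Example: a portrait iPhone X preview showing a 1440 x 1920 captured frame.
let corrected = remapToVisibleRegion(CGRect(x: 0.2, y: 0.1, width: 0.6, height: 0.3),
                                     imageSize: CGSize(width: 1440, height: 1920),
                                     viewSize: CGSize(width: 375, height: 812))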

Related

Swift - How to crop a QR code properly using an ARSession and Vision library?

This is a long question so I wanted to put a TL;DR on top:
I want to track QR codes via one of two methods: image tracking by cropping them upon detection, or placing anchors with raycasting. Both methods fail when the phone is in portrait mode. The camera source is an ARSession; SceneKit and RealityKit are not used, only ARKit. What should I do?
I am currently working on a Swift application in which I try to render some content on a server, transmit the video to an iPhone, and display it on screen using an MTKView. I only needed a custom Metal shader to apply some complex calculations to the received frames, so I did not use SceneKit or RealityKit. I only have an ARSession from ARKit and a Metal view here, and up to this point everything works fine.
I am able to do image tracking at this point. However, I want to apply this behaviour to QR codes. What I want is to detect a QR code (multiple if possible) and then track it just like images. Since I don't have the QR codes as ARReferenceImages beforehand, as in normal image tracking, I was left with two options:
Option 1: Using raycast(_:) on ARSession
This is probably the right way to do it. However, for this I need to activate both plane tracking options on the ARSession, which then creates many anchors, and managing them alongside image tracking becomes harder. That is not the actual problem, though. The actual problem is that raycasting works as intended when the phone is in landscape mode, but when the phone goes into portrait mode, even if I pass the frame in the correct orientation, it misses everything and the hit-test results come back empty. I am not using hitTest(_:) because it is deprecated.
I want to explain the "correct orientation" point before going into the second option. The ARSession is capturing frames and I can inspect each frame through the session's didUpdate delegate method. When I read the pixel buffer out of the frame using frame.capturedImage and turn it into a CIImage, the image is always in landscape orientation (width > height), regardless of whether the phone is in portrait mode. So whenever I pass this image along, I use oriented(.right) for portrait and oriented(.up) for landscape. I got that idea from another question about QR bounding boxes, and so far it is the best option (but not good enough). I also want to note that when I tried raycasting, I used the image size, not the screen size (screen size = my Metal view size, because it is fullscreen), since the image is actually larger than the screen. I can see this if I set a breakpoint and Quick Look the CIImage created from the current camera frame.
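For reference, a small helper along these lines can centralize that orientation choice (a sketch, not the question's code; the upside-down and landscape-left mappings are assumptions):

import ARKit
import UIKit
import CoreImage

// Hypothetical helper: ARFrame.capturedImage is always landscape, so rotate it
// to match the current interface orientation before passing it to Vision.
func orientedImage(from frame: ARFrame,
                   interfaceOrientation: UIInterfaceOrientation) -> CIImage {
    let image = CIImage(cvPixelBuffer: frame.capturedImage)
    switch interfaceOrientation {
    case .portrait:           return image.oriented(.right)
    case .portraitUpsideDown: return image.oriented(.left)
    case .landscapeLeft:      return image.oriented(.down)
    case .landscapeRight:     return image.oriented(.up)
    default:                  return image
    }
}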
Option 2: Cropping the QR and treating it as image tracking
This is another approach I am currently working on. The algorithm is simple: check every frame with Vision. If there are detected QR codes, read their data first. If that data matches an existing QR code, re-read it only if the newly cropped QR image is larger than the existing one; otherwise do nothing. Then use this cropped QR image to track the QR code as an image. At that point we already have the data, so no problems there.
However, I have tried many times to do the proper transformation explained in the answer here. Again, I think I am able to transform the normalized bounding box into a real rect that can correctly crop the image. Yet, as with raycasting, it works perfectly only if the phone is in landscape orientation. In portrait it works well enough only if the phone is really close to the QR code and the code is centered on the screen.
For related code, I have this in my View controller:
private var ciContext: CIContext = CIContext.init(options: nil)
private var sequenceHandler: VNImageRequestHandler?
And then I have this code to extract QR codes from CIImage:
func extractQrCode(image: CIImage) -> [VNBarcodeObservation]? {
    self.sequenceHandler = VNImageRequestHandler(ciImage: image)

    let barcodeRequest = VNDetectBarcodesRequest()
    barcodeRequest.symbologies = [.QR]

    try? self.sequenceHandler?.perform([barcodeRequest])

    guard let results = barcodeRequest.results else {
        return nil
    }
    return results
}
And this is the delegate method that checks and operates on every frame (the code is currently for Option 2):
func session(_ session: ARSession, didUpdate frame: ARFrame) {
    let rotImg = self.renderer?.getInterfaceOrientation() == .portrait
        ? CIImage(cvPixelBuffer: frame.capturedImage).oriented(.right)
        : CIImage(cvPixelBuffer: frame.capturedImage)

    if let barcodes = self.extractQrCode(image: rotImg) {
        for barcode in barcodes {
            guard let payload = barcode.payloadStringValue else { continue }

            // Convert the normalized (bottom-left origin) bounding box into pixel coordinates.
            var rect = CGRect()
            rect = VNImageRectForNormalizedRect(barcode.boundingBox.botToTop(), Int(rotImg.extent.width), Int(rotImg.extent.height))

            let existingQR = TrackedImagesManager.imagesToTrack.filter { $0.isQR && $0.QRData == payload }.first

            if ((rect.size.width < 800 || rect.size.height < 800 || abs(rect.size.height - rect.size.width) > 32) && existingQR == nil) {
                DispatchQueue.main.async {
                    self.showToastMessage(message: "Please get closer to the QR code and try centering it on your screen.", font: UIFont.systemFont(ofSize: 18), duration: 3)
                }
                continue
            } else if (existingQR != nil) {
                if (rect.width > existingQR?.originalImage?.size.width ?? 999) {
                    let croppedImg = rotImg.cropped(to: rect)
                    let croppedCgImage = self.ciContext.createCGImage(croppedImg, from: croppedImg.extent)!
                    let trackImg = UIImage(cgImage: croppedCgImage)
                    existingQR?.originalImage = trackImg
                    existingQR?.image = ARReferenceImage(croppedCgImage, orientation: .up, physicalWidth: 0.1)
                } else {
                    continue
                }
            } else if rect.width != 0 {
                let croppedImg = rotImg.cropped(to: rect)
                let croppedCgImage = self.ciContext.createCGImage(croppedImg, from: croppedImg.extent)!
                let trackImg = UIImage(cgImage: croppedCgImage)
                TrackedImagesManager.imagesToTrack.append(TrackedImage(id: 9, type: 1, image: ARReferenceImage(croppedCgImage, orientation: .up, physicalWidth: 0.1), originalImage: trackImg, isQR: true, QRData: payload))
                print("qr norm rect: \(barcode.boundingBox) \n qr rect: \(rect) \nqr data: \(payload) \nqr hittestres: ")
            }
        }
    }
}
Finally, for the transformation, I have this extension (tried various ways, this is the best so far):
extension CGRect {
    // Flips a normalized rect between the bottom-left-origin (Vision) and
    // top-left-origin coordinate spaces.
    func botToTop() -> CGRect {
        let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -1)
        return self.applying(transform)
    }
}
So for both options I need some advice to get things right. The Android side of the same feature is implemented as in Option 2, but Android returns a nicely cropped QR code upon detection; we don't have that. What do I do now?
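For completeness, the step that would follow the cropping in Option 2 is registering the cropped image with ARKit as a runtime detection image. A rough sketch (the 0.1 m physical width and the world-tracking configuration are assumptions, not taken from the code above):

import ARKit

// Hands a freshly cropped QR image to ARKit as a detection image and re-runs the session.
func startTracking(croppedQR cgImage: CGImage, in session: ARSession) {
    let reference = ARReferenceImage(cgImage, orientation: .up, physicalWidth: 0.1)
    reference.name = "qr-code"   // hypothetical identifier

    let configuration = ARWorldTrackingConfiguration()
    configuration.detectionImages = [reference]
    configuration.maximumNumberOfTrackedImages = 1

    // Re-running the same session keeps existing anchors unless reset options are passed.
    session.run(configuration, options: [])
}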

setting CVImageBuffer to all black

I am trying to modify some Apple Developer sample code for my own purposes (I am very new to iOS programming). I am trying to get images from the camera, run some detection, and display only the detections.
Currently I am using an AVCaptureVideoPreviewLayer, so the camera feed gets displayed on the screen. What I actually want is to zero out the camera feed and draw only the detections. So I am basically trying to handle this in the captureOutput function. Something like:
extension ViewController: AVCaptureVideoDataOutputSampleBufferDelegate {
    func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
        // Grab the pixel buffer frame from the camera output
        guard let pixelBuffer = sampleBuffer.imageBuffer else { return }
        // Here now I should be able to set it to 0s (all black)
    }
}
I am trying to do something basic like setting this CVImageBuffer to a black background, but I have not been able to figure it out in the last few hours!
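One possible route, sketched here rather than taken from the sample code, is to overwrite the buffer's bytes in place; this assumes a single-plane BGRA or a bi-planar YUV buffer:

import Foundation
import CoreVideo

// Fills a pixel buffer with black in place. For BGRA, zero bytes are black;
// for bi-planar YUV (420v/420f), zeroed chroma would render green, so the
// chroma plane is set to 128 instead.
func fillWithBlack(_ pixelBuffer: CVPixelBuffer) {
    CVPixelBufferLockBaseAddress(pixelBuffer, [])
    defer { CVPixelBufferUnlockBaseAddress(pixelBuffer, []) }

    if CVPixelBufferIsPlanar(pixelBuffer) {
        // Plane 0: luma (Y), plane 1: interleaved chroma (CbCr).
        if let luma = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 0) {
            memset(luma, 0, CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 0)
                          * CVPixelBufferGetHeightOfPlane(pixelBuffer, 0))
        }
        if let chroma = CVPixelBufferGetBaseAddressOfPlane(pixelBuffer, 1) {
            memset(chroma, 128, CVPixelBufferGetBytesPerRowOfPlane(pixelBuffer, 1)
                              * CVPixelBufferGetHeightOfPlane(pixelBuffer, 1))
        }
    } else if let base = CVPixelBufferGetBaseAddress(pixelBuffer) {
        memset(base, 0, CVPixelBufferGetDataSize(pixelBuffer))
    }
}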
EDIT
So, I discovered that I can do something like:
var image: CGImage?
// Create a Core Graphics bitmap image from the buffer.
VTCreateCGImageFromCVPixelBuffer(pixelBuffer, options: nil, imageOut: &image)
This copies the buffer data into a CGImage, which I can then use for my purposes. Now, is there an API that can basically make an all-black image with the same size as the one represented by the input image buffer?
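If a Core Image route is acceptable, one option (a sketch, not from the original code) is to generate an infinite-extent black CIImage and crop it to the buffer's dimensions:

import CoreImage
import CoreVideo

// Returns a solid-black CIImage matching the pixel buffer's dimensions.
func blackImage(matching pixelBuffer: CVPixelBuffer) -> CIImage {
    let extent = CGRect(x: 0, y: 0,
                        width: CVPixelBufferGetWidth(pixelBuffer),
                        height: CVPixelBufferGetHeight(pixelBuffer))
    // CIImage(color:) produces an image of infinite extent, so crop it.
    return CIImage(color: CIColor.black).cropped(to: extent)
}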

Why AVCaptureVideoDataOutput doesn't give me supported highest resolution frame?

I am working on an iOS application that uses the camera. I am using the AVCaptureVideoDataOutput delegate method to get video frames, and I always get 1920 x 1080 frames; the device I am using is an iPhone X.
I am using AVCaptureSession.Preset.high
Here is my code snippet:
func captureOutput(_ captureOutput: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
    let ciImage = CIImage(cvPixelBuffer: pixelBuffer)
    let image = UIImage(ciImage: ciImage)
}
When I do
let device = AVCaptureDevice.devices(for: AVMediaType.video).first {
    ($0 as AVCaptureDevice).position == AVCaptureDevice.Position.back
}
print("resolutions supported: \(String(describing: device?.activeFormat.highResolutionStillImageDimensions))")
This always gives me 3840 x 2160 for the iPhone X, which has a 12-megapixel camera.
I am expecting the same kind of highest-possible-resolution video frame through AVCaptureVideoDataOutput.
I tried using AVCaptureSession.Preset.photo, but it also doesn't give me a high-resolution frame.
I did try AVCaptureSession.Preset.hd4K3840x2160, which gives me the expected frame resolution, but it may not work on older iPhones.
I know AVCapturePhotoOutput can give me a higher-resolution image, but for my use case I want to create the image from a video frame.
What am I doing wrong here?
I agree with @adamfowlerphoto. The reason you need to check before applying the 4K video preset is that support depends on the hardware: if an older phone's sensor or lens can't deliver that resolution, the hd4K3840x2160 preset simply isn't available.
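A minimal sketch of that check (the fallback preset choice is an assumption):

import AVFoundation

let session = AVCaptureSession()
// ... add the camera input first, so the session knows the device's capabilities ...

// Ask the session whether the 4K video preset is supported before applying it;
// otherwise fall back to .high.
if session.canSetSessionPreset(.hd4K3840x2160) {
    session.sessionPreset = .hd4K3840x2160
} else {
    session.sessionPreset = .high
}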

iOS 11 using vision framework VNDetectRectanglesRequest to do object detection not precisely?

Apple added new features in iOS 11 that allow you to use the Vision framework to do object detection without models. I tried these new APIs but found that the results from VNDetectRectanglesRequest are not good. Am I using the APIs correctly?
Here are some good cases:
And some bad cases:
Here is my code:
func captureOutput(_ output: AVCaptureOutput, didOutput sampleBuffer: CMSampleBuffer, from connection: AVCaptureConnection) {
    guard let pixelBuffer: CVPixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    // create the request
    let request2 = VNDetectRectanglesRequest { (request, error) in
        self.VNDetectRectanglesRequestCompletionBlock(request: request, error: error)
    }
    do {
        request2.minimumConfidence = 0.7
        try self.visionSequenceHandler.perform([request2], on: pixelBuffer)
    } catch {
        print("Throws: \(error)")
    }
}
func VNDetectRectanglesRequestCompletionBlock(request: VNRequest, error: Error?) {
    if let array = request.results {
        if array.count > 0 {
            let ob = array.first as? VNRectangleObservation
            print("count: \(array.count)")
            print("fps: \(self.measureFPS())")
            DispatchQueue.main.async {
                let boxRect = ob!.boundingBox
                let transRect = self.transformRect(fromRect: boxRect, toViewRect: self.cameraLayer.frame)
                var transformedRect = ob!.boundingBox
                //transformedRect.origin.y = 1 - transformedRect.origin.y
                let convertedRect = self.cameraLayer.layerRectConverted(fromMetadataOutputRect: transformedRect)
                self.highlightView?.frame = convertedRect
            }
        }
    }
}
There are a lot of misconceptions, expectations, and black-box issues that have been brought up already. But aside from those, you're also using the API incorrectly.
The rectangle detector finds areas in the image that appear to represent real-world rectangular shapes. In most cases, the camera capturing an image sees a real rectangular object in perspective — so its 3D projection onto the 2D image plane will usually not be rectangular. For example, the 2D projection of the computer screen in one of your photos is more trapezoidal, because the top corners are farther from the camera than the bottom corners.
You get this shape by looking at the actual corners of the detected rectangle — see the properties of the VNRectangleObservation object. If you draw lines between those four corners, you’ll usually find something that better tracks the shape of a computer screen, piece of paper, etc in your photo.
The boundingBox property instead gets you the smallest rectangular area — that is, rectangular in image space — containing those corner points. So it won’t follow the shape of a real rectangular object unless your camera perspective is just right.
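As an illustration of using those corners instead of boundingBox, a sketch along these lines (the image dimensions and the UIKit drawing context are assumptions) builds the quadrilateral path from the observation:

import Vision
import UIKit

// Builds a path through the detected rectangle's four corners,
// converted from Vision's normalized space into pixel coordinates.
func quadPath(for observation: VNRectangleObservation,
              imageWidth: Int, imageHeight: Int) -> UIBezierPath {
    // Vision's normalized points use a bottom-left origin.
    func point(_ normalized: CGPoint) -> CGPoint {
        let p = VNImagePointForNormalizedPoint(normalized, imageWidth, imageHeight)
        // Flip y for a top-left-origin (UIKit) drawing context.
        return CGPoint(x: p.x, y: CGFloat(imageHeight) - p.y)
    }

    let path = UIBezierPath()
    path.move(to: point(observation.topLeft))
    path.addLine(to: point(observation.topRight))
    path.addLine(to: point(observation.bottomRight))
    path.addLine(to: point(observation.bottomLeft))
    path.close()
    return path
}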
Your commented-out line is almost right; you need to put it back, but change it to:
transformedRect.origin.y = 1 - (transformedRect.origin.y + transformedRect.height)
In your 'bad case' example, the square actually comes from the soft toy on the right.
Your good cases look right because the objects are in the centre of the screen.

Swift - captureOutput frame extracted color is always coming near to black

I am trying to process video frames and extract the dominant color from each one. I was using AVCaptureStillImageOutput, but it made the shutter sound every time I took a frame for processing, so I switched to AVCaptureVideoDataOutput and now process each frame as it comes in.
Here is the code I am using:
func captureOutput(captureOutput: AVCaptureOutput!, didOutputSampleBuffer sampleBuffer: CMSampleBuffer!, fromConnection connection: AVCaptureConnection!) {
    currentFrame = self.convertImageFromCMSampleBufferRef(sampleBuffer)
    if let image = UIImage(CIImage: currentFrame) {
        if let color = self.extractColor(image) {
            // print the color code
        }
    }
}

func convertImageFromCMSampleBufferRef(sampleBuffer: CMSampleBuffer) -> CIImage {
    let pixelBuffer: CVPixelBufferRef = CMSampleBufferGetImageBuffer(sampleBuffer)!
    let ciImage: CIImage = CIImage(CVPixelBuffer: pixelBuffer)
    return ciImage
}
With AVCaptureStillImageOutput I was getting almost correct output, but with AVCaptureVideoDataOutput the values are always near black, even when the camera is pointed at a bright light. I am guessing the problem is around the frame rate or something, but I can't figure it out.
In the last few test runs this is the only color code I am getting: #1b1f01.
I would love to use the original AVCaptureStillImageOutput code, but it must not make the shutter sound, and I am not able to disable it.
I had this same issue myself. It was simply that it was too early: for whatever reason the camera sensor starts at zero and is willing to give you frames before what you'd think of as the first frame is fully exposed.
Solution: just wait a second before you expect any real images.
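A simple way to apply that wait (a sketch; the 30-frame threshold is an arbitrary assumption) is to skip the first handful of frames in the delegate callback:

import AVFoundation

// Skips the first frames so the sensor has time to expose properly
// (roughly one second at 30 fps).
final class FrameGate {
    private var framesSeen = 0
    private let warmupFrames = 30

    // Returns true once enough frames have arrived to trust the exposure.
    func shouldProcess() -> Bool {
        framesSeen += 1
        return framesSeen > warmupFrames
    }
}

// In captureOutput(_:didOutput:from:):
// guard frameGate.shouldProcess() else { return }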
