Incorrect frame of boundingBox with VNRecognizedObjectObservation - iOS

I'm having an issue with displaying a bounding box around a recognized object using Core ML & Vision.
Horizontal detection seems to work correctly; vertically, however, the box is too tall, goes over the top edge of the video, doesn't reach all the way to the bottom of the video, and doesn't follow the camera's motion correctly. Here you can see the issue: https://imgur.com/Sppww8T
This is how video data output is initialized:
let videoDataOutput = AVCaptureVideoDataOutput()
videoDataOutput.alwaysDiscardsLateVideoFrames = true
videoDataOutput.videoSettings = [kCVPixelBufferPixelFormatTypeKey as String: Int(kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)]
videoDataOutput.setSampleBufferDelegate(self, queue: dataOutputQueue!)
self.videoDataOutput = videoDataOutput
session.addOutput(videoDataOutput)
// Lock the connection orientation so frames arrive in portrait
let c = videoDataOutput.connection(with: .video)
c?.videoOrientation = .portrait
I've also tried other video orientations, without much success.
Performing the vision request:
let handler = VNImageRequestHandler(cvPixelBuffer: image, options: [:])
try? handler.perform(vnRequests)
And finally, once the request has been processed (viewRect is set to the size of the video view, 812x375; I know the video layer itself is a bit shorter, but that's not the issue here):
let observationRect = VNImageRectForNormalizedRect(observation.boundingBox, Int(viewRect.width), Int(viewRect.height))
I've also tried doing something like this (which caused even more issues):
var observationRect = observation.boundingBox
observationRect.origin.y = 1.0 - observationRect.origin.y
observationRect = videoPreviewLayer.layerRectConverted(fromMetadataOutputRect: observationRect)
I've tried to cut out as much of the code I deemed irrelevant as possible.
I've actually come across a similar issue with Apple's sample code, where the bounding box wouldn't vertically surround objects as expected: https://developer.apple.com/documentation/vision/recognizing_objects_in_live_capture Maybe that means there is some issue with the API?

I use something like this:
let width = view.bounds.width
let height = width * 16 / 9
let offsetY = (view.bounds.height - height) / 2
let scale = CGAffineTransform.identity.scaledBy(x: width, y: height)
let transform = CGAffineTransform(scaleX: 1, y: -1).translatedBy(x: 0, y: -height - offsetY)
let rect = prediction.boundingBox.applying(scale).applying(transform)
This assumes portrait orientation and a 16:9 aspect ratio, and it assumes imageCropAndScaleOption is set to .scaleFill.
Credits: The transform code was taken from this repo: https://github.com/Willjay90/AppleFaceDetection
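To show where those transforms fit, here is a hedged sketch of a Vision completion handler that applies them; model, overlayLayer, and the drawing details are illustrative assumptions, not part of the original answer:
let request = VNCoreMLRequest(model: model) { request, _ in
    guard let results = request.results as? [VNRecognizedObjectObservation] else { return }
    DispatchQueue.main.async {
        // Clear the previous boxes, then draw one layer per detection
        overlayLayer.sublayers?.forEach { $0.removeFromSuperlayer() }
        for observation in results {
            let box = CALayer()
            box.frame = observation.boundingBox.applying(scale).applying(transform)
            box.borderColor = UIColor.red.cgColor
            box.borderWidth = 2
            overlayLayer.addSublayer(box)
        }
    }
}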

Related

How to convert VNRectangleObservation item to UIImage in SwiftUI

I was able to identify squares in an image using VNDetectRectanglesRequest. Now I want to store those rectangles as separate images (UIImage or CGImage). Below is what I tried.
let rectanglesDetection = VNDetectRectanglesRequest { request, error in
    rectangles = request.results as! [VNRectangleObservation]
    rectangles.sort { $0.boundingBox.origin.y > $1.boundingBox.origin.y }
    for rectangle in rectangles {
        let rect = rectangle.boundingBox
        let imageRef = cgImage.cropping(to: rect)
        let croppedImage = UIImage(cgImage: imageRef!, scale: image!.scale, orientation: image!.imageOrientation)
        checkBoxImages.append(croppedImage)
    }
}
Can anybody point out what's wrong, or what the best approach would be?
Update 1
At this stage, I'm testing with an image that I added to the assets.
With this image I get 7 rectangles as observations: one for each cell and one for the table margin.
My task is to identify the text inside each rectangle, and my approach is to send a VNRecognizeTextRequest for each rectangle that has been identified, as sketched below. My real scenario is a little more complicated than this, but I want to at least achieve this before going forward.
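For the OCR step, something like this sketch is what I have in mind (iOS 13+; the function name and completion shape are just illustrative):
import Vision

// Run text recognition on one cropped rectangle and hand back the strings
func recognizeText(in croppedImage: CGImage, completion: @escaping ([String]) -> Void) {
    let request = VNRecognizeTextRequest { request, _ in
        let observations = request.results as? [VNRecognizedTextObservation] ?? []
        // Keep only the top candidate per recognized line
        completion(observations.compactMap { $0.topCandidates(1).first?.string })
    }
    request.recognitionLevel = .accurate
    let handler = VNImageRequestHandler(cgImage: croppedImage, options: [:])
    try? handler.perform([request])
}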
Update 2
for rectangle in rectangles {
    let trueX = rectangle.boundingBox.minX * image!.size.width
    let trueY = rectangle.boundingBox.minY * image!.size.height
    let width = rectangle.boundingBox.width * image!.size.width
    let height = rectangle.boundingBox.height * image!.size.height
    print("x = ", trueX, " y = ", trueY, " width = ", width, " height = ", height)
    let cropZone = CGRect(x: trueX, y: trueY, width: width, height: height)
    guard let cutImageRef: CGImage = image?.cgImage?.cropping(to: cropZone) else {
        return
    }
    let croppedImage: UIImage = UIImage(cgImage: cutImageRef)
    croppedImages.append(croppedImage)
}
My image width and height are:
width = 406.0 height = 368.0
I've attached my debug interface so you can get a proper understanding.
As @Lasse mentioned, this is my actual issue, with screenshots.
This is just a guess since you didn't state what the actual problem is, but probably you're getting a zero-sized image for each VNRectangleObservation.
The reason is: Vision uses a normalized coordinate space from 0.0 to 1.0 with lower left origin.
So in order to get the correct rectangle of your original image, you need to convert the rect from Normalized Space to Image Space. Luckily there is VNImageRectForNormalizedRect(_:_:_:) to do just that.
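For example, a minimal sketch reusing the names from your question (note that CGImage.cropping(to:) expects a top-left-origin rect, so the y-coordinate still has to be flipped after the conversion):
let imageWidth = cgImage.width
let imageHeight = cgImage.height
for rectangle in rectangles {
    // Normalized (0...1, bottom-left origin) -> pixel coordinates
    var rect = VNImageRectForNormalizedRect(rectangle.boundingBox, imageWidth, imageHeight)
    // Flip vertically into CGImage's top-left coordinate space
    rect.origin.y = CGFloat(imageHeight) - rect.origin.y - rect.height
    if let croppedRef = cgImage.cropping(to: rect) {
        checkBoxImages.append(UIImage(cgImage: croppedRef))
    }
}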

Making the ARKit camera vertical rather than horizontal

I am trying to make an app where the user enters some text in a text field, and the app then displays this text in front of the AR camera. I have positioned the text correctly in front of the camera and changed the anchor of the text to its center. But when I add the text to the scene, it is rotated 90 degrees around the z-axis.

I know why: the camera of the ARScene session has a rotation of 0 on all of x, y, z when the device is in landscape. Since I want my app to be in portrait, I rotate the device 90 degrees, which rotates the camera as well; and since the text takes on the camera's rotation, it is rotated too.

I tried correcting the rotation of the text by rotating it again around the z-axis, but that doesn't solve the entire issue: when I change the orientation of my phone, that affects the camera axes, which in turn affect different axes of the text (not the same axes, because I rotated them in the correction step). So I think the only way to solve this is to make the camera's rotation consistent with portrait mode from the beginning, but I haven't found any way to set the rotation of the camera.
here is the code of adding the text:
private func createTextNode(text: String?) {
    guard let text = text else { return }
    let arText = SCNText(string: text, extrusionDepth: 1)
    arText.font = UIFont(name: arText.font.fontName, size: 2)
    arText.firstMaterial?.diffuse.contents = selectedColor

    // Making the node
    let node = SCNNode()
    node.geometry = arText
    center(node: node)

    // Place the node 20 cm in front of the camera
    guard let currentFrame = sceneView.session.currentFrame else { return }
    let camera = currentFrame.camera
    let cameraTransform = camera.transform
    var newTransform = matrix_identity_float4x4
    newTransform.columns.3.z = -0.2
    let modifiedTransform = matrix_multiply(cameraTransform, newTransform)
    node.transform = SCNMatrix4(modifiedTransform)
    node.scale = SCNVector3(0.02, 0.02, 0.02)

    self.sceneView.scene.rootNode.addChildNode(node)
    node.eulerAngles.x = 90.degrees // `degrees` assumes a custom degree-to-radian helper on Int
}
and this is what the output looks like:
output
any help will be appreciated
You cannot use the identity matrix for every orientation; it has to be rotated depending on the device orientation. I have a function in my apps that I call to update it before I perform the matrix multiplication:
var translation = matrix_identity_float4x4

func updateTranslationMatrix() {
    switch UIDevice.current.orientation {
    case .portrait, .portraitUpsideDown, .unknown, .faceDown, .faceUp:
        print("portrait")
        translation.columns.0.x = -cos(.pi/2)
        translation.columns.0.y = sin(.pi/2)
        translation.columns.1.x = -sin(.pi/2)
        translation.columns.1.y = -cos(.pi/2)
    case .landscapeLeft:
        print("landscape left")
        translation.columns.0.x = 1
        translation.columns.0.y = 0
        translation.columns.1.x = 0
        translation.columns.1.y = 1
    case .landscapeRight:
        print("landscape right")
        translation.columns.0.x = cos(.pi)
        translation.columns.0.y = -sin(.pi)
        translation.columns.1.x = sin(.pi)
        translation.columns.1.y = cos(.pi)
    @unknown default:
        break
    }
    translation.columns.3.z = -0.6 // 60 cm in front of the camera
}
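Used together with the code from the question, the call site might look like this (a sketch; currentFrame and node come from the question's createTextNode):
// Refresh the orientation-corrected matrix, then place the node relative to the camera
updateTranslationMatrix()
let modifiedTransform = matrix_multiply(currentFrame.camera.transform, translation)
node.transform = SCNMatrix4(modifiedTransform)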
If you mean that you don't want the orientation of the device to change along with the rotation, then:
Go to Project > General > Deployment Info
Under device orientation, uncheck all boxes except 'Portrait'.
If this does not solve your problem and what you really want is to fix the euler angles of your text node, let me know, I'll be happy to help.

iOS CGAffineTransform with Masking

I'm currently developing an iOS application where you can process an image (rotating, zooming, translating). I'm using a UIImageView to which I added gestures. This works fine, but I also have a masking rectangle of a fixed size. Initial State
After I've processed my image, I want the content that is inside my masking rectangle.
I also want the four edge points of the masking rectangle of the processed image.
I know I have to apply the image view's transform to the points somehow, but it's not working.
let points = maskView.edgePoints()
let translateTransform = CGAffineTransform(translationX: translationPoint.x, y: translationPoint.y)
let rotateTransform = CGAffineTransform(rotationAngle: CGFloat(rotationAngle))
let scaleTransform = CGAffineTransform(scaleX: xScale, y: yScale)
let finalTransform = rotateTransform.concatenating(scaleTransform).concatenating(translateTransform)
let topleftPoint = points[0].applying(finalTransform)
let toprightPoint = points[1].applying(finalTransform)
let bottomleftPoint = points[2].applying(finalTransform)
let bottomrightPoint = points[3].applying(finalTransform)
Edge point results: Sample
Topleft: (50.75, -8.75)
Topright: (63.6072332330383, -365.252863911729)
Bottomleft: (-172.064289944831, -16.7857707706489)
Bottomright: (-159.207056711792, -373.288634682378)
But shouldn't the top left be something like (0, 0),
and the bottom left something like (40, 200)?
Maybe you can give me some hints or useful links!
Thx in advance!
The problem lies in your transformation order. Right now your order is Rotate, then Scale, then Translate; it should be Scale, then Rotate, then Translate instead.
let finalTransform = scaleTransform.concatenating(rotateTransform).concatenating(translateTransform)
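A quick playground check shows why the order matters when the scale is non-uniform (the receiver of concatenating(_:) is applied first):
import CoreGraphics

let scale = CGAffineTransform(scaleX: 2, y: 1)
let rotate = CGAffineTransform(rotationAngle: .pi / 2)
let p = CGPoint(x: 1, y: 0)
let scaledFirst = p.applying(scale.concatenating(rotate)) // (0.0, 2.0): scaled, then rotated
let rotatedFirst = p.applying(rotate.concatenating(scale)) // (0.0, 1.0): rotated, then scaled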

Rotate my SceneKit material

I'm taking images with AVCapturePhotoOutput and then using their JPEG representation as the texture on a SceneKit SCNPlane that is the same aspect ratio as the image:
let image = UIImage(data: dataImage!)
let rectangle = SCNPlane(width:9, height:12)
let rectmaterial = SCNMaterial()
rectmaterial.diffuse.contents = image
rectmaterial.isDoubleSided = true
rectangle.materials = [rectmaterial]
let rectnode = SCNNode(geometry: rectangle)
let pos = sceneSpacePosition(inFrontOf: self.pictCamera, atDistance: 16.5) // 16.5 is arbitrary, but makes the rectangle the same size as the camera
rectnode.position = pos
rectnode.orientation = self.pictCamera.orientation
pictView.scene?.rootNode.addChildNode(rectnode)
sceneSpacePosition is a bit of code that can be found here on SO that maps CoreMotion into SceneKit orientation. It is used to place the rectangle, which does indeed appear at the right location with the right size. All very cool.
The problem is that the image is rotated 90 degrees to the rectangle. So I did the obvious:
rectmaterial.diffuse.contentsTransform = SCNMatrix4MakeRotation(Float.pi / 2, 0, 0, 1)
This does not work properly; the resulting image is unrecognizable. It appears that one small part of the image has been stretched to a huge size. I thought it might be the axis, so I tried all three, with the same result.
Any ideas?
You are rotating around the upper left corner, as suggested by Alain T.: contentsTransform is applied about the texture's origin.
If you move your image down first, you may get the rotation you were expecting.
Try this:
let translation = SCNMatrix4MakeTranslation(0, -1, 0)
let rotation = SCNMatrix4MakeRotation(Float.pi / 2, 0, 0, 1)
let transform = SCNMatrix4Mult(translation, rotation)
rectmaterial.diffuse.contentsTransform = transform

GPUImage crop to CGRect and rotate

Given a CGRect, I want to use GPUImage to crop a video. For example, if the rect is (0, 0, 50, 50), the video would be cropped at (0,0) with a length of 50 on each side.
What's throwing me is that GPUImageCropFilter doesn't take a rectangle, but rather a normalized crop region with values ranging from 0 to 1. My intuition was to do this:
let assetSize = CGSizeApplyAffineTransform(videoTrack.naturalSize, videoTrack.preferredTransform)
let cropRect = CGRect(x: frame.minX / assetSize.width,
                      y: frame.minY / assetSize.height,
                      width: frame.width / assetSize.width,
                      height: frame.height / assetSize.height)
to calculate the crop region based on the size of the incoming asset. Then:
// Filter
let cropFilter = GPUImageCropFilter(cropRegion: cropRect)
let url = NSURL(fileURLWithPath: "\(NSTemporaryDirectory())\(String.random()).mp4")
let movieWriter = GPUImageMovieWriter(movieURL: url, size: assetSize)
movieWriter.encodingLiveVideo = false
movieWriter.shouldPassthroughAudio = false
// add targets
movieFile.addTarget(cropFilter)
cropFilter.addTarget(movieWriter)
cropFilter.forceProcessingAtSize(frame.size)
cropFilter.setInputRotation(kGPUImageRotateRight, atIndex: 0)
What should the movie writer size be? Shouldn't it be the size of the frame I want to crop with? And should I be using forceProcessingAtSize with the size value of my crop frame?
A complete code example would be great; I've been trying for hours and I can't seem to get the section of the video that I want.
FINAL:
if let videoTrack = self.asset.tracks.first {
    let movieFile = GPUImageMovie(asset: self.asset)
    let transformedRegion = CGRectApplyAffineTransform(region, videoTrack.preferredTransform)

    // Filters
    let cropFilter = GPUImageCropFilter(cropRegion: transformedRegion)
    let url = NSURL(fileURLWithPath: "\(NSTemporaryDirectory())\(String.random()).mp4")
    let renderSize = CGSizeApplyAffineTransform(videoTrack.naturalSize, CGAffineTransformMakeScale(transformedRegion.width, transformedRegion.height))
    let movieWriter = GPUImageMovieWriter(movieURL: url, size: renderSize)
    movieWriter.transform = videoTrack.preferredTransform
    movieWriter.encodingLiveVideo = false
    movieWriter.shouldPassthroughAudio = false

    // Add targets
    // http://stackoverflow.com/questions/37041231/gpuimage-crop-to-cgrect-and-rotate
    movieFile.addTarget(cropFilter)
    cropFilter.addTarget(movieWriter)

    movieWriter.completionBlock = {
        observer.sendNext(url)
        observer.sendCompleted()
    }
    movieWriter.failureBlock = { _ in
        observer.sendFailed(.VideoCropFailed)
    }
    disposable.addDisposable {
        cropFilter.removeTarget(movieWriter)
        movieWriter.finishRecording()
    }

    movieWriter.startRecording()
    movieFile.startProcessing()
}
As you note, the GPUImageCropFilter takes in a rectangle in normalized coordinates. You're on the right track, in that you just need to convert your CGRect in pixels to normalized coordinates by dividing the X components (origin.x and size.width) by the width of the image and the Y components by the height.
You don't need to use forceProcessingAtSize(), because the crop will automatically output an image of the appropriate cropped size. The movie writer's size should be matched to this cropped size, which you should know from your original CGRect.
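Concretely, that would be something like this sketch, reusing the question's names (frame is the original crop rect in pixels):
// The writer's output size matches the pixel size of the crop rect
let movieWriter = GPUImageMovieWriter(movieURL: url, size: frame.size)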
The one complication you introduce is the rotation. If you need to apply a rotation in addition to your crop, you might want to check and make sure that you don't need to swap your X and Y for your crop region. This should be apparent in the output if the two need to be swapped.
There were some bugs with applying rotation at the same time as a crop a while ago, and I can't remember if I fixed all those. If I didn't, you could insert a dummy filter (gamma or brightness set to default values) before or after the crop and apply the rotation at that stage.
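If you do hit that, a hedged sketch of the dummy-filter workaround could look like this (GPUImageGammaFilter at its default settings passes pixels through unchanged; the other names follow the question's code):
// Pass-through filter that only carries the rotation, downstream of the crop
let passthrough = GPUImageGammaFilter()
passthrough.setInputRotation(kGPUImageRotateRight, atIndex: 0)
movieFile.addTarget(cropFilter)
cropFilter.addTarget(passthrough)
passthrough.addTarget(movieWriter)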
