Why is the Vision framework unable to align two images? - ios

I'm trying to take two images using the camera, and align them using the iOS Vision framework:
func align(firstImage: CIImage, secondImage: CIImage) {
let request = VNTranslationalImageRegistrationRequest(
targetedCIImage: firstImage) {
request, error in
if error != nil {
fatalError()
}
let observation = request.results!.first
as! VNImageTranslationAlignmentObservation
secondImage = secondImage.transformed(
by: observation.alignmentTransform)
let compositedImage = firstImage!.applyingFilter(
"CIAdditionCompositing",
parameters: ["inputBackgroundImage": secondImage])
// Save the compositedImage to the photo library.
}
try! visionHandler.perform([request], on: secondImage)
}
let visionHandler = VNSequenceRequestHandler()
But this produces grossly mis-aligned images:
You can see that I've tried three different types of scenes — a close-up subject, an indoor scene, and an outdoor scene. I tried more outdoor scenes, and the result is the same in almost every one of them.
I was expecting a slight misalignment at worst, but not such a complete misalignment. What is going wrong?
I'm not passing the orientation of the images into the Vision framework, but that shouldn't be a problem for aligning images. It's a problem only for things like face detection, where a rotated face isn't detected as a face. In any case, the output images have the correct orientation, so orientation is not the problem.
My compositing code is working correctly. It's only the Vision framework that's a problem. If I remove the calls to the Vision framework, put the phone of a tripod, the composition works perfectly. There's no misalignment. So the problem is the Vision framework.
This is on iPhone X.
How do I get Vision framework to work correctly? Can I tell it to use gyroscope, accelerometer and compass data to improve the alignment?

You should set secondImage as targetImage, and perform handler with firstImage.
I use your composite way.

check out this example from MLBoy:
let request = VNTranslationalImageRegistrationRequest(targetedCIImage: image2, options: [:])
let handler = VNImageRequestHandler(ciImage: image1, options: [:])
do {
try handler.perform([request])
} catch let error {
print(error)
}
guard let observation = request.results?.first as? VNImageTranslationAlignmentObservation else { return }
let alignmentTransform = observation.alignmentTransform
image2 = image2.transformed(by: alignmentTransform)
let compositedImage = image1.applyingFilter("CIAdditionCompositing", parameters: ["inputBackgroundImage": image2])

Related

iOS fast image difference comparison

Im looking for a fast way to compare two frames of video, and decide if a lot has changed between them. This will be used to decide if I should send a request to image recognition service over REST, so I don't want to keep sending them, until there might be some different results. Something similar is doing Vuforia SDK. Im starting with a Framebuffer from ARKit, and I have it scaled to 640:480 and converted to RGB888 vBuffer_image. It could compare just few points, but it needs to find out if difference is significant nicely.
I started by calculating difference between few points using vDSP functions, but this has a disadvantage - if I move camera even very slightly to left/right, then the same points have different portions of image, and the calculated difference is high, even if nothing really changed much.
I was thinking about using histograms, but I didn't test this approach yet.
What would be the best solution for this? It needs to be fast, it can compare just smaller version of image, etc.
I have tested another approach using VNFeaturePointObservation from Vision. This works a lot better, but Im afraid it might be more CPU demanding. I need to test this on some older devices. Anyway, this is a part of code that works nicely. If someone could suggest some better approach to test, please let know:
private var lastScanningImageFingerprint: VNFeaturePrintObservation?
// Returns true if these are different enough
private func compareScanningImages(current: VNFeaturePrintObservation, last: VNFeaturePrintObservation?) -> Bool {
guard let last = last else { return true }
var distance = Float(0)
try! last.computeDistance(&distance, to: current)
print(distance)
return distance > 10
}
// After scanning is done, subclass should prepare suggestedTargets array.
private func performScanningIfNeeded(_ sender: Timer) {
guard !scanningInProgress else { return } // Wait for previous scanning to finish
guard let vImageBuffer = deletate?.currentFrameScalledImage else { return }
guard let image = CGImage.create(from: vImageBuffer) else { return }
func featureprintObservationForImage(image: CGImage) -> VNFeaturePrintObservation? {
let requestHandler = VNImageRequestHandler(cgImage: image, options: [:])
let request = VNGenerateImageFeaturePrintRequest()
do {
try requestHandler.perform([request])
return request.results?.first as? VNFeaturePrintObservation
} catch {
print("Vision error: \(error)")
return nil
}
}
guard let imageFingerprint = featureprintObservationForImage(image: image) else { return }
guard compareScanningImages(current: imageFingerprint, last: lastScanningImageFingerprint) else { return }
print("SCANN \(Date())")
lastScanningImageFingerprint = featureprintObservationForImage(image: image)
executeScanning(on: image) { [weak self] in
self?.scanningInProgress = false
}
}
Tested on older iPhone - as expected this causes some frame drops on camera preview. So I need a faster algorithm

Unable to detect QRCode from image using MLKit

I am using MLKIt for detect QRCode from image. for andrid it is working proper, for ios I am using below pods
pod 'GoogleMLKit/BarcodeScanning'
Here is sample code detect QRcode from image which picked from gallery. every time features array comes empty.
let format: BarcodeFormat = BarcodeFormat.all
let barcodeOptions = BarcodeScannerOptions(formats: format)
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
let barcodeScanner = BarcodeScanner.barcodeScanner(options: barcodeOptions)
barcodeScanner.process(visionImage) { features, error in
guard error == nil, let features = features, !features.isEmpty else {
// Error handling
return
}
// Recognized barcodes
print("Data :: \(features.first?.rawValue ?? "")")
}
We noticed this may happen when there are no padding around the QR code, I also tried to add some padding to it: and it works after that. Could you confirm that it works?
On the other side, ML Kit is also working on a public document on this limitation. Thanks for reporting this.
Julie from ML Kit team

Vision Framework (iOS): How are VNDetectFaceLandmarksRequest and VNDetectFaceRectanglesRequest different?

There are two different requests that you can use for face detection tasks with the iOS Vision Framework: VNDetectFaceLandmarksRequest and VNDetectFaceRectanglesRequest. Both of them return an array of VNFaceObservation, one for each detected face. VNFaceObservation has a variety of optional properties, including boundingBox and landmarks. The landmarks object then also includes optional properties like nose, innerLips, leftEye, etc.
Do the two different Vision requests differ in how they perform face detection?
It seems that VNDetectFaceRectanglesRequest only finds a bounding box (and maybe some other properties), but does not find any landmarks. On the other hand, VNDetectFaceLandmarksRequest seems to find both, bounding box and landmarks.
Are there cases where one request type will find a face and the other one will not? Is VNDetectFaceLandmarksRequest superior to VNDetectFaceRectanglesRequest, or does the latter maybe have advantages in performance or reliability?
Here is an example code of how these two Vision requests can be used:
let faceLandmarkRequest = VNDetectFaceLandmarksRequest()
let faceRectangleRequest = VNDetectFaceRectanglesRequest()
let requestHandler = VNImageRequestHandler(ciImage: image, options: [:])
try requestHandler.perform([faceRectangleRequest, faceLandmarkRequest])
if let rectangleResults = faceRectangleRequest.results as? [VNFaceObservation] {
let boundingBox1 = rectangleResults.first?.boundingBox //this is an optional type
}
if let landmarkResults = faceLandmarkRequest.results as? [VNFaceObservation] {
let boundingBox2 = landmarkResults.first?.boundingBox //this is an optional type
let landmarks = landmarkResults.first?.landmarks //this is an optional type
}
VNDetectFaceRectanglesRequest is a more lightweight operation for finding face rectangle
VNDetectFaceLandmarksRequest is a heavier operation, which can also help locate landmarks on the face

Several QR Codes with ARKit

In a new project we plan to create following AR showcase:
We want to have a wall with some pipes and cables on it. These will have sensors mounted to control and monitor the pipe/cable-system. Since each sensor will have the same dimensions and appearance we plan to add individual QR Codes to each sensor. Reading the documentation of ARWorldTrackingConfiguration and ARImageTrackingConfiguration shows that ARKit is capable of recognizing known images. But the requirements to images make me wonder if the application would work as we want it to when using several QR Codes:
From detectionImages:
[...], identifying art in a museum or adding animated elements to a movie poster.
From Apples Keynote:
Good Images to Track: High Texture, High local Contrast, well distributed histogram, no repetitive structures
Since QR Codes don't match the requirements completely I'm wondering if it's possible to use about 10 QR Codes and have ARKit recognize each of them individually and reliable. Especially when e.g. 3 Codes are in the view. Does anyone have experience in tracking several QR Codes or even a similar showcase?
Recognizing (several) QR-codes has nothing to do with ARKit and can be done in 3 different ways (AVFramework, CIDetector, Vision), of which the latter is preferable in my opinion because you may also want to use its object tracking capabilities (VNTrackObjectRequest). Also it is more robust in my experience.
If you need to place objects in ARKit scene using locations of the QR-codes, you will need to execute hitTest on ARFrame to find code's 3D location (transform) in the world. On that location you will need to place a custom ARAnchor. Using the anchor, you can add a custom SceneKit node to the scene.
UPDATE: So the suggested strategy would be: 1. find QR codes and their 2D location with Vision, 2. find their 3D location (worldTransform) with ARFrame.hitTest(), 3. create custom (subclassed) ARAnchor and add it to the session, 4. in renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) add a custom node (such as SCNText with billboard constraint) for your custom ARAnchor.
If by any chance you are using RxSwift, it can done the easiest with RxVision framework, because it allows to easily pass the relevant ARFrame along into the handler -
var requests = [RxVNRequest<ARFrame>]()
let barcodesRequest: RxVNDetectBarcodesRequest<ARFrame> = VNDetectBarcodesRequest.rx.request(symbologies: [.QR])
self
.barcodesRequest
.observable
.observeOn(Scheduler.main)
.subscribe { [unowned self] (event) in
switch event {
case .next(let completion):
self.detectCodeHandler(value: completion.value, request: completion.request, error: completion.error) // define the method first
default:
break
}
}
.disposed(by: disposeBag)
if let image = anchor as? ARImageAnchor{
guard let buffer: CVPixelBuffer = sceneView.session.currentFrame?.capturedImage else {
print("could not get a pixel buffer")
return
}
let image = CIImage(cvPixelBuffer: buffer)
var message = ""
let features = detector.features(in: image)
for feature in features as! [CIQRCodeFeature] {
message = feature.messageString
break
}
if image.referenceImage.name == "QR1"{
if message == "QR1"{
// add node 1
}else{
sceneView.session.remove(anchor: anchor)
}
} else if image.referenceImage.name == "QR2"{
if message == "QR2"{
// add node 2
}else{
sceneView.session.remove(anchor: anchor)
}
}
}
detector here is CIDetector.Also you need to check renderer(_:didUpdate:for:). I worked on 4 QR codes.
It works assuming no two QR codes can be seen in a frame at same time.

How can I convert a specific part of a map to an image in swift?

If I am using MapView, and want to convert the view to an image where each corner is a map point (zoomed default by the map), how could I do this? (not asking for code, just some direction). I do not want to save the image, but simply get the image matrix to use in computer vision, all quickly and dynamically in the same program. I cannot rely on taking screenshots.
Hope this helps others too - really have looked everywhere but no luck.
Cheers
MKMapSnapshotter is what you're looking for. Here is an example in Swift:
let options = MKMapSnapshotOptions()
options.region = mapView.region
options.size = mapView.frame.size
options.scale = UIScreen.mainScreen().scale
let fileURL = NSURL(fileURLWithPath: "path/to/snapshot.png")
let snapshotter = MKMapSnapshotter(options: options)
snapshotter.startWithCompletionHandler { snapshot, error in
guard let snapshot = snapshot else {
print("Snapshot error: \(error)")
return
}
let data = UIImagePNGRepresentation(snapshot.image)
data?.writeToURL(fileURL, atomically: true)
}
If your interested in reading more about MKMapSnapshotter check out this NSHipster post. In the post there is an example of how to draw an annotation on the image, which I have used before in an app and it works well.
Hope that helps!

Resources