There are two different requests that you can use for face detection tasks with the iOS Vision Framework: VNDetectFaceLandmarksRequest and VNDetectFaceRectanglesRequest. Both of them return an array of VNFaceObservation, one for each detected face. VNFaceObservation has a variety of optional properties, including boundingBox and landmarks. The landmarks object then also includes optional properties like nose, innerLips, leftEye, etc.
Do the two different Vision requests differ in how they perform face detection?
It seems that VNDetectFaceRectanglesRequest only finds a bounding box (and maybe some other properties), but does not find any landmarks. On the other hand, VNDetectFaceLandmarksRequest seems to find both, bounding box and landmarks.
Are there cases where one request type will find a face and the other one will not? Is VNDetectFaceLandmarksRequest superior to VNDetectFaceRectanglesRequest, or does the latter maybe have advantages in performance or reliability?
Here is an example code of how these two Vision requests can be used:
let faceLandmarkRequest = VNDetectFaceLandmarksRequest()
let faceRectangleRequest = VNDetectFaceRectanglesRequest()
let requestHandler = VNImageRequestHandler(ciImage: image, options: [:])
try requestHandler.perform([faceRectangleRequest, faceLandmarkRequest])
if let rectangleResults = faceRectangleRequest.results as? [VNFaceObservation] {
let boundingBox1 = rectangleResults.first?.boundingBox //this is an optional type
}
if let landmarkResults = faceLandmarkRequest.results as? [VNFaceObservation] {
let boundingBox2 = landmarkResults.first?.boundingBox //this is an optional type
let landmarks = landmarkResults.first?.landmarks //this is an optional type
}
VNDetectFaceRectanglesRequest is a more lightweight operation for finding face rectangle
VNDetectFaceLandmarksRequest is a heavier operation, which can also help locate landmarks on the face
Related
I am using MLKIt for detect QRCode from image. for andrid it is working proper, for ios I am using below pods
pod 'GoogleMLKit/BarcodeScanning'
Here is sample code detect QRcode from image which picked from gallery. every time features array comes empty.
let format: BarcodeFormat = BarcodeFormat.all
let barcodeOptions = BarcodeScannerOptions(formats: format)
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
let barcodeScanner = BarcodeScanner.barcodeScanner(options: barcodeOptions)
barcodeScanner.process(visionImage) { features, error in
guard error == nil, let features = features, !features.isEmpty else {
// Error handling
return
}
// Recognized barcodes
print("Data :: \(features.first?.rawValue ?? "")")
}
We noticed this may happen when there are no padding around the QR code, I also tried to add some padding to it: and it works after that. Could you confirm that it works?
On the other side, ML Kit is also working on a public document on this limitation. Thanks for reporting this.
Julie from ML Kit team
I load models inside my iPad app as a MDLAsset to view them using RealityKit. As I need to change the texture of the models I currently simply apply a SimpleMaterial on the ModelEntity in RealityKit. This works fine, but as there is no elegant way to export the RealityKit object back as e.g. a usd-file. So I need to persist my changes to the MDLAsset, which then can be easily exported.
I managed to persistently change the transform of the MDLAsset, but had no luck changing the material in any way.
Of course there would always be the workaround using SceneKit, which makes it easy to manipulate the model and export it to a MDLAsset. But I hope there is a more straight forward approach.
What I'm doing right now?
let modelAsset = MDLAsset(url: modelUrl)
guard let modelMesh = modelAsset.object(at: 0) as? MDLMesh else {
return
}
modelMesh.transform = some MDLTransform
// change material here
try! modelAsset.export(to: exportUrl)
I tried many different ways, but most come down to:
let objectMaterial = SCNMaterial()
objectMaterial.diffuse.contents = modelColor
let modelMaterial = MDLMaterial.init(scnMaterial: objectMaterial)
for subMesh in modelMesh.submeshes as! [MDLSubmesh] {
subMesh.material = modelMaterial
}
I also tried manipulating the MDLMaterialProperties of the existing subMesh material, but it seems the material is somehow write protected, at least I didn't manage to change its baseColor semantic.
So is there a direct way to apply a material to a MDLAsset / MDLMesh / MDLSubmesh?
In a new project we plan to create following AR showcase:
We want to have a wall with some pipes and cables on it. These will have sensors mounted to control and monitor the pipe/cable-system. Since each sensor will have the same dimensions and appearance we plan to add individual QR Codes to each sensor. Reading the documentation of ARWorldTrackingConfiguration and ARImageTrackingConfiguration shows that ARKit is capable of recognizing known images. But the requirements to images make me wonder if the application would work as we want it to when using several QR Codes:
From detectionImages:
[...], identifying art in a museum or adding animated elements to a movie poster.
From Apples Keynote:
Good Images to Track: High Texture, High local Contrast, well distributed histogram, no repetitive structures
Since QR Codes don't match the requirements completely I'm wondering if it's possible to use about 10 QR Codes and have ARKit recognize each of them individually and reliable. Especially when e.g. 3 Codes are in the view. Does anyone have experience in tracking several QR Codes or even a similar showcase?
Recognizing (several) QR-codes has nothing to do with ARKit and can be done in 3 different ways (AVFramework, CIDetector, Vision), of which the latter is preferable in my opinion because you may also want to use its object tracking capabilities (VNTrackObjectRequest). Also it is more robust in my experience.
If you need to place objects in ARKit scene using locations of the QR-codes, you will need to execute hitTest on ARFrame to find code's 3D location (transform) in the world. On that location you will need to place a custom ARAnchor. Using the anchor, you can add a custom SceneKit node to the scene.
UPDATE: So the suggested strategy would be: 1. find QR codes and their 2D location with Vision, 2. find their 3D location (worldTransform) with ARFrame.hitTest(), 3. create custom (subclassed) ARAnchor and add it to the session, 4. in renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) add a custom node (such as SCNText with billboard constraint) for your custom ARAnchor.
If by any chance you are using RxSwift, it can done the easiest with RxVision framework, because it allows to easily pass the relevant ARFrame along into the handler -
var requests = [RxVNRequest<ARFrame>]()
let barcodesRequest: RxVNDetectBarcodesRequest<ARFrame> = VNDetectBarcodesRequest.rx.request(symbologies: [.QR])
self
.barcodesRequest
.observable
.observeOn(Scheduler.main)
.subscribe { [unowned self] (event) in
switch event {
case .next(let completion):
self.detectCodeHandler(value: completion.value, request: completion.request, error: completion.error) // define the method first
default:
break
}
}
.disposed(by: disposeBag)
if let image = anchor as? ARImageAnchor{
guard let buffer: CVPixelBuffer = sceneView.session.currentFrame?.capturedImage else {
print("could not get a pixel buffer")
return
}
let image = CIImage(cvPixelBuffer: buffer)
var message = ""
let features = detector.features(in: image)
for feature in features as! [CIQRCodeFeature] {
message = feature.messageString
break
}
if image.referenceImage.name == "QR1"{
if message == "QR1"{
// add node 1
}else{
sceneView.session.remove(anchor: anchor)
}
} else if image.referenceImage.name == "QR2"{
if message == "QR2"{
// add node 2
}else{
sceneView.session.remove(anchor: anchor)
}
}
}
detector here is CIDetector.Also you need to check renderer(_:didUpdate:for:). I worked on 4 QR codes.
It works assuming no two QR codes can be seen in a frame at same time.
I'm trying to take two images using the camera, and align them using the iOS Vision framework:
func align(firstImage: CIImage, secondImage: CIImage) {
let request = VNTranslationalImageRegistrationRequest(
targetedCIImage: firstImage) {
request, error in
if error != nil {
fatalError()
}
let observation = request.results!.first
as! VNImageTranslationAlignmentObservation
secondImage = secondImage.transformed(
by: observation.alignmentTransform)
let compositedImage = firstImage!.applyingFilter(
"CIAdditionCompositing",
parameters: ["inputBackgroundImage": secondImage])
// Save the compositedImage to the photo library.
}
try! visionHandler.perform([request], on: secondImage)
}
let visionHandler = VNSequenceRequestHandler()
But this produces grossly mis-aligned images:
You can see that I've tried three different types of scenes — a close-up subject, an indoor scene, and an outdoor scene. I tried more outdoor scenes, and the result is the same in almost every one of them.
I was expecting a slight misalignment at worst, but not such a complete misalignment. What is going wrong?
I'm not passing the orientation of the images into the Vision framework, but that shouldn't be a problem for aligning images. It's a problem only for things like face detection, where a rotated face isn't detected as a face. In any case, the output images have the correct orientation, so orientation is not the problem.
My compositing code is working correctly. It's only the Vision framework that's a problem. If I remove the calls to the Vision framework, put the phone of a tripod, the composition works perfectly. There's no misalignment. So the problem is the Vision framework.
This is on iPhone X.
How do I get Vision framework to work correctly? Can I tell it to use gyroscope, accelerometer and compass data to improve the alignment?
You should set secondImage as targetImage, and perform handler with firstImage.
I use your composite way.
check out this example from MLBoy:
let request = VNTranslationalImageRegistrationRequest(targetedCIImage: image2, options: [:])
let handler = VNImageRequestHandler(ciImage: image1, options: [:])
do {
try handler.perform([request])
} catch let error {
print(error)
}
guard let observation = request.results?.first as? VNImageTranslationAlignmentObservation else { return }
let alignmentTransform = observation.alignmentTransform
image2 = image2.transformed(by: alignmentTransform)
let compositedImage = image1.applyingFilter("CIAdditionCompositing", parameters: ["inputBackgroundImage": image2])
have an issue with getting from VNClassificationObservation.
My goal id to recognize the object and display popup with the object name, I'm able to get name but I can't get object coordinates or frame.
Here is code:
let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: requestOptions)
do {
try handler.perform([classificationRequest, detectFaceRequest])
} catch {
print(error)
}
Then I handle
func handleClassification(request: VNRequest, error: Error?) {
guard let observations = request.results as? [VNClassificationObservation] else {
fatalError("unexpected result type from VNCoreMLRequest")
}
// Filter observation
let filteredOservations = observations[0...10].filter({ $0.confidence > 0.1 })
// Update UI
DispatchQueue.main.async { [weak self] in
for observation in filteredOservations {
print("observation: ",observation.identifier)
//HERE: I need to display popup with observation name
}
}
}
UPDATED:
lazy var classificationRequest: VNCoreMLRequest = {
// Load the ML model through its generated class and create a Vision request for it.
do {
let model = try VNCoreMLModel(for: Inceptionv3().model)
let request = VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
request.imageCropAndScaleOption = VNImageCropAndScaleOptionCenterCrop
return request
} catch {
fatalError("can't load Vision ML model: \(error)")
}
}()
A pure classifier model can only answer "what is this a picture of?", not detect and locate objects in the picture. All the free models on the Apple developer site (including Inception v3) are of this kind.
When Vision works with such a model, it identifies the model as a classifier based on the outputs declared in the MLModel file, and returns VNClassificationObservation objects as output.
If you find or create a model that's trained to both identify and locate objects, you can still use it with Vision. When you convert that model to Core ML format, the MLModel file will describe multiple outputs. When Vision works with a model that has multiple outputs, it returns an array of VNCoreMLFeatureValueObservation objects — one for each output of the model.
How the model declares its outputs would determine which feature values represent what. A model that reports a classification and a bounding box could output a string and four doubles, or a string and a multi array, etc.
Addendum: Here's a model that works on iOS 11 and returns VNCoreMLFeatureValueObservation: TinyYOLO
That's because classifiers do not return objects coordinates or frames. A classifier only gives a probability distribution over a list of categories.
What model are you using here?
For tracking and identifying objects, you’ll have to create your own model using Darknet. I have struggled the same problem, and used TuriCreate to train model, and instead of just providing images to the framework, you’ll have also to provide ones with bounding boxes. Apple have documented here, how to create those models:
Apple TuriCreate docs