I am using ML Kit to detect a QR code in an image. On Android it works properly. On iOS I am using the pod below:
pod 'GoogleMLKit/BarcodeScanning'
Here is sample code that detects a QR code in an image picked from the gallery. The features array is empty every time.
let format: BarcodeFormat = BarcodeFormat.all
let barcodeOptions = BarcodeScannerOptions(formats: format)
let visionImage = VisionImage(image: image)
visionImage.orientation = image.imageOrientation
let barcodeScanner = BarcodeScanner.barcodeScanner(options: barcodeOptions)
barcodeScanner.process(visionImage) { features, error in
guard error == nil, let features = features, !features.isEmpty else {
// Error handling
return
}
// Recognized barcodes
print("Data :: \(features.first?.rawValue ?? "")")
}
We noticed this may happen when there is no padding around the QR code. I tried adding some padding to the image, and detection works after that. Could you confirm that it works for you?
On the other side, ML Kit is also working on a public document on this limitation. Thanks for reporting this.
Julie from ML Kit team
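A minimal sketch of the padding workaround described above, assuming the picked image is a UIImage and that a white margin is acceptable. The helper name addingWhitePadding(to:padding:) and the padding amount are made up for illustration; they are not part of ML Kit.

```swift
import UIKit

/// Returns a copy of `image` drawn on a white canvas that is `padding` points
/// larger on every side. Hypothetical helper, not an ML Kit API.
func addingWhitePadding(to image: UIImage, padding: CGFloat) -> UIImage {
    let paddedSize = CGSize(width: image.size.width + 2 * padding,
                            height: image.size.height + 2 * padding)

    // Keep the source scale so the QR modules are not resampled more than needed.
    let format = UIGraphicsImageRendererFormat()
    format.scale = image.scale
    format.opaque = true

    let renderer = UIGraphicsImageRenderer(size: paddedSize, format: format)
    return renderer.image { context in
        // Fill the whole canvas with white, then draw the original image inset.
        UIColor.white.setFill()
        context.fill(CGRect(origin: .zero, size: paddedSize))
        image.draw(at: CGPoint(x: padding, y: padding))
    }
}
```

The padded result can then be passed to VisionImage(image:) in place of the original picked image. Drawing through UIImage also bakes the orientation into the pixels, so the new image reports an orientation of .up.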
Kind of new to Swift in general, but I'm trying to make a simple RAW camera app for fun. Apple's documentation says that to configure a photo output, you do
let query = photoOutput.isAppleProRAWEnabled ?
{ AVCapturePhotoOutput.isAppleProRAWPixelFormat($0) } :
{ AVCapturePhotoOutput.isBayerRAWPixelFormat($0) }
// Retrieve the RAW format, favoring Apple ProRAW when enabled.
guard let rawFormat =
photoOutput.availableRawPhotoPixelFormatTypes.first(where: query) else {
fatalError("No RAW format found.")
}
but I've been getting an error with the first let statement which says "'isAppleProRAWEnabled' is only available in iOS 14.3 or newer." Is there any way to force it to check for ProRaw, even not on iOS 14.3? I'm not even interested in using ProRaw, but I can't figure out how to get rid of the check and just select the classic RAW format (which I think is the bayer format). If anyone knows a workaround, that would be great!
You can query for the Bayer RAW format as below:
let rawFormatQuery = {AVCapturePhotoOutput.isBayerRAWPixelFormat($0)}
guard let rawFormat = photoOutput.availableRawPhotoPixelFormatTypes.first(where: rawFormatQuery) else {
fatalError("No RAW format found.")
}
Then you set your photo settings using the raw format:
let photoSettings = AVCapturePhotoSettings(rawPixelFormatType: rawFormat,
processedFormat: processedFormat)
Finally, you call your capture delegate as described in the Apple documentation (which I think is where you got the code above).
https://developer.apple.com/documentation/avfoundation/cameras_and_media_capture/capturing_photos_in_raw_and_apple_proraw_formats
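If you later want to keep the ProRAW branch without raising the deployment target, one option is to guard it with an availability check. This is only a sketch: preferredRAWFormat(for:) is a hypothetical helper name, and it returns nil instead of calling fatalError so the caller can decide what to do when no RAW format is offered.

```swift
import AVFoundation

/// Picks a RAW pixel format, favoring Apple ProRAW when the OS supports it and
/// it is enabled, and falling back to Bayer RAW otherwise.
/// Hypothetical helper, not part of AVFoundation.
func preferredRAWFormat(for photoOutput: AVCapturePhotoOutput) -> OSType? {
    let formats = photoOutput.availableRawPhotoPixelFormatTypes

    // The ProRAW APIs only exist on iOS 14.3+, so they must sit behind #available.
    if #available(iOS 14.3, *), photoOutput.isAppleProRAWEnabled {
        if let proRAW = formats.first(where: {
            AVCapturePhotoOutput.isAppleProRAWPixelFormat($0)
        }) {
            return proRAW
        }
    }

    // Classic Bayer RAW, available on older iOS versions as well.
    return formats.first(where: { AVCapturePhotoOutput.isBayerRAWPixelFormat($0) })
}
```

On systems older than iOS 14.3 the first branch is skipped entirely, so the compiler no longer complains about isAppleProRAWEnabled.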
I am working on an iOS app where I need to use a CoreML model to perform image classification.
I used Google Cloud Platform AutoML Vision to train the model. Google provides a CoreML version of the model and I downloaded it to use in my app.
I followed Google's tutorial and everything appeared to be going smoothly. However, when it was time to start using the model, I got a very strange prediction: alongside the confidence of the prediction, I got a strange string that I could not identify.
<VNClassificationObservation: 0x600002091d40> A7DBD70C-541C-4112-84A4-C6B4ED2EB7E2 requestRevision=1 confidence=0.332127 "CICAgICAwPmveRIJQWdsYWlzX2lv"
The string I am referring to is CICAgICAwPmveRIJQWdsYWlzX2lv.
After some research and debugging I found out that this is a NSCFString.
https://developer.apple.com/documentation/foundation/1395135-nsclassfromstring
Apparently this is part of the Foundation API. Does anyone have any experience with this?
With the CoreML file also comes a dict.txt file with the correct labels. Do I have to convert this string to the labels? How do I do that?
This is the code I have so far.
//
// Classification.swift
// Lepidoptera
//
// Created by Tomás Mamede on 15/09/2020.
// Copyright © 2020 Tomás Santiago. All rights reserved.
//
import Foundation
import SwiftUI
import Vision
import CoreML
import ImageIO
class Classification {
private lazy var classificationRequest: VNCoreMLRequest = {
do {
let model = try VNCoreMLModel(for: AutoML().model)
let request = VNCoreMLRequest(model: model, completionHandler: { [weak self] request, error in
if let classifications = request.results as? [VNClassificationObservation] {
print(classifications.first ?? "No classification!")
}
})
request.imageCropAndScaleOption = .scaleFit
return request
}
catch {
fatalError("Error! Can't use Model.")
}
}()
func classifyImage(receivedImage: UIImage) {
let orientation = CGImagePropertyOrientation(rawValue: UInt32(receivedImage.imageOrientation.rawValue))
if let image = CIImage(image: receivedImage) {
DispatchQueue.global(qos: .userInitiated).async {
let handler = VNImageRequestHandler(ciImage: image, orientation: orientation!)
do {
try handler.perform([self.classificationRequest])
}
catch {
fatalError("Error classifying image!")
}
}
}
}
}
The labels are stored in your mlmodel file. If you open the mlmodel in the Xcode 12 model viewer, it will display what those labels are.
My guess is that instead of actual labels, your mlmodel file contains "CICAgICAwPmveRIJQWdsYWlzX2lv" and so on.
It looks like Google's AutoML does not put the correct class labels into the Core ML model.
You can make a dictionary in the app that maps "CICAgICAwPmveRIJQWdsYWlzX2lv" and so on to the real labels.
Or you can replace these labels inside the mlmodel file by editing it using coremltools. (My e-book Core ML Survival Guide has a chapter on how to replace the labels in the model.)
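A sketch of the in-app mapping idea. One observation, based only on the single identifier quoted in the question: that string is valid Base64, and its decoded bytes end in the text "Aglais_io" preceded by what looks like a protobuf-style header. The parser below assumes exactly that layout (tag 0x08 plus a varint, then tag 0x12 plus a one-byte length and a UTF-8 string). Nothing here is documented by Google, so treat it as a guess and prefer an explicit dictionary built from dict.txt when in doubt. Both function names are hypothetical.

```swift
import Foundation

/// Tries to extract a readable label embedded in an AutoML-style identifier.
/// Assumed layout (inferred from one sample only): 0x08 <varint> 0x12 <len> <utf8>.
func embeddedLabel(in identifier: String) -> String? {
    guard let data = Data(base64Encoded: identifier) else { return nil }
    let bytes = [UInt8](data)
    var index = 0

    // Field 1: skip the tag, then skip varint bytes while the continuation bit is set.
    guard index < bytes.count, bytes[index] == 0x08 else { return nil }
    index += 1
    while index < bytes.count, bytes[index] & 0x80 != 0 { index += 1 }
    index += 1 // step past the final varint byte

    // Field 2: length-delimited string; only single-byte lengths are handled here.
    guard index + 1 < bytes.count, bytes[index] == 0x12 else { return nil }
    let length = Int(bytes[index + 1])
    let start = index + 2
    guard length < 0x80, start + length <= bytes.count else { return nil }
    return String(bytes: bytes[start..<(start + length)], encoding: .utf8)
}

/// Resolves an identifier to a display name: explicit overrides first, then the
/// embedded label (underscores shown as spaces), then the raw identifier.
func displayName(for identifier: String, overrides: [String: String] = [:]) -> String {
    if let name = overrides[identifier] { return name }
    if let embedded = embeddedLabel(in: identifier) {
        return embedded.replacingOccurrences(of: "_", with: " ")
    }
    return identifier
}
```

In the completion handler from the question, this would be called as displayName(for: classifications.first?.identifier ?? ""), optionally passing a dictionary that maps identifiers to the names in dict.txt.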
I am using the Firebase Cloud Vision (ML) API to read an image.
I am able to get information about the image back, but it is not specific.
Example: when I take and upload a picture of a MacBook, the output is "notebook, laptop, electronic device, etc."
But I want to get its brand name, like Apple MacBook.
I have seen a few apps doing this.
I could not find any information about this, so I am posting here.
Please suggest or guide me if anyone has come across this.
My Code:
func pickedImage(image: UIImage) {
imageView.image = image
imageView.contentMode = .scaleAspectFit
guard let image = imageView.image else { return }
// let onCloudLabeler = Vision.vision().cloudImageLabeler(options: options)
let onCloudLabeler = Vision.vision().cloudImageLabeler()
// Define the metadata for the image.
let imageMetadata = VisionImageMetadata()
imageMetadata.orientation = .topLeft
// Initialize a VisionImage object with the given UIImage.
let visionImage = VisionImage(image: image)
visionImage.metadata = imageMetadata
onCloudLabeler.process(visionImage) { labels, error in
guard error == nil, let labels = labels, !labels.isEmpty
else {
// [START_EXCLUDE]
let errorString = error?.localizedDescription ?? "No results returned."
print("Label detection failed with error: \(errorString)")
//self.showResults()
// [END_EXCLUDE]
return
}
// [START_EXCLUDE]
var results = [String]()
let resultsText = labels.map { label -> String in
results.append(label.text)
return "Label: \(label.text), " +
"Confidence: \(label.confidence ?? 0), " +
"EntityID: \(label.entityID ?? "")"
}.joined(separator: "\n")
//self.showResults()
// [END_EXCLUDE]
print(results.count)
print(resultsText)
self.labelTxt.text = results.joined(separator: ",")
results.removeAll()
}
}
If you've seen other apps doing something that your app doesn't do, those other apps are likely using a different ML model than the one you're using.
If you want to accomplish the same using ML Kit for Firebase, you can use a custom model that you either trained yourself or got from another source.
As Puf said, the apps you saw are probably using their own custom ML model. ML Kit now supports creating custom image classification models from your own training data. Check out the AutoML Vision Edge functionality here: https://firebase.google.com/docs/ml-kit/automl-vision-edge
In a new project we plan to create following AR showcase:
We want to have a wall with some pipes and cables on it. These will have sensors mounted to control and monitor the pipe/cable-system. Since each sensor will have the same dimensions and appearance we plan to add individual QR Codes to each sensor. Reading the documentation of ARWorldTrackingConfiguration and ARImageTrackingConfiguration shows that ARKit is capable of recognizing known images. But the requirements to images make me wonder if the application would work as we want it to when using several QR Codes:
From detectionImages:
[...], identifying art in a museum or adding animated elements to a movie poster.
From Apple's keynote:
Good Images to Track: High Texture, High local Contrast, well distributed histogram, no repetitive structures
Since QR codes don't match these requirements completely, I'm wondering if it's possible to use about 10 QR codes and have ARKit recognize each of them individually and reliably, especially when e.g. 3 codes are in view. Does anyone have experience in tracking several QR codes or even a similar showcase?
Recognizing (several) QR codes has nothing to do with ARKit and can be done in 3 different ways (AVFoundation, CIDetector, Vision), of which the latter is preferable in my opinion because you may also want to use its object tracking capabilities (VNTrackObjectRequest). It is also more robust in my experience.
If you need to place objects in ARKit scene using locations of the QR-codes, you will need to execute hitTest on ARFrame to find code's 3D location (transform) in the world. On that location you will need to place a custom ARAnchor. Using the anchor, you can add a custom SceneKit node to the scene.
UPDATE: So the suggested strategy would be: 1. find QR codes and their 2D location with Vision, 2. find their 3D location (worldTransform) with ARFrame.hitTest(), 3. create custom (subclassed) ARAnchor and add it to the session, 4. in renderer(_ renderer: SCNSceneRenderer, nodeFor anchor: ARAnchor) add a custom node (such as SCNText with billboard constraint) for your custom ARAnchor.
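Steps 1 to 3 above can be sketched without RxSwift roughly as follows. Assumptions: a plain named ARAnchor carries the QR payload instead of a custom subclass, Vision runs on the unrotated capturedImage so its coordinates line up with the frame's image space, and the function names are made up for illustration. ARFrame.hitTest(_:types:) is deprecated since iOS 14 in favor of raycasting, but it is what the strategy above describes.

```swift
import ARKit
import Vision

/// Vision bounding boxes are normalized with a lower-left origin, while
/// ARFrame.hitTest expects a normalized image point with an upper-left origin,
/// so the y coordinate of the box center is flipped.
func hitTestPoint(forVisionBoundingBox box: CGRect) -> CGPoint {
    CGPoint(x: box.midX, y: 1 - box.midY)
}

/// Detects QR codes in `frame` and adds one named anchor per code to `session`.
/// Hypothetical helper; call it off the main thread, since Vision work is slow.
func anchorQRCodes(in frame: ARFrame, session: ARSession) throws {
    // 1. Find QR codes and their 2D locations with Vision.
    let request = VNDetectBarcodesRequest()
    request.symbologies = [.qr] // spelled `.QR` in older SDKs
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage, options: [:])
    try handler.perform([request])

    for case let code as VNBarcodeObservation in request.results ?? [] {
        guard let payload = code.payloadStringValue else { continue }

        // 2. Find the 3D location of the code's center.
        let point = hitTestPoint(forVisionBoundingBox: code.boundingBox)
        let hits = frame.hitTest(point, types: [.existingPlaneUsingExtent, .featurePoint])
        guard let hit = hits.first else { continue }

        // 3. Add an anchor named after the payload so renderer(_:nodeFor:) can
        //    decide which node to create for it.
        session.add(anchor: ARAnchor(name: payload, transform: hit.worldTransform))
    }
}
```

Because every call adds new anchors, a real app would also need to skip payloads that already have an anchor in the session.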
If by any chance you are using RxSwift, this is easiest to do with the RxVision framework, because it lets you easily pass the relevant ARFrame along into the handler:
var requests = [RxVNRequest<ARFrame>]()
let barcodesRequest: RxVNDetectBarcodesRequest<ARFrame> = VNDetectBarcodesRequest.rx.request(symbologies: [.QR])
self
.barcodesRequest
.observable
.observeOn(Scheduler.main)
.subscribe { [unowned self] (event) in
switch event {
case .next(let completion):
self.detectCodeHandler(value: completion.value, request: completion.request, error: completion.error) // define the method first
default:
break
}
}
.disposed(by: disposeBag)
if let imageAnchor = anchor as? ARImageAnchor {
    guard let buffer = sceneView.session.currentFrame?.capturedImage else {
        print("could not get a pixel buffer")
        return
    }
    // Read the QR payload from the current camera frame.
    let ciImage = CIImage(cvPixelBuffer: buffer)
    var message = ""
    for case let feature as CIQRCodeFeature in detector.features(in: ciImage) {
        message = feature.messageString ?? ""
        break
    }
    // Keep the anchor only if the decoded payload matches the reference image name.
    if imageAnchor.referenceImage.name == "QR1" {
        if message == "QR1" {
            // add node 1
        } else {
            sceneView.session.remove(anchor: anchor)
        }
    } else if imageAnchor.referenceImage.name == "QR2" {
        if message == "QR2" {
            // add node 2
        } else {
            sceneView.session.remove(anchor: anchor)
        }
    }
}
detector here is a CIDetector. You also need to check renderer(_:didUpdate:for:). I worked with 4 QR codes.
It works assuming no two QR codes can be seen in a frame at same time.
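The snippet above uses a detector without showing how it is created. A minimal sketch of that part, assuming a high-accuracy QR detector with a default context; the two function names are hypothetical.

```swift
import CoreImage

/// Creates a Core Image detector configured for QR codes.
/// Returns nil if Core Image cannot create the detector.
func makeQRDetector() -> CIDetector? {
    CIDetector(ofType: CIDetectorTypeQRCode,
               context: nil,
               options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
}

/// Returns the decoded payload of every QR code the detector finds in `image`.
func qrMessages(in image: CIImage, using detector: CIDetector) -> [String] {
    detector.features(in: image).compactMap { feature in
        (feature as? CIQRCodeFeature)?.messageString
    }
}
```

Creating the detector is relatively expensive, so it is worth building it once and reusing it for every frame instead of constructing it inside the renderer callback.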
I'm trying to take two images using the camera, and align them using the iOS Vision framework:
func align(firstImage: CIImage, secondImage: CIImage) {
let request = VNTranslationalImageRegistrationRequest(
targetedCIImage: firstImage) {
request, error in
if error != nil {
fatalError()
}
let observation = request.results!.first
as! VNImageTranslationAlignmentObservation
secondImage = secondImage.transformed(
by: observation.alignmentTransform)
let compositedImage = firstImage!.applyingFilter(
"CIAdditionCompositing",
parameters: ["inputBackgroundImage": secondImage])
// Save the compositedImage to the photo library.
}
try! visionHandler.perform([request], on: secondImage)
}
let visionHandler = VNSequenceRequestHandler()
But this produces grossly mis-aligned images:
You can see that I've tried three different types of scenes — a close-up subject, an indoor scene, and an outdoor scene. I tried more outdoor scenes, and the result is the same in almost every one of them.
I was expecting a slight misalignment at worst, but not such a complete misalignment. What is going wrong?
I'm not passing the orientation of the images into the Vision framework, but that shouldn't be a problem for aligning images. It's a problem only for things like face detection, where a rotated face isn't detected as a face. In any case, the output images have the correct orientation, so orientation is not the problem.
My compositing code is working correctly. It's only the Vision framework that's a problem. If I remove the calls to the Vision framework and put the phone on a tripod, the composition works perfectly. There's no misalignment. So the problem is the Vision framework.
This is on iPhone X.
How do I get Vision framework to work correctly? Can I tell it to use gyroscope, accelerometer and compass data to improve the alignment?
You should set secondImage as the targeted image and perform the handler on firstImage.
I used your compositing approach.
Check out this example from MLBoy:
let request = VNTranslationalImageRegistrationRequest(targetedCIImage: image2, options: [:])
let handler = VNImageRequestHandler(ciImage: image1, options: [:])
do {
try handler.perform([request])
} catch let error {
print(error)
}
guard let observation = request.results?.first as? VNImageTranslationAlignmentObservation else { return }
let alignmentTransform = observation.alignmentTransform
image2 = image2.transformed(by: alignmentTransform)
let compositedImage = image1.applyingFilter("CIAdditionCompositing", parameters: ["inputBackgroundImage": image2])