Error when converting to current syntax - iOS

Last year, I did a tutorial on how to create a camera app. Today I tried to run it, but it needed to be converted to the current syntax. After fixing the majority of the errors, I'm now left with just one that I can't fix. I'm not too familiar with AVFoundation or Swift 4 yet, so I was hoping I could get some assistance. The error I'm getting is on lines 61/62:
Initializer for conditional binding must have optional type, not [AVCaptureDevice]
Here's the relevant code:
//capture devices
func configureCaptureDevices() throws {
    let session = AVCaptureDevice.DiscoverySession(deviceTypes: [], mediaType: <#T##AVMediaType?#>, position: <#T##AVCaptureDevice.Position#>)
    guard let cameras = (session.devices.flatMap { $0 }), !cameras.isEmpty else { throw CameraControllerError.noCamerasAvailable }
    for camera in cameras {
        if camera.position == .front {
            self.frontCamera = camera
        }
        if camera.position == .back {
            self.rearCamera = camera
            try camera.lockForConfiguration()
            camera.focusMode = .continuousAutoFocus
            camera.unlockForConfiguration()
        }
    }
}
And finally, a link to the tutorial I was following: https://www.appcoda.com/avfoundation-swift-guide/

The problem is that you're using guard let syntax which is only for binding optionals. And you've correctly deduced that session is no longer optional (which is why you removed the ?). So pull the assignment of cameras out of the guard statement:
let session = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: .video, position: .unspecified)
let cameras = session.devices.flatMap { $0 }
guard !cameras.isEmpty else { throw CameraControllerError.noCamerasAvailable }
Personally, now that we're no longer doing any optional binding, I'd replace that guard statement with a more intuitive if statement. I also don't believe that flatMap is needed (you use that when dealing with arrays of arrays or arrays of optionals, neither of which applies here):
let cameras = session.devices
if cameras.isEmpty { throw CameraControllerError.noCamerasAvailable }
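Put together, a sketch of the whole method in context (the CameraController class, the error enum, and the two device properties mirror the tutorial's setup; adjust names to match your project):

import AVFoundation

enum CameraControllerError: Swift.Error {
    case noCamerasAvailable
}

final class CameraController {
    var frontCamera: AVCaptureDevice?
    var rearCamera: AVCaptureDevice?

    func configureCaptureDevices() throws {
        // Discover the built-in wide-angle cameras (front and back).
        let session = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera],
                                                       mediaType: .video,
                                                       position: .unspecified)

        let cameras = session.devices
        if cameras.isEmpty { throw CameraControllerError.noCamerasAvailable }

        for camera in cameras {
            if camera.position == .front {
                self.frontCamera = camera
            }
            if camera.position == .back {
                self.rearCamera = camera

                // Lock before changing configuration, unlock afterwards (as in the tutorial).
                try camera.lockForConfiguration()
                camera.focusMode = .continuousAutoFocus
                camera.unlockForConfiguration()
            }
        }
    }
}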

Related

Action ML Classifier not giving expected results

I am creating an app which detects exercises. I trained the model using Create ML and got a 100% result in the Create ML app, but when I integrate it into the application using the Vision framework it always shows only one exercise. I followed the code exactly from Build an Action Classifier with Create ML for creating the model and requesting VNHumanBodyPoseObservation, and followed this for converting VNHumanBodyPoseObservation to MLMultiArray.
Here is the code I use:
func didOutput(pixelBuffer: CVPixelBuffer) {
    self.extractPoses(pixelBuffer)
}

func extractPoses(_ pixelBuffer: CVPixelBuffer) {
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer)
    let request = VNDetectHumanBodyPoseRequest { (request, err) in
        if err == nil {
            if let observations =
                request.results as? [VNRecognizedPointsObservation], observations.count > 0 {
                if let prediction = try? self.makePrediction(observations) {
                    print("\(prediction.label), confidence: \(prediction.confidence)")
                }
            }
        }
    }
    do {
        // Perform the body pose-detection request.
        try handler.perform([request])
    } catch {
        print("Unable to perform the request: \(error).\n")
    }
}
func makePrediction(_ observations: [VNRecognizedPointsObservation]) throws -> (label: String, confidence: Double) {
    let fitnessClassifier = try PlayerExcercise(configuration: MLModelConfiguration())
    let numAvailableFrames = observations.count
    let observationsNeeded = 60
    var multiArrayBuffer = [MLMultiArray]()

    for frameIndex in 0 ..< min(numAvailableFrames, observationsNeeded) {
        let pose = observations[frameIndex]
        do {
            let oneFrameMultiArray = try pose.keypointsMultiArray()
            multiArrayBuffer.append(oneFrameMultiArray)
        } catch {
            continue
        }
    }

    // If poseWindow does not have enough frames (45) yet, we need to pad 0s
    if numAvailableFrames < observationsNeeded {
        for _ in 0 ..< (observationsNeeded - numAvailableFrames) {
            do {
                let oneFrameMultiArray = try MLMultiArray(shape: [1, 3, 18], dataType: .double)
                try resetMultiArray(oneFrameMultiArray)
                multiArrayBuffer.append(oneFrameMultiArray)
            } catch {
                continue
            }
        }
    }

    let modelInput = MLMultiArray(concatenating: [MLMultiArray](multiArrayBuffer), axis: 0, dataType: .float)
    let predictions = try fitnessClassifier.prediction(poses: modelInput)
    return (label: predictions.label, confidence: predictions.labelProbabilities[predictions.label]!)
}

func resetMultiArray(_ predictionWindow: MLMultiArray, with value: Double = 0.0) throws {
    let pointer = try UnsafeMutableBufferPointer<Double>(predictionWindow)
    pointer.initialize(repeating: value)
}
I suspect the issue is happening while converting VNRecognizedPointsObservation to MLMultiArray. Please help me; I have been trying very hard to get this working. Thanks in advance.
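For reference, here is a minimal sketch of the frame-windowing pattern the code above is aiming for: collect one pose observation per frame and only run the classifier once a full 60-frame window is available. PlayerExcercise is the generated model class from the question; the other names are illustrative.

import Vision
import CoreML

// Illustrative sketch only: accumulate per-frame pose observations into a
// rolling window, then classify the whole window in one prediction call.
final class PoseWindowClassifier {
    private var window = [VNRecognizedPointsObservation]()
    private let windowSize = 60   // matches the model's expected frame count

    func add(_ observation: VNRecognizedPointsObservation) throws {
        window.append(observation)
        guard window.count == windowSize else { return }

        // One [1, 3, 18] multi-array per frame, stacked along axis 0.
        let frames = try window.map { try $0.keypointsMultiArray() }
        let input = MLMultiArray(concatenating: frames, axis: 0, dataType: .float)

        let classifier = try PlayerExcercise(configuration: MLModelConfiguration())
        let prediction = try classifier.prediction(poses: input)
        print(prediction.label, prediction.labelProbabilities[prediction.label] ?? 0)

        window.removeAll()   // or drop only the oldest frames for overlapping windows
    }
}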
Are you running your app on a simulator? I had the same issue where the model predicted wrong results when I ran my image classifier app on an iPhone 12 simulator, but it was solved when I ran the app on a real device. So maybe there is nothing wrong with your model or code; try running it on a real device and see if you get your intended results.

Convert to RGBA from "CVPixelBuffer" of ARKit

I have a case where I have to convert ARFrames (420YpCbCr8BiPlanarFullRange) into a CVPixelBuffer in the 32BGRA format.
I figured out the format of my ARFrame and found code on GitHub and Stack Overflow for a function that does the conversion and how to proceed with it, but I have been struggling a lot to get the end result into the required format.
The calling function in ViewController
    guard currentBuffer == nil,
          case .normal = frame.camera.trackingState else {
        return
    }

    if let currentFrame = sceneView.session.currentFrame {
        let imageSampler = createImageSampler(from: currentFrame)
    }
}
private func createImageSampler(from frame: ARFrame) -> ImageSampler? {
    do {
        return try ImageSampler(frame: frame)
    } catch {
        print("Error: Could not initialize image sampler \(error)")
        return nil
    }
}
The main function that does the conversion:
Code Link
I'm at a beginner level in Swift and would love to learn how to do this.
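For reference, one straightforward (if not the fastest) way to get a 32BGRA buffer out of an ARFrame is to render it through Core Image; a minimal sketch, where the function name and the default CIContext are just illustrative choices:

import ARKit
import CoreImage
import CoreVideo

// Sketch: convert the camera's 420YpCbCr8BiPlanarFullRange buffer into a
// newly allocated 32BGRA CVPixelBuffer by rendering it through a CIContext.
func makeBGRAPixelBuffer(from frame: ARFrame, context: CIContext = CIContext()) -> CVPixelBuffer? {
    let source = frame.capturedImage
    let width = CVPixelBufferGetWidth(source)
    let height = CVPixelBufferGetHeight(source)

    var output: CVPixelBuffer?
    guard CVPixelBufferCreate(kCFAllocatorDefault, width, height,
                              kCVPixelFormatType_32BGRA, nil, &output) == kCVReturnSuccess,
          let destination = output else { return nil }

    // CIImage understands the bi-planar YCbCr input; render(_:to:) writes BGRA out.
    context.render(CIImage(cvPixelBuffer: source), to: destination)
    return destination
}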

HERE iOS SDK: Places API make*Request always returns nil

I am new to the HERE iOS SDK and I am trying to use the Places API by searching for places around a location. I did a pod try HEREMapsStarter and tried the following code:
let places = NMAPlaces()
let location = NMAGeoCoordinates(latitude: yyy, longitude: xxx)
let result2 = places.makeSearchRequest(location: location, query: "restaurant")
let result = places.makeHereRequest(location: location, filters: nil)
result?.start(listener: self)
result2?.start(listener: self)
But this doesn't work because both result and result2 are nil. What am I missing here?
It seems that you are trying to create the places object:
let places = NMAPlaces()
But the places object is a singleton and can only be retrieved by calling shared():
let places = NMAPlaces.shared()
Also, since you are using the result?.start(listener: self) method, you need to implement the NMAResultListener protocol. Example of a simple listener:
class MainViewController: UIViewController, NMAResultListener {

    func requestDidComplete(_ request: NMARequest, data: Any?, error inError: Error?) {
        print("data = \(String(describing: data))")
        guard inError == nil else {
            print("Request error \((inError! as NSError).code)")
            return
        }
        guard data is NMADiscoveryPage, let resultPage = data as? NMADiscoveryPage else {
            print("invalid type returned \(String(describing: data))")
            return
        }
        let resultsArray: [NMALink] = resultPage.discoveryResults
        for link in resultsArray {
            if let placeLink = link as? NMAPlaceLink {
                print("PlaceLink position: \(placeLink.position.latitude), \(placeLink.position.longitude)")
            }
        }
    }

    ....
}
====================================================================
Let's assume you are searching in London, UK (51.514545, -0.131666). In both requests, the listener parameter self implements NMAResultListener as described above.
The code for makeSearchRequest might look like this:
let geoCoordCenter = NMAGeoCoordinates(latitude:51.514545, longitude: -0.131666)
let searchRequest = NMAPlaces.shared().makeSearchRequest(location: geoCoordCenter, query: "restaurant")
searchRequest?.start(listener: self)
When the request is finished, makeSearchRequest will return results like:
data = Optional(<NMADiscoveryPage: 0x28241a400>)
PlaceLink position: 51.5117, -0.12565
PlaceLink position: 51.51312, -0.13374
....
PlaceLink position: 51.51371, -0.13155
PlaceLink position: 51.51462, -0.12651
And the code for makeHereRequest:
let geoCoordCenter = NMAGeoCoordinates(latitude:51.514545, longitude: -0.131666)
let hereRequest = NMAPlaces.shared().makeHereRequest(location: geoCoordCenter, filters: nil)
hereRequest?.start(listener: self)
makeHereRequest will return results:
data = Optional(<NMADiscoveryPage: 0x282400f00>)
PlaceLink position: 51.514542, -0.131883
PlaceLink position: 51.514542, -0.131883
....
PlaceLink position: 51.51435, -0.13169
PlaceLink position: 51.51444, -0.13194
PlaceLink position: 51.51444, -0.13194
Also note that, depending on network conditions and the search location, the result might be an error such as "not found" or another error.

Converting a Vision VNTextObservation to a String

I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:
1) class VNDetectTextRectanglesRequest
2) class VNTextObservation
It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger?
Here's a post that is a brief overview of Vision.
Thank you for reading.
This is how to do it ...
//
//  ViewController.swift
//
import UIKit
import Vision
import CoreML

class ViewController: UIViewController {

    //HOLDS OUR INPUT
    var inputImage: CIImage?

    //RESULT FROM OVERALL RECOGNITION
    var recognizedWords: [String] = [String]()

    //RESULT FROM RECOGNITION
    var recognizedRegion: String = String()

    //OCR-REQUEST
    lazy var ocrRequest: VNCoreMLRequest = {
        do {
            //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
            let model = try VNCoreMLModel(for: OCR().model)
            return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        } catch {
            fatalError("cannot load model")
        }
    }()

    //OCR-HANDLER
    func handleClassification(request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNClassificationObservation]
            else { fatalError("unexpected result") }
        guard let best = observations.first
            else { fatalError("cant get best result") }
        self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
    }

    //TEXT-DETECTION-REQUEST
    lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
        return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
    }()

    //TEXT-DETECTION-HANDLER
    func handleDetection(request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNTextObservation]
            else { fatalError("unexpected result") }

        // EMPTY THE RESULTS
        self.recognizedWords = [String]()

        //NEEDED BECAUSE OF DIFFERENT SCALES
        let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!)

        //A REGION IS LIKE A "WORD"
        for region: VNTextObservation in observations {
            guard let boxesIn = region.characterBoxes else {
                continue
            }

            //EMPTY THE RESULT FOR REGION
            self.recognizedRegion = ""

            //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0... 1.0)
            for box in boxesIn {
                //SCALE THE BOUNDING BOX TO PIXELS
                let realBoundingBox = box.boundingBox.applying(transform)

                //TO BE SURE
                guard (inputImage?.extent.contains(realBoundingBox))!
                    else { print("invalid detected rectangle"); return }

                //SCALE THE POINTS TO PIXELS
                let topleft = box.topLeft.applying(transform)
                let topright = box.topRight.applying(transform)
                let bottomleft = box.bottomLeft.applying(transform)
                let bottomright = box.bottomRight.applying(transform)

                //LET'S CROP AND RECTIFY
                let charImage = inputImage?
                    .cropped(to: realBoundingBox)
                    .applyingFilter("CIPerspectiveCorrection", parameters: [
                        "inputTopLeft": CIVector(cgPoint: topleft),
                        "inputTopRight": CIVector(cgPoint: topright),
                        "inputBottomLeft": CIVector(cgPoint: bottomleft),
                        "inputBottomRight": CIVector(cgPoint: bottomright)
                    ])

                //PREPARE THE HANDLER
                let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])

                //SOME OPTIONS (TO PLAY WITH..)
                self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill

                //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US !!
                do {
                    try handler.perform([self.ocrRequest])
                } catch { print("Error") }
            }

            //APPEND RECOGNIZED CHARS FOR THAT REGION
            self.recognizedWords.append(recognizedRegion)
        }

        //THATS WHAT WE WANT - PRINT WORDS TO CONSOLE
        DispatchQueue.main.async {
            self.PrintWords(words: self.recognizedWords)
        }
    }

    func PrintWords(words: [String]) {
        // VOILA'
        print(recognizedWords)
    }

    func doOCR(ciImage: CIImage) {
        //PREPARE THE HANDLER
        let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])

        //WE NEED A BOX FOR EACH DETECTED CHARACTER
        self.textDetectionRequest.reportCharacterBoxes = true
        self.textDetectionRequest.preferBackgroundProcessing = false

        //FEED IT TO THE QUEUE FOR TEXT-DETECTION
        DispatchQueue.global(qos: .userInteractive).async {
            do {
                try handler.perform([self.textDetectionRequest])
            } catch {
                print("Error")
            }
        }
    }

    override func viewDidLoad() {
        super.viewDidLoad()
        // Do any additional setup after loading the view, typically from a nib.

        //LETS LOAD AN IMAGE FROM RESOURCE
        let loadedImage: UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too

        //WE NEED A CIIMAGE - NOT NEEDED TO SCALE
        inputImage = CIImage(image: loadedImage)!

        //LET'S DO IT
        self.doOCR(ciImage: inputImage!)
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}
You'll find the complete project here; the trained model is included!
SwiftOCR
I just got SwiftOCR to work with small sets of text.
https://github.com/garnele007/SwiftOCR
uses
https://github.com/Swift-AI/Swift-AI
which uses NeuralNet-MNIST model for text recognition.
TODO : VNTextObservation > SwiftOCR
Will post an example of it using VNTextObservation once I have one connected to the other.
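In the meantime, basic SwiftOCR usage on a cropped image is only a few lines (taken roughly from its README; croppedImage stands in for whatever crop you produce from a VNTextObservation):

import SwiftOCR

let swiftOCR = SwiftOCR()

// croppedImage: a UIImage containing a single region of text,
// e.g. a crop taken from a VNTextObservation bounding box.
swiftOCR.recognize(croppedImage) { recognizedString in
    print(recognizedString)
}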
OpenCV + Tesseract OCR
I tried to use OpenCV + Tesseract but got compile errors, then found SwiftOCR.
SEE ALSO : Google Vision iOS
Note: Google Vision Text Recognition - the Android SDK has text detection, and there is also an iOS CocoaPod. Keep an eye on it, as it should add text recognition to iOS eventually.
https://developers.google.com/vision/text-overview
Correction: I just tried it, but only the Android version of the SDK supports text detection.
If you subscribe to releases at https://libraries.io/cocoapods/GoogleMobileVision (click SUBSCRIBE TO RELEASES), you can see when text detection is added to the iOS part of the CocoaPod.
Apple finally updated Vision to do OCR. Open a playground and dump a couple of test images in the Resources folder. In my case, I called them "demoDocument.jpg" and "demoLicensePlate.jpg".
The new class is called VNRecognizeTextRequest. Dump this in a playground and give it a whirl:
import Vision

enum DemoImage: String {
    case document = "demoDocument"
    case licensePlate = "demoLicensePlate"
}

class OCRReader {
    func performOCR(on url: URL?, recognitionLevel: VNRequestTextRecognitionLevel) {
        guard let url = url else { return }
        let requestHandler = VNImageRequestHandler(url: url, options: [:])

        let request = VNRecognizeTextRequest { (request, error) in
            if let error = error {
                print(error)
                return
            }

            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

            for currentObservation in observations {
                let topCandidate = currentObservation.topCandidates(1)
                if let recognizedText = topCandidate.first {
                    print(recognizedText.string)
                }
            }
        }
        request.recognitionLevel = recognitionLevel

        try? requestHandler.perform([request])
    }
}

func url(for image: DemoImage) -> URL? {
    return Bundle.main.url(forResource: image.rawValue, withExtension: "jpg")
}

let ocrReader = OCRReader()
ocrReader.performOCR(on: url(for: .document), recognitionLevel: .fast)
There's an in-depth discussion of this from WWDC19
Adding my own progress on this, in case anyone has a better solution:
I've successfully drawn the region box and character boxes on screen. Apple's Vision API is actually very performant. You have to transform each frame of your video into an image and feed it to the recogniser; it's much more accurate than feeding the pixel buffer from the camera directly.
if #available(iOS 11.0, *) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    var requestOptions: [VNImageOption: Any] = [:]
    if let camData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
        requestOptions = [.cameraIntrinsics: camData]
    }

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                    orientation: 6,
                                                    options: requestOptions)

    let request = VNDetectTextRectanglesRequest(completionHandler: { (request, _) in
        guard let observations = request.results else { print("no result"); return }
        let result = observations.map({ $0 as? VNTextObservation })
        DispatchQueue.main.async {
            self.previewLayer.sublayers?.removeSubrange(1...)
            for region in result {
                guard let rg = region else { continue }
                self.drawRegionBox(box: rg)
                if let boxes = region?.characterBoxes {
                    for characterBox in boxes {
                        self.drawTextBox(box: characterBox)
                    }
                }
            }
        }
    })
    request.reportCharacterBoxes = true
    try? imageRequestHandler.perform([request])
}
Now I'm trying to actually recognize the text. Apple doesn't provide any built-in OCR model, and I want to use CoreML to do that, so I'm trying to convert a Tesseract trained data model to CoreML.
You can find Tesseract models here: https://github.com/tesseract-ocr/tessdata and I think the next step is to write a coremltools converter that supports that type of input and outputs a CoreML file.
Or, you can link to TesseractiOS directly and try to feed it with your region boxes and character boxes you get from the Vision API.
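For that second route, a rough sketch of handing a cropped word image to TesseractOCRiOS (the G8Tesseract wrapper), assuming the pod is installed and an "eng" traineddata file is bundled; the function name is illustrative:

import TesseractOCR

// croppedImage: a UIImage cut out of the frame using a VNTextObservation
// bounding box (the region/character boxes gathered above).
func recognizeText(in croppedImage: UIImage) -> String? {
    guard let tesseract = G8Tesseract(language: "eng") else { return nil }
    tesseract.engineMode = .tesseractOnly
    tesseract.pageSegmentationMode = .singleLine
    tesseract.image = croppedImage
    tesseract.recognize()
    return tesseract.recognizedText
}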
Thanks to a GitHub user, you can test an example: https://gist.github.com/Koze/e59fa3098388265e578dee6b3ce89dd8
- (void)detectWithImageURL:(NSURL *)URL
{
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithURL:URL options:@{}];
    VNDetectTextRectanglesRequest *request = [[VNDetectTextRectanglesRequest alloc] initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
        if (error) {
            NSLog(@"%@", error);
        }
        else {
            for (VNTextObservation *textObservation in request.results) {
//                NSLog(@"%@", textObservation);
//                NSLog(@"%@", textObservation.characterBoxes);
                NSLog(@"%@", NSStringFromCGRect(textObservation.boundingBox));
                for (VNRectangleObservation *rectangleObservation in textObservation.characterBoxes) {
                    NSLog(@" |-%@", NSStringFromCGRect(rectangleObservation.boundingBox));
                }
            }
        }
    }];
    request.reportCharacterBoxes = YES;
    NSError *error;
    [handler performRequests:@[request] error:&error];
    if (error) {
        NSLog(@"%@", error);
    }
}
The thing is, the result is an array of bounding boxes for each detected character. From what I gathered from Vision's WWDC session, I think you are supposed to use CoreML to recognize the actual characters.
Recommended WWDC 2017 talk: Vision Framework: Building on Core ML (I haven't finished watching it either); have a look at 25:50 for a similar example called MNISTVision.
Here's another nifty app demonstrating the use of Keras (TensorFlow) to train an MNIST model for handwriting recognition with CoreML: GitHub
I'm using Google's Tesseract OCR engine to convert the images into actual strings. You'll have to add it to your Xcode project using CocoaPods. Although Tesseract will perform OCR even if you simply feed it an image containing text, the way to make it perform better/faster is to use the detected text rectangles to feed it pieces of the image that actually contain text, which is where Apple's Vision framework comes in handy.
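The only fiddly part is mapping Vision's normalized, bottom-left-origin bounding boxes back into pixel coordinates before cropping; a sketch of that conversion (the helper name is just illustrative):

import UIKit
import Vision

// Convert a normalized Vision bounding box (origin at the bottom-left,
// values 0...1) into a CGImage crop of the original image.
func crop(_ image: UIImage, to observation: VNTextObservation) -> UIImage? {
    guard let cgImage = image.cgImage else { return nil }
    let width = CGFloat(cgImage.width)
    let height = CGFloat(cgImage.height)
    let box = observation.boundingBox

    // Flip the y axis: Vision's origin is bottom-left, CGImage's is top-left.
    let rect = CGRect(x: box.minX * width,
                      y: (1 - box.maxY) * height,
                      width: box.width * width,
                      height: box.height * height)

    guard let cropped = cgImage.cropping(to: rect) else { return nil }
    return UIImage(cgImage: cropped)
}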
Here's a link to the engine:
Tesseract OCR
And here's a link to the current stage of my project that has text detection + OCR already implemented:
Out Loud - Camera to Speech
Hope these can be of some use. Good luck!
For those still looking for a solution, I wrote a quick library to do this. It uses both the Vision API and Tesseract and can be used to achieve the task the question describes with one single method:
func sliceaAndOCR(image: UIImage, charWhitelist: String, charBlackList: String = "", completion: @escaping ((_: String, _: UIImage) -> Void))
This method will look for text in your image, return the string found, and a slice of the original image showing where the text was found.
Firebase ML Kit does it for iOS (and Android) with their on-device Vision API and it outperforms Tesseract and SwiftOCR.
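For completeness, the on-device recognizer in (the older) Firebase ML Kit takes only a few lines; roughly, per its documentation at the time, with image standing in for your UIImage:

import FirebaseMLVision

let vision = Vision.vision()
let textRecognizer = vision.onDeviceTextRecognizer()

// image: a UIImage; orientation matters for camera captures.
let visionImage = VisionImage(image: image)
textRecognizer.process(visionImage) { result, error in
    guard error == nil, let result = result else { return }
    print(result.text)                       // full recognized string
    for block in result.blocks {
        print(block.text, block.frame)       // per-block text and bounding box
    }
}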

Can't unwrap/cast in Swift 3

I'm using the iOS library Charts (https://github.com/danielgindi/Charts).
There's a delegate function declared chartValueSelected, which gives me back entry: ChartDataEntry.
So I get the entry, with its data, declared as var data: AnyObject?
print(entry.data)
> Optional(Optional(<MyProject.Stop: 0x6000002c5cc0> (entity: Stop; id: 0xd00000000c980010 <x-coredata://5F54CCEC-11FB-42F1-BDFE-30F7F7E18614/Stop/p806> ; data: { ... })))
print(type(of: entry.data))
> Optional<AnyObject>
That's weird; I assigned it a non-optional value. Well, that might be a bug in the library, but I should at least be able to access it, right?
guard let e = stopEntry as? Stop else {
    continue
}
yVal.append(BarChartDataEntry(value: e.duration!.doubleValue, xIndex: i, data: e))
Well, we're fine with the double optional. But why can't we unwrap it?
if let hasObject = entry.data {
    print("We have object: \(type(of: hasObject))")
    > We have object: _SwiftValue
    if let stopObject = hasObject as? Stop {
        print("We have a stop object!!") // Doesn't get here
    }
}
More things that don't work:
if let s = entry.data, let b = s as? Stop {
    // Not executed here either
}
