I am able to get the rectangles of text detected with the Vision framework from a video feed in iOS 11, but now I am trying to get the part of the image that was recognized as text or a character. Can someone help with that?
func detectTextHandler(request: VNRequest, error: Error?) {
    guard let observations = request.results else {
        print("no result")
        return
    }
    let result = observations.map({ $0 as? VNTextObservation })
    DispatchQueue.main.async {
        self.imageView.layer.sublayers?.removeSubrange(1...)
        for region in result {
            guard let rg = region else {
                continue
            }
            self.highlightWord(box: rg)
            if let boxes = region?.characterBoxes {
                for characterBox in boxes {
                    self.highlightLetters(box: characterBox)
                }
            }
        }
    }
}
So how can I get the part of the image corresponding to region?.characterBoxes?
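One possible approach (a sketch, not from the original post): Vision's boundingBox values are normalized to 0...1 with a bottom-left origin, which matches Core Image's coordinate space, so each character box can be scaled to pixel coordinates and used to crop the source image. Here sourceImage is assumed to be the CIImage the request was performed on:
// Sketch: crop the region covered by a character box out of the source CIImage.
// `box` is a VNRectangleObservation (the element type of characterBoxes).
func cropRegion(from sourceImage: CIImage, box: VNRectangleObservation) -> CIImage {
    let width = sourceImage.extent.width
    let height = sourceImage.extent.height
    // Scale the normalized box (0...1, bottom-left origin) to pixel coordinates.
    let rect = CGRect(x: box.boundingBox.origin.x * width,
                      y: box.boundingBox.origin.y * height,
                      width: box.boundingBox.width * width,
                      height: box.boundingBox.height * height)
    return sourceImage.cropped(to: rect)
}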
I'm making an app that is supposed to recognise only specific fruits. I trained the model on 27 fruits. I want my app to show that the object is not found if it's not in the trained model.
So far I'm using this check: if the confidence is less than 90%, the object is treated as unfamiliar.
guard classification.confidence >= 0.90 else {
    print("we do not know what it is. confidence: \(classification.confidence)")
    // TODO: set image with not found
    return
}
But it still recognises the unfamiliar objects (for example a TV is 99% a papaya).
This is the whole code of the function responsible for the recognition.
func detect(image: CIImage) {
    guard let model = try? VNCoreMLModel(for: BaliFruitsIdentifier().model) else {
        fatalError("Loading CoreML Model Failed.")
    }
    let request = VNCoreMLRequest(model: model) { (request, error) in
        guard let classification = request.results?.first as? VNClassificationObservation else {
            fatalError("Could not classify image.")
        }
        self.navigationItem.title = classification.identifier.capitalized
        print("confidence: \(classification.confidence)")
        guard classification.confidence >= 0.90 else {
            print("we do not know what it is. confidence: \(classification.confidence)")
            // TODO: set image with not found
            return
        }
        for fruit in Fruit().fruits {
            if fruit.fruitName == classification.identifier {
                self.imageView.image = fruit.fruitPicture
                self.textView.text = fruit.fruitDescription
                self.textView.attributedText = fruit.attributed
                print("it is \(classification.identifier) with \(classification.confidence) confidence")
            }
        }
    }
    let handler = VNImageRequestHandler(ciImage: image)
    do {
        bigCameraButtonImage.isHidden = true
        try handler.perform([request])
    } catch {
        print(error)
    }
}
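One thing worth checking (a sketch, not from the original post): a classifier trained only on fruit classes has to spread its softmax output across those 27 classes, so an unrelated object like a TV can still end up with a high score for some fruit. Logging more than just the first result makes that visible; here request is the VNCoreMLRequest from the completion handler above:
// Sketch: print the top few classifications to see how confidence is distributed.
if let results = request.results as? [VNClassificationObservation] {
    for observation in results.prefix(5) {
        print("\(observation.identifier): \(observation.confidence)")
    }
}
A common mitigation is to retrain the model with an extra "unknown" class fed with assorted non-fruit images, rather than relying on a confidence threshold alone.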
I'm using the Vision framework to detect faces with the iPhone's front camera. My code looks like this:
func detect(_ cmSampleBuffer: CMSampleBuffer) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(cmSampleBuffer) else { return }
    var requests: [VNRequest] = []
    let requestLandmarks = VNDetectFaceLandmarksRequest { request, _ in
        DispatchQueue.main.async {
            guard let results = request.results as? [VNFaceObservation] else { return }
            print(results)
        }
    }
    requests.append(requestLandmarks)
    let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .leftMirrored)
    do {
        try handler.perform(requests)
    } catch {
        print(error)
    }
}
However, I noticed that when I move my face horizontally, the coordinates change vertically and vice versa. The image below may help to illustrate this.
If anyone can help me, I'd appreciate it; I'm going crazy over this.
For some reason, removing
let connectionVideo = videoDataOutput.connection(with: AVMediaType.video)
connectionVideo?.videoOrientation = AVCaptureVideoOrientation.portrait
from my AVCaptureVideoDataOutput setup solved the problem 🤡 Presumably the orientation passed to VNImageRequestHandler (.leftMirrored here) has to describe the pixel buffers as they are actually delivered; once the connection rotates the frames to portrait, that value no longer matches, and the x and y axes of the returned coordinates end up swapped.
I'm having trouble taking RAW pictures with a zoomFactor other than 1.0.
If I take a picture at the minimum zoom level, everything works fine. However, if I try to zoom closer to a subject by changing the zoomFactor, the app crashes with the following error:
Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: '*** -[AVCapturePhotoOutput capturePhotoWithSettings:delegate:] When specifying Bayer raw capture, the videoZoomFactor of the video device must be set to 1.0'
This only happens when shooting RAW; if I shoot using the standard HEVC format, everything works. I'm using Swift 4.2 and the AVFoundation framework.
Here's the code referenced by the error:
extension CameraController: AVCapturePhotoCaptureDelegate, AVCaptureVideoDataOutputSampleBufferDelegate {

    public func photoOutput(_ output: AVCapturePhotoOutput, didFinishProcessingPhoto photo: AVCapturePhoto, error: Error?) {
        guard error == nil else {
            print("Error capturing photo: \(error!)")
            return
        }
        // Access the file data representation of this photo.
        guard let photoData = photo.fileDataRepresentation() else {
            print("No photo data to write.")
            return
        }
        print("Generating IMAGE with metadata \n", photo.metadata)
        if photo.isRawPhoto {
            // Generate a unique URL to write the RAW file.
            rawFileURL = makeUniqueDNGFileURL()
            do {
                // Write the RAW (DNG) file data to a URL.
                try photoData.write(to: rawFileURL!)
                print("RAW-URL Generated")
                createRAWImageOnAlbum(withRAWURL: rawFileURL!)
            } catch {
                fatalError("Couldn't write DNG file to the URL.")
            }
        } else {
            createHEVCPhotoOnAlbum(photo: photo)
        }
    }

    private func makeUniqueDNGFileURL() -> URL {
        let tempDir = FileManager.default.temporaryDirectory
        let fileName = ProcessInfo.processInfo.globallyUniqueString
        return tempDir.appendingPathComponent(fileName).appendingPathExtension("dng")
    }
}
Do you know the reason for this?
I'm setting the zoomFactor here:
func updateZoom(toValue: CGFloat) throws {
    let session = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInWideAngleCamera], mediaType: AVMediaType.video, position: .unspecified)
    guard let cameras = (session.devices.compactMap { $0 }) as? [AVCaptureDevice], !cameras.isEmpty else {
        throw CameraControllerError.noCamerasAvailable
    }
    for camera in cameras {
        if camera.position == .back {
            self.rearCamera = camera
            try camera.lockForConfiguration()
            camera.ramp(toVideoZoomFactor: toValue, withRate: 4)
            camera.unlockForConfiguration()
        } else if camera.position == .front {
            self.frontCamera = camera
            try camera.lockForConfiguration()
            camera.ramp(toVideoZoomFactor: toValue, withRate: 4)
            camera.unlockForConfiguration()
        }
    }
}
Do check the zoom factor range the device supports, where captureDevice is your AVCaptureDevice instance:
func checkZoom(zoomFactor: CGFloat) {
    guard zoomFactor <= captureDevice.maxAvailableVideoZoomFactor,
          zoomFactor >= captureDevice.minAvailableVideoZoomFactor
    else {
        print("ZoomFactor not supported \(zoomFactor)")
        return
    }
}
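Note also that the exception message itself states the constraint: for Bayer RAW capture, videoZoomFactor must be 1.0. A minimal sketch of a workaround under that constraint (prepareForRAWCapture is a hypothetical helper; device stands for the active AVCaptureDevice):
// Sketch: reset the zoom to 1.0 before triggering a Bayer RAW capture,
// since the API rejects any other zoom factor in that mode.
func prepareForRAWCapture(device: AVCaptureDevice) throws {
    try device.lockForConfiguration()
    device.videoZoomFactor = 1.0
    device.unlockForConfiguration()
}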
I've already asked a question without any responses here:
How do I record changes on a CIImage to a video using AVAssetWriter?
But perhaps my question needs to be simpler. My Google search has been fruitless. How do I capture video of a changing CIImage in real time, without using the camera?
Using captureOutput, I get a CMSampleBuffer, which I can make into a CVPixelBuffer. AVAssetWriterInput's mediaType is set to video, but I think it expects compressed video. In addition, I'm not clear if the AVAssetWriterInput expectsMediaDataInRealTime property should be set to true or not.
It seems like it should be fairly simple, but everything I've attempted causes my AVAssetWriter's status to fail.
Here is my last attempt at making this work. Still failing:
@objc func importLivePreview() {
    guard var importedImage = importedDryCIImage else { return }
    DispatchQueue.main.async {
        // Apply a filter to the camera image.
        // This is what makes the CIImage appear to change.
        importedImage = self.applyFilterAndReturnImage(ciImage: importedImage, orientation: UIImage.Orientation.right, currentCameraRes: currentCameraRes!)
        if self.videoIsRecording &&
            self.assetWriterPixelBufferInput?.assetWriterInput.isReadyForMoreMediaData == true {
            guard let writer: AVAssetWriter = self.assetWriter, writer.status == .writing else {
                return
            }
            guard let cv: CVPixelBuffer = self.buffer(from: importedImage) else {
                print("CVPixelBuffer could not be created.")
                return
            }
            self.MTLContext?.render(importedImage, to: cv)
            self.currentSampleTime = CMTimeMakeWithSeconds(0.1, preferredTimescale: 1000000000)
            guard let currentSampleTime = self.currentSampleTime else {
                return
            }
            let success = self.assetWriterPixelBufferInput?.append(cv, withPresentationTime: currentSampleTime)
            if success == false {
                print("Pixel Buffer input failed")
            }
        }
        guard let MTLView = self.MTLCaptureView else {
            print("MTLCaptureView is not found or nil.")
            return
        }
        // Update the MTKView with the changed CIImage so the user can see it.
        MTLView.image = importedImage
    }
}
I got it working. The problem was that I wasn't offsetting currentSampleTime. This example doesn't have accurate offsets, but it shows that an offset needs to be added onto the last time:
@objc func importLivePreview() {
    guard var importedImage = importedDryCIImage else { return }
    DispatchQueue.main.async {
        // Apply a filter to the camera image.
        // This is what makes the CIImage appear to change.
        importedImage = self.applyFilterAndReturnImage(ciImage: importedImage, orientation: UIImage.Orientation.right, currentCameraRes: currentCameraRes!)
        if self.videoIsRecording &&
            self.assetWriterPixelBufferInput?.assetWriterInput.isReadyForMoreMediaData == true {
            guard let writer: AVAssetWriter = self.assetWriter, writer.status == .writing else {
                return
            }
            guard let cv: CVPixelBuffer = self.buffer(from: importedImage) else {
                print("CVPixelBuffer could not be created.")
                return
            }
            self.MTLContext?.render(importedImage, to: cv)
            guard let currentSampleTime = self.currentSampleTime else {
                return
            }
            // Offset currentSampleTime.
            let sampleTimeOffset = CMTimeMakeWithSeconds(0.1, preferredTimescale: 1000000000)
            self.currentSampleTime = CMTimeAdd(currentSampleTime, sampleTimeOffset)
            print("currentSampleTime = \(String(describing: currentSampleTime))")
            let success = self.assetWriterPixelBufferInput?.append(cv, withPresentationTime: currentSampleTime)
            if success == false {
                print("Pixel Buffer input failed")
            }
        }
        guard let MTLView = self.MTLCaptureView else {
            print("MTLCaptureView is not found or nil.")
            return
        }
        // Update the MTKView with the changed CIImage so the user can see it.
        MTLView.image = importedImage
    }
}
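For more accurate offsets (a sketch, not from the original answer): instead of adding a fixed 0.1 s each frame, you could derive the presentation time from the sample buffers driving the preview. Here firstSampleTime is a hypothetical stored CMTime? property:
// Sketch: in the AVCaptureVideoDataOutputSampleBufferDelegate callback, use the
// buffer's real presentation timestamp, rebased so the first frame starts at zero.
func captureOutput(_ output: AVCaptureOutput,
                   didOutput sampleBuffer: CMSampleBuffer,
                   from connection: AVCaptureConnection) {
    let timestamp = CMSampleBufferGetPresentationTimeStamp(sampleBuffer)
    if firstSampleTime == nil {
        firstSampleTime = timestamp
    }
    currentSampleTime = CMTimeSubtract(timestamp, firstSampleTime!)
}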
I am trying to detect a barcode in a user-selected image. I am able to detect a QR code in the image, but I cannot find anything related to scanning a barcode from an image. The code I am using to detect a QR code from an image is below:
func detectQRCode(_ image: UIImage?) -> [CIFeature]? {
    if let image = image, let ciImage = CIImage(image: image) {
        var options: [String: Any]
        let context = CIContext()
        options = [CIDetectorAccuracy: CIDetectorAccuracyHigh]
        let qrDetector = CIDetector(ofType: CIDetectorTypeQRCode, context: context, options: options)
        if ciImage.properties.keys.contains(kCGImagePropertyOrientation as String) {
            options = [CIDetectorImageOrientation: ciImage.properties[(kCGImagePropertyOrientation as String)] ?? 1]
        } else {
            options = [CIDetectorImageOrientation: 1]
        }
        let features = qrDetector?.features(in: ciImage, options: options)
        return features
    }
    return nil
}
When I go into the documentation of CIDetectorTypeQRCode, it says:
/* Specifies a detector type for barcode detection. */
@available(iOS 8.0, *)
public let CIDetectorTypeQRCode: String
Although this is the QR code type, the documentation says it can detect barcodes as well.
But when I use the same function to decode a barcode, it returns an empty array of features. And even if it returned some features, how would I convert them to the barcode equivalent of CIQRCodeFeature? I do not see any barcode equivalent in the documentation. I know you can do this with the ZBar SDK, but I am trying not to use any third-party library here; is it mandatory to use one in this case?
Please help, thanks a lot.
You can use the Vision framework.
Barcode detection request code:
var vnBarCodeDetectionRequest: VNDetectBarcodesRequest {
    let request = VNDetectBarcodesRequest { (request, error) in
        if let error = error as NSError? {
            print("Error in detecting - \(error)")
            return
        } else {
            guard let observations = request.results as? [VNDetectedObjectObservation] else {
                return
            }
            print("Observations are \(observations)")
        }
    }
    return request
}
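The results of VNDetectBarcodesRequest are actually VNBarcodeObservation instances (a subclass of VNDetectedObjectObservation), so casting to that type exposes the decoded payload. A small sketch of what the handler above could do instead:
// Sketch: read the symbology and decoded string from each barcode observation.
if let barcodes = request.results as? [VNBarcodeObservation] {
    for barcode in barcodes {
        print("Symbology: \(barcode.symbology), payload: \(barcode.payloadStringValue ?? "none")")
    }
}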
The function to which you pass the image:
func createVisionRequest(image: UIImage) {
    guard let cgImage = image.cgImage else {
        return
    }
    // Note: cgImageOrientation is not a built-in UIImage property; it is a
    // helper that maps UIImage.Orientation to CGImagePropertyOrientation.
    let requestHandler = VNImageRequestHandler(cgImage: cgImage, orientation: image.cgImageOrientation, options: [:])
    let vnRequests = [vnBarCodeDetectionRequest]
    DispatchQueue.global(qos: .background).async {
        do {
            try requestHandler.perform(vnRequests)
        } catch let error as NSError {
            print("Error in performing Image request: \(error)")
        }
    }
}