This is a known area and OpenCV may well be involved, but I'd still like to start from scratch.
How has something like Evernote's Scannable app been developed? That is, how does it automatically recognize a document through the camera and then extract it?
What UIKit frameworks are involved here, and what libraries may have been used? Any good articles or blog posts would also help. How does one go about understanding this?
This tutorial may be what you need. It is written in Python, but all of these functions are available through the iOS OpenCV bindings.
Here are the results you will get.
Once you have the ROI, i.e. the page, you should run OCR to detect the characters. For this you can use Tesseract, and this tutorial might be helpful.
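If you extract the page as a UIImage on iOS, the OCR step can be done with Tesseract as mentioned above; on iOS 13+ the built-in Vision text recognizer is another option. Here is a minimal sketch of that OCR step using Vision (the function name and callback shape are only illustrative, not from any of the linked tutorials):

import UIKit
import Vision

// Run OCR on the perspective-corrected page image produced by the scanning step.
// Hands back the top candidate string for each detected line of text.
func recognizeText(in pageImage: UIImage, completion: @escaping ([String]) -> Void) {
    guard let cgImage = pageImage.cgImage else { return completion([]) }

    let request = VNRecognizeTextRequest { request, _ in
        let observations = (request.results as? [VNRecognizedTextObservation]) ?? []
        let lines = observations.compactMap { $0.topCandidates(1).first?.string }
        completion(lines)
    }
    request.recognitionLevel = .accurate  // slower, but better suited to document text

    let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
    DispatchQueue.global(qos: .userInitiated).async {
        try? handler.perform([request])
    }
}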
For anyone coming here now, there are better solutions available. CIDetector does precisely this. To make it work on a live camera feed, run it on the live CIImages produced by AVFoundation (rendered using Metal or OpenGL).
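A minimal sketch of that approach, assuming you already have a CIImage for the current frame (the function name is just for illustration):

import CoreImage

// Detect the page rectangle in a frame and perspective-correct it so the
// page fills the output image, ready for OCR.
func extractDocument(from frame: CIImage) -> CIImage? {
    let detector = CIDetector(ofType: CIDetectorTypeRectangle,
                              context: nil,
                              options: [CIDetectorAccuracy: CIDetectorAccuracyHigh])
    guard let page = detector?.features(in: frame).first as? CIRectangleFeature else {
        return nil
    }
    return frame.applyingFilter("CIPerspectiveCorrection", parameters: [
        "inputTopLeft": CIVector(cgPoint: page.topLeft),
        "inputTopRight": CIVector(cgPoint: page.topRight),
        "inputBottomLeft": CIVector(cgPoint: page.bottomLeft),
        "inputBottomRight": CIVector(cgPoint: page.bottomRight)
    ])
}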
Sorry, I am pretty inexperienced with ARKit. I am working on an app that will have more features later, but the first step is basically recreating the Measure app that ships with iOS. I have looked at the documentation Apple provides, and most of it covers things like face tracking, object detection, or image tracking, so I wasn't sure exactly where to start. The existing code I have is written in SwiftUI, if that matters. Thank you!
I understand that it can be quite confusing in the beginning. I would recommend working through the tutorial at raywenderlich.com. This tutorial from Codestars on YouTube is also very good if you prefer to listen and watch instead of reading. Both walk through a lot of the important parts of ARKit, so I really recommend them. After that you will probably have a good understanding, and you could watch Apple's WWDC 2019 talk, What's New in ARKit 3.
Hope I understood your question correctly and please reach out if you have any questions or other concerns.
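To give a concrete starting point for the measuring part, here is a rough UIKit/SceneKit sketch (you could wrap the ARSCNView in a UIViewRepresentable for SwiftUI). It raycasts from two taps onto detected surfaces and prints the distance between the hit points; the class and property names are only illustrative:

import ARKit
import SceneKit
import UIKit

final class MeasureViewController: UIViewController {
    private let sceneView = ARSCNView()
    private var points: [simd_float3] = []

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        view.addSubview(sceneView)

        // Track the world and look for surfaces to measure against.
        let configuration = ARWorldTrackingConfiguration()
        configuration.planeDetection = [.horizontal, .vertical]
        sceneView.session.run(configuration)

        sceneView.addGestureRecognizer(
            UITapGestureRecognizer(target: self, action: #selector(handleTap(_:))))
    }

    @objc private func handleTap(_ gesture: UITapGestureRecognizer) {
        let location = gesture.location(in: sceneView)
        guard let query = sceneView.raycastQuery(from: location,
                                                 allowing: .estimatedPlane,
                                                 alignment: .any),
              let result = sceneView.session.raycast(query).first else { return }

        // The hit position is the translation column of the result's transform.
        let transform = result.worldTransform
        points.append(simd_float3(transform.columns.3.x,
                                  transform.columns.3.y,
                                  transform.columns.3.z))

        if points.count == 2 {
            let distance = simd_distance(points[0], points[1])  // metres
            print(String(format: "Distance: %.2f m", distance))
            points.removeAll()
        }
    }
}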
I need to find a way to implement face detection and recognition completely offline using a browser. A trained model specific to each user may be loaded initially. We only need to recognize one face per device. What is the best way to implement this?
I tried tracking.js to implement face detection, and it works, but I couldn't find a solution for recognition. I tried face-recognition.js, but it needs a Node server.
Take a look at face-api.js: it can both detect and recognize faces in real time, completely in the browser. It's made by Vincent Mühler, the same creator of face-recognition.js.
(face-api.js GitHub)
Things to note:
It's real time; my machine takes ~50 ms (using the MTCNN model).
It's JavaScript, but it uses WebGL GPU acceleration under the hood, which is why it performs so well.
It also works on mobile (tested on my S8+).
I recommend looking at the included examples as well; these helped me a lot.
I have used the package to create a working project, and it was surprisingly easier than I expected, coming from a student who has only just started web development. (I used it in a ReactJS app.)
Just like you, I was searching for and trying things such as tracking.js, but to be honest they didn't work well.
I plan to detect an image in a newspaper and play the video relevant to it. I have seen several newspaper-reading AR apps that include this feature, but I couldn't find out how to do it. How can I do this?
I don't expect any code, but I'd like to know what steps I should follow. Thank you.
You need to browse through the available marker-based AR SDKs. Such SDKs let you define in advance the database of images you would like to detect and respond to; once any of these images is detected at runtime, you get some kind of event with data about the detected image.
Vuforia is considered a good one and it has good samples, so it should be easier to start with. You should also check out Kudan, and there are more.
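If you end up targeting iOS only, ARKit's built-in image detection follows the same pattern without a third-party SDK. A rough sketch (this is not Vuforia or Kudan code, and the "NewspaperPages" asset group name is just an assumption):

import ARKit
import UIKit

final class NewspaperARViewController: UIViewController, ARSCNViewDelegate {
    private let sceneView = ARSCNView()

    override func viewDidLoad() {
        super.viewDidLoad()
        sceneView.frame = view.bounds
        sceneView.delegate = self
        view.addSubview(sceneView)

        // The "database" of images to detect, defined in advance as an
        // AR resource group in the asset catalog.
        let configuration = ARImageTrackingConfiguration()
        if let referenceImages = ARReferenceImage.referenceImages(inGroupNamed: "NewspaperPages",
                                                                  bundle: .main) {
            configuration.trackingImages = referenceImages
            configuration.maximumNumberOfTrackedImages = 1
        }
        sceneView.session.run(configuration)
    }

    // Called when one of the reference images is detected - this is the
    // "event" where you would anchor and play the matching video.
    func renderer(_ renderer: SCNSceneRenderer, didAdd node: SCNNode, for anchor: ARAnchor) {
        guard let imageAnchor = anchor as? ARImageAnchor else { return }
        print("Detected page: \(imageAnchor.referenceImage.name ?? "unknown")")
        // e.g. attach an AVPlayer-backed plane (SCNPlane + SKVideoNode) to `node` here.
    }
}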
Is it possible to perform OCR on an image (for example, from assets) instead of live video with Anyline, Microblink, or other SDKs?
Tesseract is not an option due to my limited time. I've tested it, but the results are very poor. I know they can be improved with OpenCV or something similar, but I have a deadline to keep.
EDIT:
This is an example of what the image looks like when it arrives at the OCR SDK.
I am not sure about the others, but you can use the Microblink SDK to read from a single image. It is documented here.
Reading from a video stream will give much better results, but it all depends on what exactly you are trying to do. What are you trying to read?
For reading barcodes or MRZ from, for example, identity documents, it works pretty well. For raw-text OCR it is not quite as good, but it is not really intended for that anyway.
https://github.com/garnele007/SwiftOCR
It is machine-learning based, trainable on different fonts, characters, etc., and free.
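Usage is quite small; here is a sketch based on the project's README (the asset name is a placeholder):

import SwiftOCR
import UIKit

let swiftOCR = SwiftOCR()

// Recognition runs asynchronously and hands back the recognized string.
if let image = UIImage(named: "numberPlate") {
    swiftOCR.recognize(image) { recognizedString in
        print(recognizedString)
    }
}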
I want to program an iOS app that takes photos, but I would like to filter the photo preview in real time. What I mean is implemented in the app called "CamWow" (here is a video of the app: http://www.youtube.com/watch?v=L_o-Bx08YZE ). I'm curious how this can be done. Does anybody have an idea how to build such an app that provides a filtered real-time preview and captures a filtered photo?
As Fraggle points out, on iOS 5.0 you can use the Core Image framework to do image filtering. However, Core Image is limited to the filters that ship with the framework, and I've found that it is not able to process video in real time in many cases.
As a result, I created my BSD-licensed open source GPUImage framework, which encapsulates the OpenGL ES 2.0 code you need to do GPU-accelerated processing of images and video. I have some examples of the kind of filtering you can do with this in this answer, and you can easily write your own custom filters using the OpenGL Shading Language. The sample applications in the framework show how to do filtering of images with live previews, as well as how to filter and save them out to disk.
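To give a feel for how little code a live filtered preview takes with GPUImage, here is a rough sketch (this uses the original Objective-C framework called from Swift; the later GPUImage 2/3 rewrites have a different Swift API, and the class name here is only illustrative):

import AVFoundation
import GPUImage
import UIKit

final class FilteredCameraViewController: UIViewController {
    private var videoCamera: GPUImageVideoCamera?
    private let filter = GPUImageSepiaFilter()

    override func viewDidLoad() {
        super.viewDidLoad()

        // On-screen view that displays the filtered frames.
        let filteredView = GPUImageView(frame: view.bounds)
        view.addSubview(filteredView)

        // Camera -> filter -> view; each stage runs on the GPU.
        videoCamera = GPUImageVideoCamera(sessionPreset: AVCaptureSession.Preset.hd1280x720.rawValue,
                                          cameraPosition: .back)
        videoCamera?.outputImageOrientation = .portrait
        videoCamera?.addTarget(filter)
        filter.addTarget(filteredView)

        videoCamera?.startCameraCapture()
    }
}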
I'm looking for the same kind of info (it's a pretty hot sector, so some devs may not be willing to give up the goods just yet). I came across this, which may not be exactly what you want but could be close. It's a step-by-step tutorial on processing a live video feed.
Edit: I've tried the code provided in that link, and it can be used to apply filters in real time. I modified the captureOutput method in ViewController.m, commented out the second filtering step ("CIMinimumCompositing"), and inserted my own filter (I used "CIColorMonochrome").
It worked. My first few attempts failed because, apparently, not all filters in the Core Image Filter Reference are available on iOS. There is some more documentation here.
I'm not sure this code is the best performance-wise, but it does work.
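For reference, the delegate method I modified looks roughly like this in Swift (the class and property names are only illustrative; rendering through a UIImageView like this is the simple path, which is also why Edit #2 below mentions OpenGL being faster):

import AVFoundation
import CoreImage
import UIKit

final class FilterPreviewDelegate: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    // Reused for every frame; creating a CIContext per frame would be expensive.
    private let ciContext = CIContext()
    weak var previewImageView: UIImageView?

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

        // Wrap the camera frame and apply the filter.
        var image = CIImage(cvPixelBuffer: pixelBuffer)
        image = image.applyingFilter("CIColorMonochrome", parameters: [
            kCIInputColorKey: CIColor(red: 0.7, green: 0.7, blue: 0.7),
            kCIInputIntensityKey: 1.0
        ])

        // Render the filtered frame and push it to the preview on the main thread.
        guard let cgImage = ciContext.createCGImage(image, from: image.extent) else { return }
        DispatchQueue.main.async {
            self.previewImageView?.image = UIImage(cgImage: cgImage)
        }
    }
}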
Edit #2: I saw some other answers on Stack Overflow that recommended using OpenGL for processing, which this sample code does not do. OpenGL should be faster.