I would like to use style transfer (example) in CoreML. Since CoreML supports converting Keras models, my first thought was to convert one of their samples, like this one or this one, but it seems there are a few issues with that approach based on this thread.
How can I use style transfer in CoreML? Any examples will help.
Edit:
Thanks for the link @twerdster, I was able to test it and it's working for me.
Additionally, I found the torch2coreml repo by Prisma.
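For reference, the Keras-to-CoreML route looks roughly like the sketch below. Treat it as a hedged example: it assumes a pre-4.0 version of coremltools (which still ships the Keras converter) and an already-trained fast style transfer network saved as an HDF5 file; the file name, input/output names and scaling are placeholders that have to match your actual model.

```python
# Minimal sketch: converting a saved Keras style-transfer model to Core ML
# with the (pre-4.0) coremltools Keras converter. File names and input/output
# names below are hypothetical.
import coremltools

coreml_model = coremltools.converters.keras.convert(
    'style_transfer.h5',          # hypothetical Keras model file
    input_names=['image'],
    image_input_names=['image'],  # treat the input as an image, not an MLMultiArray
    output_names=['stylized_image'],
    image_scale=1.0 / 255.0,      # adjust to match the model's preprocessing
)

coreml_model.author = 'me'
coreml_model.short_description = 'Fast neural style transfer'
coreml_model.save('StyleTransfer.mlmodel')
```

The resulting .mlmodel can then be dropped into an Xcode project; torch2coreml does the equivalent job for Torch models.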
Related
I need to find a way to implement face detection and recognition completely offline using a browser. A trained model specific to each user may be loaded initially. We only need to recognize one face per device. What is the best way to implement this?
I tried tracking.js to implement face detection, and it works, but I couldn't find a solution for recognition. I also tried face-recognition.js, but it needs a Node server.
Take a look at face-api.js: it can both detect and recognize faces in real time, completely in the browser! It's made by Vincent Mühler, the same creator of face-recognition.js.
(face-api.js GitHub)
Things to note:
It's real time; my machine gets ~50 ms (using the MTCNN model).
It's JavaScript but uses WebGL GPU acceleration under the hood, which is why it performs so well.
It also works on mobile! (tested on my S8+)
I recommend looking at the included examples as well; these helped me a lot.
I have used the package to create a working project, and it was surprisingly easy, and this is coming from a student who just started web development (I used it in a ReactJS app).
Just like you, I searched for and tried things such as tracking.js, but to be honest they didn't work well.
I am researching Mask R-CNN. I want to know how to pretrain on my own images (knife, sofa, baby, ...) using ResNet-50 in Mask R-CNN. I have tried to find this on GitHub, but I can't. Can anyone who knows how to handle this please help?
Try this implementation of Mask R-CNN on GitHub, here.
You can follow the Mask_RCNN GitHub repo. It has both ResNet-50 and ResNet-101 backbones. It is a beautiful implementation, I would say. The base model is from FAIR (Facebook AI Research). There is a demo file which you can check before starting your work (a rough sketch of what that looks like follows the list below).
If it works well for you, see my other answer; it will help you train the model on your custom data. The answer is a bit long, but it lists all the steps.
Some things I personally like about this implementation:
It is easy to set up and won't bother you much about dependencies; a Python virtual environment does wonders.
It falls back automatically between the GPU and CPU versions, in either direction.
It has good support from its developers and gets commits frequently.
The code is very customisable, so if you want to make some changes it's pretty easy: flip a few booleans and numbers up and down and you are done!
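To give a feel for the repo, here is a rough inference sketch based on its demo notebook. It is only a hedged example: the weights file, log directory and test image names are placeholders, and the config values mirror the COCO demo rather than a custom dataset.

```python
# Rough sketch of inference with the Matterport Mask_RCNN repo, modeled on its
# demo notebook. "mask_rcnn_coco.h5", "logs" and "test.jpg" are placeholders.
import skimage.io
import mrcnn.model as modellib
from mrcnn.config import Config

class InferenceConfig(Config):
    NAME = "coco_inference"
    NUM_CLASSES = 1 + 80      # background + 80 COCO classes
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1        # detect() expects batch size == number of images
    BACKBONE = "resnet101"    # "resnet50" is also supported

config = InferenceConfig()

model = modellib.MaskRCNN(mode="inference", config=config, model_dir="logs")
model.load_weights("mask_rcnn_coco.h5", by_name=True)  # pretrained COCO weights

image = skimage.io.imread("test.jpg")
r = model.detect([image], verbose=1)[0]
print(r["class_ids"], r["scores"])   # r also contains "rois" (boxes) and "masks"
```

Training on custom data follows roughly the same pattern, but with mode="training", your own Config and Dataset subclasses, and NUM_CLASSES set to your classes plus background; the linked answer walks through those steps.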
Is it possible to perform OCR on an image (for example, from assets) instead of live video with Anyline, microblink or other SDKs?
Tesseract is not an option due to my limited time.
I've tested it, but the results are very poor. I know it can be improved with OpenCV or something, but I have a deadline to keep.
EDIT:
This is an example of what the image looks like when it arrives at the OCR SDK.
I am not sure about the others, but you can use the microblink SDK for reading from a single image. It is documented here.
Reading from a video stream will give much better results, but it all depends on what exactly you are trying to do. What are you trying to read?
For reading barcodes or MRZ from, for example, identity documents, it works pretty well. For raw text OCR it is not quite as good, but it is not really intended for that anyway.
https://github.com/garnele007/SwiftOCR
Machine learning based, trainable on different fonts, characters, etc., and free.
This is a known area and OpenCV might well be involved, but I still have to start from scratch.
How has something like Evernote's Scannable app been developed? I mean, how does it automatically recognize a document using the camera and then extract it?
What are the UIKit frameworks involved here, and what libraries may have been used? Any nice articles or blogs? How does one go about understanding this?
This tutorial is what you might be needing. Although the tutorial is in Python, all of these functions are available in the iOS bindings.
Here are the results you will get.
Once you have the ROI, i.e. the page, you should run OCR to detect the characters. For this you can use Tesseract, and this tutorial might be helpful.
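Condensed, that kind of pipeline (edge detection, finding the page contour, perspective-correcting it, then OCR) looks roughly like the sketch below. It is shown in Python since that is what the tutorial uses; the file name, output size and Canny thresholds are placeholder values, and it assumes OpenCV 4.x plus pytesseract for the OCR step.

```python
# Sketch of a document-scanner pipeline: find the page, warp it flat, run OCR.
# "page.jpg" and the numeric values below are placeholders, not tuned settings.
import cv2
import numpy as np
import pytesseract

image = cv2.imread("page.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(gray, 75, 200)

# Take the largest roughly-quadrilateral contour as the page boundary.
contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
page = None
for c in sorted(contours, key=cv2.contourArea, reverse=True):
    approx = cv2.approxPolyDP(c, 0.02 * cv2.arcLength(c, True), True)
    if len(approx) == 4:
        page = approx.reshape(4, 2).astype("float32")
        break

if page is not None:
    # Warp the page to a straight-on view. A fixed output size is used here for
    # brevity; a full implementation derives it from the corner geometry and
    # orders the four corners consistently (top-left, top-right,
    # bottom-right, bottom-left) before the warp.
    width, height = 800, 1000
    dst = np.array([[0, 0], [width, 0], [width, height], [0, height]],
                   dtype="float32")
    M = cv2.getPerspectiveTransform(page, dst)
    warped = cv2.warpPerspective(image, M, (width, height))

    # Run OCR on the rectified page.
    print(pytesseract.image_to_string(warped))
```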
For anyone coming here now, there are better solutions: CIDetector does precisely this. To have it working on a live camera feed, you'd have to run it on the live CIImages generated by AVFoundation (rendered using Metal or OpenGL).
I downloaded the Evernote API Xcode project, but I have a question regarding the OCR feature. With their OCR service, can I take a picture and show the extracted text in a UILabel, or does it not work like that?
Or is the extracted text not shown to me at all, and only used for the photo search function?
Has anyone had any experience with this, or any ideas?
Thanks!
Yes, but it looks like it's going to be a bit of work.
When you get an EDAMResource that corresponds to an image, it has a property called recognition that returns an EDAMData object that contains the XML that defines the recognition info. For example, I attached this image to a note:
I inspected the recognition info that was attached to the corresponding EDAMResource object, and found this:
The XML I found is on pastie.org, because it's too big to fit in an answer.
As you can see, there's a LOT of information here. The XML format is defined in the API documentation, so this is where you would parse the XML and extract the relevant information yourself. Fortunately, the structure of the XML is quite simple (you could write a parser in a few minutes). The hard part will be figuring out which parts you want to use.
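To make that structure concrete, here is an illustrative parser, sketched in Python for brevity (in an iOS app you would do the same with NSXMLParser). It assumes the recoIndex layout visible in the linked dump: <item> elements carrying x/y/w/h rectangle attributes, each containing one or more <t w="confidence">text</t> candidates; verify those names against the dump and the API docs before relying on them.

```python
# Illustrative sketch: pull candidate words, confidences and bounding boxes
# out of Evernote recognition XML (assumed <item>/<t> structure as described
# above). Useful for building a search index, not for reconstructing prose.
import xml.etree.ElementTree as ET

def parse_reco_index(xml_string):
    root = ET.fromstring(xml_string)
    words = []
    for item in root.iter("item"):
        box = {k: int(item.get(k, 0)) for k in ("x", "y", "w", "h")}
        # Each <t> is one candidate reading of the region, weighted by confidence.
        candidates = [(t.text, int(t.get("w", 0))) for t in item.findall("t")]
        if candidates:
            best_text, confidence = max(candidates, key=lambda c: c[1])
            words.append((best_text, confidence, box))
    return words

# Example: keep only reasonably confident candidates.
# for text, confidence, box in parse_reco_index(xml):
#     if confidence >= 50:
#         print(text, box)
```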
It doesn't really work like that. Evernote doesn't really do "OCR" in the pure sense of turning document images into coherent paragraphs of text.
Evernote's recognition XML (which you can retrieve via the technique that @DaveDeLong shows above) is most useful as an index to search against; the service provides you with sets of rectangles and sets of possible words/text fragments with probability scores attached. This makes a great basis for matching search terms, but a terrible one for constructing a single string that represents the document.
(I know this answer is about 4 years late, but Dave's excellent description doesn't really address the philosophical distinction you'll run up against if you try to actually do what you were suggesting in the question.)