I cant seem to see anything in my searches.
I am trying to find a Rails 4 or 5 gem/plugin that acts similar to when you hold the camera to an app store card and it converts the text on the card to an object.
in my case i want a user to submit a photo and it reads two boxes in the screen shot / photo and converts the text in the image to text to save to an appropriate field and saves on from submit..
is there anything like this out there or am i just thinking a bit too far out of the box here..??
thanks.
Hope you're ready to invest a lot of time and probably money.
What you're looking for is OCR (Optical character recognition). ABBYY makes finereader which is a business solution for OCR. It's probably the best you can hope for at this point. There probably are quite a few open source solutions out there that will work more or less well.
Check out this project (or find similar ones): https://sourceforge.net/projects/tesseract-ocr/
Related
First, you need to know that I'm a beginner in this subject. Initially, I'm an Embedded System Developpers but I never worked with image recognition.
Let me expose my main goal:
I would like to create my own database of Logos and be able to
recognize them in a larger image. Typical application would be, for
example, to make a database of pepsi logos and coca-cola logos and
when I take a photo of a bottle of Soda, it tells me if it one of
them or an another.
So, here is my problem:
I first wanted to use the Auto ML Kit of Google. I gave him my
databases so it could train itself on it. My first attempt was to
take photos of bottle entirely and then compare. It was ok but not
too efficient. I then tried to give him only logos but after
training, it couldnt recognize anything in the whole image of a
bottle.
I think I didn't give enough images in the first case. But I'd prefer to use the second case (by giving only logo) so that the machine would search something similar in the image.
Finally, my questions:
If you've worked with ML Kit from Google, were you able to train a
model by giving images that should be recognized in a larger image?
If yes, do you have any hints to give me?
Do you know reliable software that could help me to perform tests of this kind? I thought about Azure Machine Learning Studio from
Microsoft (since I develop on Visual Studio).
In a first time, I'd like to code as few as I can just for testing. Maybe later I could try to code my own Machine Learning System but I think it's a big challenge.
I also thought that I would need to split my image in smaller image and then send each of this images into the Machine but it would be time consuming and I need a fast reaction (like < 2 seconds).
Thanks in advance for your answer. I don't need complete answer with full tutorial (Stack Overflow is not intended for that anyway ^^) but just some advices would already be good.
Have a good day!
Azure’s Custom Vision is great for this: https://www.customvision.ai
Let’s say you want to detect a pepsi logo. Upload 70 images of products with the logo on them. Use Custom Vision to draw a box around the logo for each photo. Click “train”, and you get a tensorflow model with code.
Look up any tutorial for it, it’s pretty incredible and really easy to use.
I'm taking a beginning mobile development class, and my professor wants me to jump right in and help him with an app of his written in Objective-C, and I have 3 months. I have taken a few other CS classes so far, but no next to nothing about mobile app development.
This app is basically a songbook that holds many PDF files of music scores. The first (of multiple things) that he wants me to add is the ability for a user to annotate the music score with highlighter, pen, and eraser. Since there are many music scores, I would need to have the app save these annotations for each score, and allow editing by the user later if needed (i.e. allow the user to go back and erase stuff and add more annotations to a given score).
I'm in the planning phase and I'm trying to figure out the best way to do this. I was thinking of having the annotations occur on a second view layer, and then saving that layer as an image so that it can be overlaid back onto the music notes sheet at any time (for the user to view). My concern is, would the user be able to re-annotate this layer once it has been saved as an image (i.e. erase and add more annotations, then save it again)?
Or what's the best way to go about this? I would really appreciate any advice because I am in over my head!
Well This is very broad question to answer it but let me help you with some links and you will need to go through that like.
It will help you to start your requirements into app.
There are many 3rd party frameworks are there for PDF annotations:
PSPDFKit (Paid)
FastPDFKit
Poppler (OpenSource)
There are some SO Questions links which also helps you for PDF annotation
Add Annotation to PDF
Annotation on an PDF
Programmatically add annotations on PDF
Some Github Links
LazyPDF
Note: LazyPDFKit - (No longer maintained - Use the source code to fix
the bugs)
Hope this will helps you in your research.
Local travel cards in Saint-Petersburg, Russia have got huge id numbers that aren't easy to read and type into a web page when topping up the card online. So I want to build a small app that would take a photo of a travel card and parse the number out.
The task is a bit easier than a free form recognition:
card is of the very well known size
id numbers are of known size, are located in the very well known location on a card and they are number only, no letters (okay, there are two variations I think and maybe they will add 1-2 more in the future)
even the font is known in advance
even the first several numbers are the same for most of the card (so far there are only two prefixes used)
How would you do it? Are there any libraries tuned not for the general OCR, but for a "hinted" OCR like I need?
Best regards,
Artem.
P.S.
Actually a free/cheap web service for this task would also be good enough
Yes Google has a library called Tesseract and there is an iOS SDK on Github you can import into your application. So you can use this SDK and it has some documentation that you can read that will explain how to set it up in your app. It has methods that will return you a string with the text of the card in the string. BUT it will be ALL of the text from the card. So best thing to do would be to:
1 "clip" the original image to extract a sub image that displays only the portion of the card you wish to get the numbers from.
2 Process this sub image through Tesseract to retrieve the string you are looking for.
3 Then parse through the string and pick out the data that you need.
But just be warned, it can be a bit quirky. This SDK tends to recognize words best from images that are scanned, not "taken a picture of". Because although it is an advance piece of technology, it isn't perfect. So to get it to work as perfectly as possible for you, try to get scanned copies of the originals.
Best of luck.
The ideal solution for you would have three components:
1) Detection of the card. This is useful because if you have the detection, then the end users have much easier time actually using the scanner, because they can place the phone above the card in an arbitrary direction
2) Accurate OCR component. Ideally, customizable for this exact font you have on the card, for the exact position on the card.
3) Parsing mechanism. This would enable you to obtain the exact string written on the card without writing huge amount of OCR parsing code.
BlinkID SDK has all this. It has a preset for detection cards in the ID-1 format. It has integrated OCR engine. And it provides RegexParser, where you can define the exact format of the text which you're trying to extract from the document.
BlinkID was initially built for scanning ID documents which have very similar properties as the problem you're trying to solve.
Note. I'm one of the developers working on BlinkID.
I was assigned a project (in school) for automated multiple choice test scoring and I do not know where to start.
I think his is a kind of popular program and you already know about it. Enter an image file scanned of the answer sheet and return results.
Everything I know about computer vision is a few examples of photo editing with OpenCV. I hope you can give me a few keywords related to the problem or maybe a couple of blog articles, documents and related libraries.
Is there any free open source programs that I can refer to?
Thanks!
Edit: Add 2 example of the answer sheet (sory that I cannot find a sheet in English):
I think there are basically two steps to the problem
bring the form into a normalized position
now you know where the boxes are and can look at them by thresholding the gray values in that region.
What methods to use for step 1 depends on your actual images and how much the vary. Do you have some example images you can upload?
Also I think it is a good idea, especially if you are a beginner, to start with some simple examples and work your way up from there by adding more and more variation.
I have a site where a user can upload a photo. I have no idea how to handle photos. To make a thumbnail out of a large res photo, do I just resize the width and the height? Or is there a better way to do this?
If you could point me to any resources or give me any tips, that would be great.
I'm using Ruby on Rails, if that matters. I don't really want gems for this because I want to learn how to do it myself.
For some of this, "learning how to do it yourself" is going to be a significant undertaking. Resizing the image, for example. Go ahead and find an open source library (such as a gem) that resizes images and look through its source code. It's not impossible to do on your own, but a lot of that sort of thing is built on the expertise that's come before, etc. There's nothing wrong with making use of a tool somebody else has created, provided that you understand what the tool is doing.
A few points to hopefully help you out:
Go with a white-list approach of which image formats you support. Don't just let users upload anything that they call an image.
Each format you support is going to have its own standards (possibly multiple) for meta-data. Stripping out that data wholesale may or may not be a good idea. For example, a jpeg may contain its orientation in its EXIF data and if you strip that out you may be effectively rotating the image. Certain fields, such as geotagging, you may want to strip out in the effort to protect your users' privacy, etc. Again, look into existing libraries for this and see how they do it.
DO NOT implicitly trust the file name extension for determining the type of the image. It's possible for a user to construct a malicious file that isn't really an image, pass it off as an image to an unsuspecting host, and effectively open a security flaw on that host as it tries to process the file as an image. There was a question about determining file type in Ruby here, and I'm sure there's a lot more to be found on the subject.
David answered the question well, but I thought I might be able to provide some more specific information regarding your question.
Use the Paperclip gem, combined with RMagick, an ImageMagick wrapper for Ruby. You can set post-processing options and create multiple resizings.
If you really want to do it yourself, checkout the actual gem at https://github.com/thoughtbot/paperclip and you'll see how that author does it. Part of the thought behind Ruby on Rails is DRTW (Don't Reinvent the Wheel). Utilize what's out there and build on it. It will save you time, and enable you to do more in the longrun.
An alternative is to use a third party service such as http://resizer.co (I am not affiliated)
Replace a URL to an image from your site, say:
http://example.com/images/abc123.jpg
with the following (for a 200 x 200 image)
http://www.resizer.co?image=http://example.com/images/abc123.jpg&w=200&h=200
You may run into issues with distortion and the 'aspect ratio' not being maintained, but it could be a good start.