I have an application in mind that I want to produce. We have wall-mounted schedule boards that are divided into small rectangles by black lines on a white background. A magnetic name tag is placed in a particular cell to indicate that this person is to work there. This system works very well for communication among people, but I would like an automatic way of saving the schedule information into a database.
I am envisioning a system where a camera is set in a fixed position focusing on the schedule board. Periodically the camera takes a picture of the board, and I want to write some code to decipher which name tags are in which cells. This would require some OCR or symbol recognition; there is a big number on each name tag that I will use to identify whose tag it is.
I naturally go to Python when tackling a new programming problem. I found this post -> python image recognition, which looks like a good place to start (with PIL and numpy).
Do you know a good way to do this?
Update: I have tried SimpleCV and it seems good for now.
This is actually a pretty hard problem, even though it looks quite simple. But you can make it a lot more manageable by controlling how the image is captured. I have the following suggestions:
Try to make it so that your camera is looking straight at the board with a reasonable lens so that there is minimal distortion of the image on the edges, and no perspective distortion.
Given that you'll be shooting the occasional image for analysis, performance is in no way an issue, so shoot high-resolution images, with a flash or with a long exposure time (everything you're shooting is stationary), to get the best possible picture quality.
If the number of different tags you expect is not too large, you might find it easier to match reference images of these tags against your image through template matching rather than going for full OCR of the numbers. This is a lot easier to get working if your image quality is good enough. The Python OpenCV interface is very complete.
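A minimal template-matching sketch using OpenCV's Python bindings, assuming a fixed camera so the tags appear at a roughly constant scale (the file names are placeholders):

    import cv2

    # Hypothetical file names; substitute your own board photo and tag crops.
    board = cv2.imread("board.png", cv2.IMREAD_GRAYSCALE)
    template = cv2.imread("tag_07.png", cv2.IMREAD_GRAYSCALE)

    # Normalized cross-correlation is fairly robust to uniform lighting changes.
    result = cv2.matchTemplate(board, template, cv2.TM_CCOEFF_NORMED)
    min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

    # Accept the match only above a confidence threshold (tune empirically).
    if max_val > 0.8:
        x, y = max_loc
        h, w = template.shape
        print("tag 07 found around", (x + w // 2, y + h // 2))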
High Performance Mark has a good comment on your question about including barcodes on the tags. I would add the option of QR codes, but that is essentially the same idea. Both are easy to detect, and there are good libraries to help you read them.
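For the QR route, recent OpenCV builds ship a built-in detector, so no extra library is strictly needed; a small sketch (the file name is a placeholder):

    import cv2

    img = cv2.imread("board.png")  # placeholder file name
    detector = cv2.QRCodeDetector()

    # detectAndDecode returns the payload string (empty if nothing was found)
    # plus the corner points of the detected code.
    data, points, _ = detector.detectAndDecode(img)
    if data:
        print("decoded tag id:", data)
        print("code corners:", points.reshape(-1, 2))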
If you decide you do need OCR, you should look into available OCR packages and not try to roll your own. Try pytesser for the Tesseract engine, or the OCRopus Python interface.
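pytesser is quite dated now; the closely related pytesseract wrapper drives the same Tesseract engine. A rough sketch, assuming Tesseract itself is installed and the tag crop is already isolated:

    import cv2
    import pytesseract

    # Restrict Tesseract to digits, since the tags carry big numbers.
    config = "--psm 7 -c tessedit_char_whitelist=0123456789"

    cell = cv2.imread("cell_crop.png", cv2.IMREAD_GRAYSCALE)  # placeholder crop
    # Binarize first; Tesseract does much better on clean black-on-white input.
    _, cell = cv2.threshold(cell, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

    number = pytesseract.image_to_string(cell, config=config).strip()
    print("tag number:", number or "<no digits found>")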
Since you mentioned that you would like to use Python for this problem, perhaps you could take a look at SimpleCV. It provides an easy way to grab images from the camera and do basic image processing.
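For example, grabbing and preprocessing a frame with SimpleCV's documented API might look like this (note SimpleCV is Python 2 era):

    from SimpleCV import Camera

    # Grab one frame from the default camera.
    cam = Camera()
    img = cam.getImage()

    # A quick binarize makes the black grid lines and tag digits stand out.
    binary = img.binarize()
    binary.save("board_binarized.png")

    # Blobs give a rough segmentation of the dark regions (grid plus tags).
    blobs = binary.findBlobs()
    if blobs:
        print("found %d dark regions" % len(blobs))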
I strongly agree with jilles de witt that OCR would be an extremely hard image analysis task to develop from scratch. Code reading would be a better option, but that will also be difficult to program and will require sophisticated or somewhat challenging imaging, as others have noted. However, for this app you really do not need to implement OCR or formal barcodes, QR codes or other 2D codes.
Since your application is constrained to a limited number of targets, perhaps you could make your own simple code. For example, you could place 0 to 4 big dots in a 2x2 array after each person's name. With each of the four positions either filled or empty, this simple example code uniquely identifies 2^4 = 16 tags, and the features will be much easier to image, extract and decode than formal codes. Add a locator line if the code position is not consistent.
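To make that concrete, here is one way such a home-made code could be decoded once a tag has been cropped out; the helper name and the intensity threshold are my own assumptions, not a standard:

    import cv2
    import numpy as np

    # Hypothetical helper: decode a 2x2 dot code from a cropped tag image.
    # Each quadrant is "on" if its mean intensity is dark enough, giving
    # one bit per quadrant and 2**4 = 16 distinct ids.
    def decode_dot_code(tag_gray, dark_threshold=100):
        h, w = tag_gray.shape
        bits = 0
        for bit, (row, col) in enumerate([(0, 0), (0, 1), (1, 0), (1, 1)]):
            quadrant = tag_gray[row * h // 2:(row + 1) * h // 2,
                                col * w // 2:(col + 1) * w // 2]
            if np.mean(quadrant) < dark_threshold:  # dot present -> bit set
                bits |= 1 << bit
        return bits

    tag = cv2.imread("tag_crop.png", cv2.IMREAD_GRAYSCALE)  # placeholder crop
    print("tag id:", decode_dot_code(tag))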
First, you need to know that I'm a beginner in this subject. I'm an embedded systems developer by background, but I have never worked with image recognition.
Let me state my main goal:
I would like to create my own database of logos and be able to recognize them in a larger image. A typical application would be, for example, to build a database of Pepsi and Coca-Cola logos and, when I take a photo of a bottle of soda, have it tell me whether the logo is one of them or another.
So, here is my problem:
I first wanted to use Google's AutoML Kit. I gave it my databases so it could train itself on them. My first attempt was to take photos of entire bottles and then compare. It was OK but not very efficient. I then tried to give it only the logos, but after training it couldn't recognize anything in the whole image of a bottle.
I think I didn't give it enough images in the first case. But I'd prefer the second approach (giving only logos) so that the model would search for something similar in the image.
Finally, my questions:
If you've worked with ML Kit from Google, were you able to train a model by giving it images that should be recognized inside a larger image? If yes, do you have any hints for me?
Do you know reliable software that could help me perform tests of this kind? I thought about Azure Machine Learning Studio from Microsoft (since I develop in Visual Studio).
At first I'd like to write as little code as I can, just for testing. Maybe later I could try to code my own machine learning system, but I think that's a big challenge.
I also thought that I would need to split my image into smaller images and then send each of them to the model, but that would be time-consuming, and I need a fast reaction (under 2 seconds).
Thanks in advance for your answers. I don't need a complete answer with a full tutorial (Stack Overflow is not intended for that anyway ^^); some advice would already be good.
Have a good day!
Azure’s Custom Vision is great for this: https://www.customvision.ai
Let’s say you want to detect a Pepsi logo. Upload 70 images of products with the logo on them. Use Custom Vision to draw a box around the logo in each photo. Click “Train”, and you get a TensorFlow model with code.
Look up any tutorial for it, it’s pretty incredible and really easy to use.
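Once the model is trained and published, you can also call the prediction endpoint directly. A hedged sketch of the REST call, with the URL shape taken from Custom Vision's v3.0 prediction API and all ids and keys as placeholders:

    import requests

    # All of these values are placeholders; copy the real ones from the
    # Custom Vision portal's "Prediction URL" dialog.
    ENDPOINT = "https://westeurope.api.cognitive.microsoft.com"
    PROJECT_ID = "00000000-0000-0000-0000-000000000000"
    ITERATION = "Iteration1"
    KEY = "your-prediction-key"

    url = (f"{ENDPOINT}/customvision/v3.0/Prediction/{PROJECT_ID}"
           f"/detect/iterations/{ITERATION}/image")
    headers = {"Prediction-Key": KEY, "Content-Type": "application/octet-stream"}

    with open("bottle.jpg", "rb") as f:
        response = requests.post(url, headers=headers, data=f.read())

    # Each prediction carries a tag name, a probability and a bounding box.
    for pred in response.json().get("predictions", []):
        print(pred["tagName"], round(pred["probability"], 3), pred["boundingBox"])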
I have been investigating image recognition a little, but I haven't found anything useful for me yet.
For my wife, who is a dentist writing her thesis, I need to make an app that recognizes tooth shapes from a picture taken under standard conditions.
I need to find the best match against a set of predefined tooth patterns, to categorize each tooth by the pattern it fits best. I know this is a big problem with no simple solution.
Does someone know image recognition software that lets me give it a number of patterns and then, given an image, see which pattern fits best? Or maybe just some orientation on where to start searching and working on this problem.
Thanks!
OpenCV would be the way to go here, but let me give you the facts before you start ripping your hair out.
I don't know your development experience, but although OpenCV has an iOS wrapper, you will be working with low-level C libraries. If that makes you uncomfortable, then turn back now. Furthermore, you will be writing the majority of the recognition/detection algorithms yourself, and it takes a lot of time and patience to get these to the point where they work to any extent. Additionally, don't expect the end product to be all that reliable; professional image recognition/manipulation tools take years of development by teams of experts in computer vision. No disrespect, but something hacked together over a few weeks by one person will be sub-par and lacking.
Nonetheless if you want to go ahead, you can download OpenCV for iOS here:
http://docs.opencv.org/2.4/doc/tutorials/introduction/ios_install/ios_install.html
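To give a flavour of what pattern matching looks like in OpenCV, here is a rough feature-matching sketch in Python (the C++/iOS API has equivalent calls); the file names are placeholders, and real dental images would need far more tuning than this:

    import cv2

    # Placeholder file names: one reference tooth pattern and one photo.
    pattern = cv2.imread("pattern_incisor.png", cv2.IMREAD_GRAYSCALE)
    photo = cv2.imread("mouth_photo.png", cv2.IMREAD_GRAYSCALE)

    # ORB detects keypoints and computes binary descriptors for both images.
    orb = cv2.ORB_create(nfeatures=1000)
    kp1, des1 = orb.detectAndCompute(pattern, None)
    kp2, des2 = orb.detectAndCompute(photo, None)

    # Brute-force Hamming matching with cross-check keeps only mutual matches.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

    # A crude score: more low-distance matches suggests a better pattern fit.
    good = [m for m in matches if m.distance < 40]
    print("pattern score:", len(good))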
I am trying to build my own training corpus for Named Entity Recognition, but I don't know if there is already an existing tool for this or if I have to implement one myself.
Basically, what I need to do is take a corpus and manually tag it word by word, which is pretty tedious, but it has to be done.
Can anyone tell me if there is already an existing one and where to get it?
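For context, most NER tools expect token-level tags in something like the CoNLL BIO scheme; a tiny hand-made sketch of that target format, written out from Python:

    # A minimal hand-tagged sentence in CoNLL-style BIO format: one token
    # per line, tag in the second column, blank line between sentences.
    tagged = [
        ("George", "B-PER"),
        ("Washington", "I-PER"),
        ("visited", "O"),
        ("Boston", "B-LOC"),
        (".", "O"),
    ]

    with open("corpus.conll", "w") as f:
        for token, tag in tagged:
            f.write(f"{token}\t{tag}\n")
        f.write("\n")  # sentence boundary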
I had a good experience working with BRAT.
GATE is also a very capable tool for annotating, but it has a steeper learning curve.
We had a nice experience using DataTurks. They provide a nice, intuitive UI that lets you add collaborators and offers insights into the data, a leaderboard for annotators, and some other funky features.
https://dataturks.com
For online annotation of a text or HTML corpus of relatively short documents, I also recommend BRAT. You will have to go under the hood of the Python web application if you want to do anything custom. It also failed to work for me on large HTML documents (100 or so pages).
I have also used stand-alone apps:
Protege + Knowtator: a bit cumbersome to set up and use, but it works.
GATE: also cumbersome, and it somewhat works. Back up your annotations at regular intervals, as you might get surprised by a stack trace that also wiped or corrupted your annotated corpus (which is just serialized Java objects).
If you are dealing with PDF documents, we built a web-based PDF Annotation Tool: NOTA. It accepts anything printed to PDF, including scans. We do commercial OCR on our end to recover text from images. There is a REST API to create color-coded annotation schemas and pre-populate documents with annotations, as well as a REST API for exporting formatted text and annotation offsets. There is also a JS API you can use to customize any annotation workflows, add metadata to annotations, etc. Relationships are not supported out of the box. Large documents, 200+ pages are supported. Email us and we can give you an API key to try it out. Details and documentation links can be found here. It is free for small research projects.
I co-develop the web-based text annotation tool tagtog.net.
There is nothing to install, and you can define the types of entities you want to annotate. Additionally, you can annotate relationships, document labels, and much more. You can upload your documents in many different formats, including PDF or Markdown, and you can annotate together with your team collaboratively. We have put great care into making the interface easy and beautiful.
You can start right away with a free account. I would also be happy to help you with any doubt or issue you may have; just ping me or write to the email address shown on the website, tagtog.net.
Our annotation tool Prodigy is very scriptable and is designed for active learning. It integrates especially well with our NLP library spaCy.
We've paid particular attention to the Named Entity Recognition (NER) annotation workflows, as entity annotation can otherwise be very slow. I have a tutorial video on this:
https://www.youtube.com/watch?v=l4scwf8KeIA
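Since Prodigy's output is typically fed into spaCy, a minimal sketch of running spaCy's stock NER over text looks like this (it assumes the small English model has been downloaded):

    import spacy

    # Assumes: python -m spacy download en_core_web_sm has been run.
    nlp = spacy.load("en_core_web_sm")

    doc = nlp("Apple is opening a new office in Berlin in 2024.")
    for ent in doc.ents:
        print(ent.text, ent.label_, ent.start_char, ent.end_char)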
There is also the tool DataTurks: a super simple to use, fully online NLP annotation tool. It is easy enough that I can even push my teammates to complete datasets for our projects.
Try TagEditor.
It is a desktop application designed to annotate text for training with the spaCy library. You can tag named entities, dependencies, parts of speech, and text categories, and print a JSON file.
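For reference, the JSON that such tools emit usually boils down to spaCy's classic training format of character offsets per entity; a hand-written sample (the sentence and spans are hypothetical, not TagEditor's exact output):

    import json

    # spaCy-style training example: the raw text plus (start, end, label)
    # character offsets for each entity span.
    TRAIN_DATA = [
        ("Uber blew through $1 million a week",
         {"entities": [(0, 4, "ORG"), (18, 28, "MONEY")]}),
    ]

    print(json.dumps(TRAIN_DATA, indent=2))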
I know how to apply two effects to images -- blurring and making them grayscale. However, I would like to expand my knowledge further and learn more things of this nature.
I decided to Google them but found out that I do not even know what they are called.
I would like to ask: How do I progress further into image processing?
Image processing is a very big area with many applications. These applications range from medical imaging and data compression to many commercial applications like the ones you find in Photoshop. Without knowing where you are going to apply image processing, I assume that you want to learn for the sake of curiosity :).
Today we have lots of online courses that make learning easier. I did an image processing course by Guillermo Sapiro on the Coursera website that helped a lot: https://www.coursera.org/course/images . The course has already ended, but the video lectures are also available on YouTube: http://www.youtube.com/watch?v=GWCB3pKi2ko (that one is about histogram equalization; you can see others in the related videos).
Another source is the amazing book by Rafael Gonzalez called Digital Image Processing.
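Since the linked lecture covers histogram equalization, here is a minimal sketch of that exact effect with OpenCV (the file name is a placeholder):

    import cv2

    # Histogram equalization spreads out the intensity histogram, which
    # boosts contrast in under- or over-exposed grayscale images.
    img = cv2.imread("photo.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file
    equalized = cv2.equalizeHist(img)
    cv2.imwrite("photo_equalized.jpg", equalized)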
If you're looking for a website solution, this is a good guide to using the CSS filter effect: http://www.html5rocks.com/en/tutorials/filters/understanding-css/
If you're looking for something else, I think more detail on your application is needed.
I am searching for an algorithm to determine whether realtime audio input matches one of 144 given (and comfortably distinct) phoneme-pairs.
Preferably the lowest level that does the job.
I'm developing radical / experimental musical training software for iPhone / iPad.
My musical system comprises 12 consonant phonemes and 12 vowel phonemes, demonstrated here. That makes 144 possible phoneme pairs. The student has to sing the correct phoneme pair 'laa duu bee' etc in response to visual stimulus.
I have done a lot of research into this; it looks like my best bet may be to use one of the iOS Sphinx wrappers (iPhone App › Add voice recognition? is the best source of information I have found). However, I can't see how I would adapt such a package. Can anyone with experience using one of these technologies give a basic rundown of the steps that would be required?
Would training be necessary by the user? I would have thought not, as it is such an elementary task, compared with full language models of thousands of words and far greater and more subtle phoneme base. However, it would be acceptable (not ideal) to have the user train 12 phoneme pairs: { consonant1+vowel1, consonant2+vowel2, ..., consonant12+vowel12 }. The full 144 would be too burdensome.
Is there a simpler approach? I feel like using a fully featured continuous speech recogniser is using a sledgehammer to crack a nut. It would be far more elegant to use the minimum technology that would solve the problem.
So really I'm hunting for any open source software that recognises phonemes.
PS: I need a solution which runs pretty much in real time, so even as they are singing the note, it first blinks to acknowledge that it picked up the phoneme pair that was sung, and then it glows to indicate whether they are singing the correct pitch.
If you are looking for a phone-level open source recogniser, then I would recommend HTK. Very good documentation is available for this tool in the form of the HTK Book, which contains an entire chapter dedicated to building a phone-level real-time speech recogniser. From your problem statement above, it seems to me that you might be able to rework that example into your own solution. Possible pitfalls:
Since you want to build a phone-level recogniser, the amount of data needed to train the phone models would be very high. Also, your training database should be balanced in terms of the distribution of the phones.
Building a speaker-independent system would require data from more than one speaker. And lots of that too.
Since this is open source, you should also check the licensing info for any additional details about shipping the code. A good alternative would be to use the on-phone recorder and then send the recorded waveform over a data channel to a server for recognition, pretty much like what Google does.
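A sketch of that client-to-server pattern, with a hypothetical recognition endpoint standing in for whatever service you would run; the URL and the response shape are assumptions, not a real API:

    import requests

    # Hypothetical endpoint: a server you run that wraps HTK (or Sphinx)
    # and returns the recognized phoneme pair as JSON.
    RECOGNIZER_URL = "https://example.com/api/recognize-phonemes"

    with open("sung_note.wav", "rb") as f:
        response = requests.post(
            RECOGNIZER_URL,
            headers={"Content-Type": "audio/wav"},
            data=f.read(),
            timeout=2.0,  # the asker wants a reaction in under ~2 seconds
        )

    result = response.json()  # e.g. {"phoneme_pair": "laa", "confidence": 0.93}
    print(result)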
I have a little bit of experience with this type of signal processing, and I would say that this is probably not the type of finite question that can be answered definitively.
One thing worth noting is that although you may restrict the phonemes you are interested in, the possibility space remains the same (i.e. infinite-ish). User training might help the algorithms along a bit, but useful training takes quite a bit of time and it seems you are averse to too much of that.
Using Sphinx is probably a great start on this problem. I haven't gotten very far in the library myself, but my guess is that you'll be working with its source code yourself to get exactly what you want. (Hooray for open source!)
...using a sledgehammer to crack a nut.
I wouldn't label your problem a nut, I'd say it's more like a beast. It may be a different beast than natural language speech recognition, but it is still a beast.
All the best with your problem solving.
Not sure if this would help: check out OpenEars' LanguageModelGenerator. OpenEars uses Sphinx and other libraries.
http://www.hfink.eu/matchbox
This page links to both a YouTube video demo and the GitHub source.
I'm guessing it would still be a lot of work to mould it into the shape I'm after, but it definitely does do a lot of the work already.