Is there an OCR API that could be used for recognizing and counting objects in an image? Or can this be done with another image processing technique?
For example, if I take a close-up photo of three boxes, the API would just return the number 3 as a result.
You can look into OpenCV, which is popular for programmers learning about image processing and vision. You'll find an endless number of posts here on StackOverflow about OpenCV.
http://opencv.org/
Some freeware GUIs and free starter versions of commercial image processing packages will allow you to test image processing techniques without having to write the code. ImageJ is old but still worth checking out:
http://rsbweb.nih.gov/ij/
I don't want to show favoritism towards any of my sisters and brothers in the image processing world, but if you google for "machine vision free" or "computer vision free" and add words such as "GUI" you should be able to quickly find some free software that will allow you to test different image processing techniques just by using your mouse.
Along with your OCR algorithm, you'll need a segmentation method to count objects.
One such technique is the connected components algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The typical connected components algorithm relies on some preprocessing:

1. Find a binarization threshold.
2. Apply the binarization threshold to generate an image of black (0) and white (1) values.
3. Run the connected components algorithm and label all components (objects).
4. Filter the results by size and other parameters. For example, you probably don't want to include foreground objects that are only a few pixels in size.
5. Check the size of the list of filtered components.
This is a simple, low-level method, but it's useful in many situations. Even if you think you need a more complicated technique, I would strongly recommend that you first become familiar with connected components before moving on. Until one grasps the subtleties of lighting, binarization, and component labeling, it's unlikely one can learn much useful about more complicated algorithms. There really are no shortcuts.
There are other, more complicated methods, but before suggesting which might be appropriate, you would have to be more specific about what kind of objects you want to find.
With any image processing question, always include one or more sample images. It's generally not useful to talk about image processing algorithms without first understanding the image set with which you are working. What may be obvious to you will not be obvious to others, especially those who have spent years working on OCR applications and who have had to deal with a wide variety of backgrounds, scripts, and specifications.
Related
I want to know how I should process hand-drawn images of circuit diagrams, for the sake of digitizing the drawn circuit and eventually simulating it.
The input to my program would be a regular picture (from a smartphone, etc.), and the finished output should be a simulation of all possible values in the circuit (not covered / required here).
Basically, all I need to be able to detect are electrical components with a fixed number of connections (2 connections, e.g. R, L, C, diode) and the lines connecting them.
I already have a pretrained neural network for detecting which type of component it is. The part where I struggle is: how do I get bounding boxes around the components so I can classify them with my NN? I tried several approaches using contouring and object detection in OpenCV (e.g. findContours, connectedComponentsWithStats), but I can't seem to get it to detect only the components, and not the text or the connecting lines between components.
Basically what I want is the following:
Given this Input Image (not hand-drawn for sake of readability)
I would like to know:
How many components are there?
Where are the bounding boxes of the components?
Basically, these bounding boxes
This is used to extract components and classify them with the model I already have.
Furthermore, I would like to extract the text closest to any component, so that I can read the values of each component. I have already managed to do OCR with the help of tesseract-ocr, so if I can get bounding boxes around the text, I can easily read the values.
Like this
But the part I struggle with the most is finding out which component is connected to which other component; I am unsure how I should approach this. It's really hard to find anything by googling my problem, and I'm not certain how to describe this problem in general. Overall, I need enough information to be able to simulate the circuit with matrix simulations (basic DC analysis).
I do not explicitly require code; I need general guidance for solving this problem, or maybe even links to research papers attacking similar problems.
"Every problem is just a good dataset away from being demolished" source.
There are several electronic symbols that you will need to detect. A modern approach to classifying the symbols would be to use a neural network. To train this model, you will need to create a dataset of hand-drawn electronic symbols. Classifying the electronic symbols will be similar to handwritten digit classification.
Before the symbols can be classified by a neural network model, the image will have to be segmented. Individual components (diode, capacitor, resistor, etc...) will need to be identified, and labeled with bounding boxes.
The complexity of this task depends on the quality of your source images. Images that are created using a scanner (instead of a camera) will be much easier to work with.
This task can be accomplished with OpenCV and Python. This is a breakdown of the sub-tasks:

1. Mobile document scanning
2. Component segmentation
3. Component classification
4. OCR
Can someone tell me how I can detect pictures of architecture or sculpture?
I think the Hough transform is a good approach, but I'm new to CV and maybe there are better methods to detect patterns. I heard about Haar cascades; can I use those for architecture, too?
For example i want to detect those kind of pictures:
http://img842.imageshack.us/img842/4748/resizeimg0931.jpg
If you want an algorithm to detect them, consider that detecting an object in an image requires a description of that object that a machine or computer can understand. For a sculpture or a piece of architecture, how can you have such a uniform definition, since they vary so much in every sense? For example, both of your input images vary a lot. How can we differentiate between a house and architecture in general? A lot of problems arise from your question. Even with the Hough transform, how are you supposed to differentiate a big house from a big piece of architecture?
Check out this SOF : Image Processing: Algorithm Improvement for 'Coca-Cola Can' Recognition
He wants to detect Coca-Cola cans, and not Coca-Cola bottles. But if you look into it closely, you will understand that cans and bottles are almost alike, and it is difficult to differentiate between them. You can find a lot of the difficulties in the subsequent answers. The major problem is that, in some cases, it is difficult even for humans to differentiate them.
Regarding your second image: even if you train some cascades on it, there is a chance the detector will also fire on live lions if they are present in your image, since a sculpted lion and a real lion look almost the same to a machine.
Haar cascades may not be very effective, since you would have to train on a lot of these kinds of images.
If you have some sample images and want to check whether those things are present in your image, maybe you can use SURF features, etc. But you will need some sample images first to compare against. For a demo of SURF, check out this SOF: OpenCV 2.4.1 - computing SURF descriptors in Python
Another option is template matching. But it is slow, and it is not scale- or orientation-invariant. And you need some template images for this.
I think I have seen some papers on this topic (but I don't remember them now). Maybe googling will get you to them. I will update the answer if I find them.
I have a set of reference images (200) and a set of photos of those images (tens of thousands). I have to classify each photo in a semi-automated way. Which algorithm and open source library would you advise me to use for this task? The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
To give a little more context, the reference images are branded packages, and the photos are of the same packages, but with all kinds of noises: reflections from the flash, low light, imperfect perspective, etc. The photos are already (manually) segmented: only the package is visible.
Back in my days with image recognition (like 15 years ago) I would probably have tried to train a neural network with the reference images, but I wonder if there are better ways to do this now.
I recommend that you use Python, and use the NumPy/SciPy libraries for your numerical work. Some helpful libraries for handling images are the Mahotas library and the scikits.image library.
In addition, you will want to use scikits.learn, which is a Python wrapper for Libsvm, a very standard SVM implementation.
The hard part is choosing your descriptor. The descriptor is the feature you compute from each image, intended to give a similarity distance to the set of reference images. A good set of things to try would be histograms of oriented gradients, SIFT features, and color histograms; play around with various ways of binning different parts of the image and concatenating such descriptors together.
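As a minimal sketch of the descriptor-plus-similarity idea, here is the simplest of those descriptors (a color histogram) with a plain L1 distance standing in for a trained classifier. It produces exactly the most-similar-first ordering the human operator would review; a trained SVM would replace the distance ranking with learned class decisions. The toy data below is an assumption, not real package images.

```python
import numpy as np

def descriptor(img, bins=16):
    # Per-channel color histogram, concatenated and L1-normalized.
    hist = np.concatenate([np.histogram(img[..., c], bins=bins,
                                        range=(0, 256))[0] for c in range(3)])
    return hist / hist.sum()

def rank_references(photo, references):
    # Indices of the references, ordered from most to least similar,
    # using L1 distance between descriptors as the similarity measure.
    d = descriptor(photo)
    dists = [np.abs(d - descriptor(r)).sum() for r in references]
    return np.argsort(dists)

# Toy data: three flat-colored "packages" and a noisy photo of the second.
rng = np.random.default_rng(0)
refs = [np.full((32, 32, 3), v, dtype=np.uint8) for v in (40, 120, 200)]
photo = np.clip(refs[1].astype(int) + rng.integers(-10, 11, (32, 32, 3)),
                0, 255).astype(np.uint8)
print(rank_references(photo, refs)[0])  # 1
```

Color histograms alone are fragile under the flash reflections and low light you describe, which is why concatenating gradient-based descriptors usually helps.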
Next, set aside some of your data for training. For these data, you have to manually label them according to the true reference image they belong to. You can feed these labels into built-in functions in scikits.learn and it can train a multiclass SVM to recognize your images.
After that, you may want to look at MPI4Py, an implementation of MPI in Python, to take advantage of multiprocessors when doing the large descriptor computation and classification of the tens of thousands of remaining images.
The task you describe is very difficult and solving it with high accuracy could easily lead to a research-level publication in the field of computer vision. I hope I've given you some starting points: searching any of the above concepts on Google will hit on useful research papers and more details about how to use the various libraries.
The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
One way people do this is with the so-called "Earth mover's distance". Briefly, one imagines each pixel in an image as a stack of rocks with height corresponding to the pixel value and defines the distance between two images as the minimal amount of work needed to transfer one arrangement of rocks into the other.
Algorithms for this are a current research topic. Here's some MATLAB code for one: http://www.cs.huji.ac.il/~ofirpele/FastEMD/code/ . Looks like they have a Java version as well. Here's a link to the original paper and C code: http://ai.stanford.edu/~rubner/emd/default.htm
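The full 2-D image EMD needs solvers like the ones linked above, but in 1-D the distance has a closed form (the L1 distance between cumulative histograms), which makes the rocks analogy concrete. A toy illustration:

```python
import numpy as np

def emd_1d(p, q):
    # 1-D earth mover's distance between two histograms: it equals
    # the L1 distance between their (normalized) cumulative sums.
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()
    return np.abs(np.cumsum(p - q)).sum()

# Moving one "rock" by one bin costs one unit of work:
print(emd_1d([1, 0, 0], [0, 1, 0]))  # 1.0
# Moving it two bins costs twice as much:
print(emd_1d([1, 0, 0], [0, 0, 1]))  # 2.0
```

Note how a plain bin-by-bin L1 distance would score both pairs identically; the EMD's sensitivity to *how far* mass has to move is exactly what makes it attractive for comparing images.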
Try RapidMiner (one of the most widely used data-mining platforms, http://rapid-i.com) with IMMI (the Image Mining Extension, http://www.burgsys.com/mumi-image-mining-community.php), AGPL licence.
It currently implements several similarity measurement methods (not only trivial pixel-by-pixel comparison). The similarity measures can serve as input for a learning algorithm (e.g. neural network, kNN, SVM, ...), which can be trained to give better performance. Some information about the methods is given in this paper:
http://splab.cz/wp-content/uploads/2012/07/artery_detection.pdf
Nowadays, deep-learning-based frameworks like Torch, TensorFlow, Theano, and Keras are the best open source tools/libraries for object classification/recognition tasks.
I have decided to spend my personal time after office hours learning the building blocks of how images (JPEG and similar formats) are parsed and represented on screen. My interest is in object recognition in an image, so I want to know where to start. I know there is math involved, so I need a step-by-step list of which resources on the Internet to look at.
Need a lot more information on what you want, but take a look at OpenCV
http://sourceforge.net/projects/opencvlibrary/
to see good examples.
I'd get Ritter's book (warning: costly!) and give it serious studying. If you just want to grab existing code and go play then perhaps you should look at libraries like OpenCV (see Lou's answer).
The ultimate goal of most image processing is to extract information about some high-level and application-dependent objects from an image available in low-level (pixel) form. The objects may be of every day interest like in robotics, cosmic ray showers or particle tracks like in physics, chromosomes like in biology, houses, roads, or differently used agricultural surfaces like in aerial photography or synthetic-aperture radar, etc.
This task of pattern recognition is usually preceded by multiple steps of image restoration and enhancement, image segmentation, or feature extraction, steps which can be described in general terms. The final description in problem-dependent terms, and even more so the eventual image reconstruction, escapes such generality, and the literature of application areas has to be consulted.
I'm new to image processing and I want to do a project in object detection. So help me by suggesting a step-by-step procedure for this project. Thanks.
Object detection is a very complex problem that includes some real hardcore math and long tuning of parameters to the computation methods involved. Your best bet is to use some freely available library for that - Google will help.
There are a lot of algorithms on the theme, and no single one is the best of all. It's usually a mixture of them that makes the best solution to the problem.
For example, for object movement detection you could look at frame differencing and mixture of Gaussians.
Also, it's very dependent on your application, the environment (i.e. noise, signal quality), the processing capacity you have available, the allowable error margin...
Besides, for it to work, most of the time it is first necessary to apply some kind of image processing to the input data, like a median filter, Sobel filter, contrast enhancement, and so on.
I think you should start by reading all you can: books, Google and, very importantly, a lot of papers on the subjects you are interested in (there are many free ones on the internet).
And first of all, I think it's fundamental (at least it has been for me) to have a good library for testing. The one I have used/use is OpenCV. It's very complete, implements many of the more advanced current algorithms, is very active, has a big community, and it's free.
Open Computer Vision Library (OpenCV)
Good luck ;)
Take a look at AForge.NET. It's nowhere near Project Natal's levels of accuracy or usefulness, but it does give you the tools to learn the algorithms easily. It's an image processing and AI library and there are several tutorials on colored object tracking and motion detection.
Another one to look at is OpenCV from Intel. I believe it's a bit more advanced, but it's written in C.
Take a look at this. It might get you started in this complex field. The algorithm pages that it links to are interesting reading.
http://sun-valley.stanford.edu/projects/helicopters/final.html
This lecture by Jeff Hawkins will give you an idea of the state of the art in this super-difficult field.
Seems that video disappeared... but this vid should cover similar ground.