Detect hand-drawn circuit components in image, detect text, build connection tree - opencv

I want to know how I should process hand-drawn images of circuit diagrams in order to digitize the drawn circuit and eventually simulate it.
The input to my program would be a regular picture (smartphone etc.), and the finished output should be a simulation of all possible values in the circuit (not covered / required here).
Basically, all I need to detect are electrical components with a fixed number of connections (two connections, e.g. R, L, C, diode) and the lines connecting them.
I already have a pretrained neural network for detecting which type of component it is. The part where I struggle is: how do I get bounding boxes around the components so I can classify them with my NN? I tried several approaches using contouring and object detection in OpenCV (e.g. findContours, connectedComponentsWithStats), but I can't seem to get it to detect only the components, and not the text or the connecting lines between components.
Basically what I want is the following:
Given this input image (not hand-drawn for the sake of readability)
I would like to know:
How many components are there?
Where are the bounding boxes of the components?
Basically, these bounding boxes
This is used to extract components and classify them with the model I already have.
Furthermore, I would like to extract the text closest to each component, so that I can read the values of each component. I have already managed to do OCR with the help of tesseract-ocr, so if I can get bounding boxes around the text, I can easily read the values.
Like this
But the part I struggle with the most is finding out which component is connected to which other component; I am unsure how I should approach this. It's really hard to find anything by googling this problem, and I'm not certain how to describe the problem in general. Overall, I need enough information to be able to simulate the circuit with matrix simulations (basic DC analysis).
I do not explicitly require code; I need general guidance on solving this problem, or maybe even links to research papers attacking similar problems.

"Every problem is just a good dataset away from being demolished" source.
There are several electronic symbols that you will need to detect. A modern approach to classifying the symbols would be to use a neural network. To train this model, you will need to create a dataset of hand-drawn electronic symbols. Classifying the electronic symbols will be similar to handwritten digit classification.
Before the symbols can be classified by a neural network model, the image will have to be segmented. Individual components (diode, capacitor, resistor, etc...) will need to be identified, and labeled with bounding boxes.
The complexity of this task depends on the quality of your source images. Images that are created using a scanner (instead of a camera) will be much easier to work with.
This task can be accomplished with OpenCV and Python. This is a breakdown of the sub-tasks:
Mobile Document Scanning
Component segmentation
Component classification
OCR
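As a starting point for the segmentation sub-task, here is a minimal OpenCV sketch; it is not a definitive pipeline, and the Hough parameters, dilation kernel, and area threshold are assumptions you would tune per image set. The idea: erase long straight strokes (likely the connecting wires), then label what remains and keep the larger blobs as component candidates.

    import cv2
    import numpy as np

    def find_component_boxes(path):
        """Rough component bounding boxes; all thresholds are placeholders."""
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        # Binarize so drawn strokes become white on black.
        binary = cv2.adaptiveThreshold(img, 255, cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                                       cv2.THRESH_BINARY_INV, 11, 2)
        # Erase long straight segments (likely the connecting wires).
        lines = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=80,
                                minLineLength=40, maxLineGap=5)
        symbols = binary.copy()
        if lines is not None:
            for x1, y1, x2, y2 in lines[:, 0]:
                cv2.line(symbols, (x1, y1), (x2, y2), 0, 5)
        # Dilate to merge the strokes of one symbol, then label the blobs.
        merged = cv2.dilate(symbols, np.ones((9, 9), np.uint8))
        _, _, stats, _ = cv2.connectedComponentsWithStats(merged)
        # Keep large blobs as component candidates; small ones are often text.
        return [tuple(s[:4]) for s in stats[1:] if s[4] > 300]

The small blobs this filters out are natural candidates for the text/OCR stage, and the erased wire mask is exactly what you would trace (e.g. skeletonize) to recover which boxes are connected to which.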

Related

Is deep learning the only way to detect humans in a picture?

I'm looking for a way to detect humans in a picture. For instance, regarding the picture below, I'd like to coarsely determine how many people are in the scene. I must be able to detect both standing and sitting people. I do not mind not detecting people located behind a physical object (such as the glass in the bus picture).
AFAIK, such a problem can rather easily be solved by training deep neural networks. However, my coworkers would like me to also implement a detection technique based on general image processing techniques. I've spent several days looking for techniques designed by researchers, but I couldn't find anything other than saliency-based techniques (which may be fine, but I'd like to test several techniques based on old-fashioned image processing).
I'd like to mention that I'm not new to the topic of image segmentation; I used to segment aortas in medical scans. However, that task was easier IMHO, since scanners produce images with similar characteristics; in this use case (human detection in a bus, for instance), the pictures will have very different characteristics (e.g. image contrast can vary strongly depending on whether the photo was taken during the day or at night).
Long story short, I'd like to know if there is some segmentation technique for human detection that would be worth giving a shot, given that the image features vary a lot.
Is deep learning the only way to detect humans in a picture?
No. Is it the best way we know? Depends on your conditions.
The simplest way of detecting is to generate lots of random bounding boxes and then solve the classification problem on each crop. Here is some pythonic pseudo-code:
def detect_people(image):
    """
    Find all people in image.

    Parameters
    ----------
    image : image object

    Returns
    -------
    people : list of axis-aligned bounding boxes (aabb)
        Each bounding box contains a person
    """
    people = []
    for aabb in generate_random_aabb(image):  # candidate windows
        crop = crop_image(image, aabb)
        if is_person(crop):                   # any binary classifier
            people.append(aabb)               # keep the box, not the crop
    return people
In this case is_person can be any classifier, e.g. boosted decision stumps as used in the Viola–Jones object detection framework. Speaking of which: that would likely be the way to go without DL, but it is much more complicated to explain.
Object Detection vs Segmentation
Your question mixes both. Object detection gives you bounding boxes (coarse) for instances. Semantic segmentation labels all pixels by classes, but does not distinguish different instances of the same class (e.g. different people). Instance segmentation is like object detection, but is fine-grained and aims for pixel-exact results.
If you are interested in segmentation, I can recommend my paper: A Survey of Semantic Segmentation

Best approach to build a model to recognise licence plates (ALPR)

I am trying to make a deep learning model to detect and read number plates using techniques like CNNs. I would be building the model in TensorFlow, but I still don't know the best approach to build such a model.
I have checked a few models like this one:
https://matthewearl.github.io/2016/05/06/cnn-anpr/
I have also checked some research papers, but none show the exact way.
So the steps I am planning to follow are:
Image preprocessing using OpenCV (grayscale, transformations, etc.; I don't know much about this part)
Licence plate detection (probably by the sliding window method)
Training a CNN on a synthetic dataset, as in the above link
My questions:
Is there any better way to do this?
Can an RNN also be combined after the CNN for variable-length numbers?
Should I prefer detecting and recognising individual characters rather than the whole plate?
There are also many older methods that prefer image preprocessing and then passing directly to OCR. What would be best?
PS: I want to make a commercial real-time system, so I need good accuracy.
Firstly, I don't think combining an RNN and a CNN can achieve a real-time system. I would also personally prefer detecting individual characters for a real-time system, because there will not be more than about 10 characters on a license plate. When detecting plates of variable length, detecting individual characters is also more feasible.
Before I learned deep learning, I also tried to use OCR to read plates. In my case, OCR was fast, but the accuracy was limited, especially when the plate was not clear enough. Even image processing cannot rescue some unclear cases.
So if I were you, I would try the following:
Simple image preprocessing on the whole image
Licence plate detection (probably by the sliding window method)
Image processing (filters and geometric transformations) on the extracted plate region to make it clearer, then separating the characters
Deploying a CNN on each character; for real-time performance I would try a short CNN, such as the LeNet used for the MNIST handwritten digit data (multithreading might be needed), as sketched below
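For the per-character CNN, here is a minimal LeNet-style sketch in Keras (the questioner mentioned TensorFlow); the 32x32 grayscale input size and the 36 classes for 0-9/A-Z are assumptions, not requirements.

    from tensorflow.keras import layers, models

    def build_char_cnn(num_classes=36):
        # Small LeNet-like network: cheap enough for real-time per-character use.
        model = models.Sequential([
            layers.Input(shape=(32, 32, 1)),
            layers.Conv2D(6, 5, activation="relu"),
            layers.MaxPooling2D(),
            layers.Conv2D(16, 5, activation="relu"),
            layers.MaxPooling2D(),
            layers.Flatten(),
            layers.Dense(120, activation="relu"),
            layers.Dense(84, activation="relu"),
            layers.Dense(num_classes, activation="softmax"),
        ])
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        return model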
Hope my response can help.

Machine Learning: sign visibility

I work at an airport where we need to determine the visibility conditions of pilots.
To do this, we have signs placed every 200 meters along the runway that allow us to determine how far the visibility is. We have multiple runways, and the visibility needs to be checked every hour.
Right now the visibility check is done manually with a human being who looks at the photos from the cameras placed at the end of each runway. So it can be tedious.
I'm a programmer who has very little experience with machine learning, but this sounds like an easy problem to automate. How should I approach this problem? Which algorithms should I study? Would OpenCV help me?
Thanks!
I think this can be automated using computer vision techniques, and OpenCV could make the implementation easier. If all the signs are similar, we can train our program to recognize the sign under specific (lighting) conditions. Then we can use the trained classifier to check for the visibility of the signs every hour using a simple script.
There is Haar-like feature extraction already in OpenCV. You can use it to train a classifier, which will output an .xml file, and then use that .xml file for detecting the sign regularly.
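A minimal usage sketch, assuming you have already trained such a cascade (the file name sign_cascade.xml and the detector parameters are placeholders):

    import cv2

    cascade = cv2.CascadeClassifier("sign_cascade.xml")  # placeholder file name

    def sign_visible(frame_path):
        gray = cv2.imread(frame_path, cv2.IMREAD_GRAYSCALE)
        # Returns a list of (x, y, w, h) boxes; empty when no sign is found.
        signs = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        return len(signs) > 0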
I have done a similar project, RTVTR (Real-Time Vehicle Tracking and Recognition), using OpenCV, and it worked great: http://www.youtube.com/watch?v=xJwBT76VEZ4
Answering your questions:
How should I approach this problem?
It depends on the result you want or need to obtain. Is this a "hobby" project (even if job-related), or do you need to build a machine vision system to solve the problem, and should it be compliant with some regulations or standards?
Which algorithms should I study?
I am very interested in your question, but I am not an expert in the field of meteorology, so searching the relevant literature is a time-consuming task for me; I will update this part of the answer in the future. I think there will be different algorithms involved in the solution of the problem: some very general, for example algorithms for image segmentation, and some very specific, for example how to measure the visibility.
Update: one of the keywords for searching the literature is Meteorological Visibility, for example:
HAUTIERE, Nicolas, et al. Automatic fog detection and estimation of visibility distance through use of an onboard camera. Machine Vision and Applications, 2006, 17.1: 8-20.
LENOR, Stephan, et al. An Improved Model for Estimating the Meteorological Visibility from a Road Surface Luminance Curve. In: Pattern Recognition. Springer Berlin Heidelberg, 2013. p. 184-193.
Would OpenCV help me?
Yes, I think OpenCV can help giving you a starting point.
An idea for a naïve algorithm:
Segment the image in order to get the pixel regions belonging to the signs and to the background.
Compute the measure of visibility according to some procedure; the measure is computed by a function that takes as input the regions of all the signs and the background region.
The segmentation can be simplified a lot if the signs are always in the same fixed and known position inside the image.
The measure of visibility is obviously the core of the algorithm and it can be performed in a lot of ways...
You can follow a simple approach where you compute the visibility with a mathematical formula based on the average gray levels of the sign and background regions, as in the sketch below.
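As one illustration of that formula-based idea, here is a sketch using a Weber-style contrast between a sign region and a background region; the ROI coordinates are assumptions, and in your setup they would come from the known, fixed sign positions.

    import cv2

    def visibility_score(image_path, sign_roi, bg_roi):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        sx, sy, sw, sh = sign_roi
        bx, by, bw, bh = bg_roi
        sign_mean = gray[sy:sy+sh, sx:sx+sw].mean()
        bg_mean = gray[by:by+bh, bx:bx+bw].mean()
        # Weber-like contrast: high for a sharp sign, near zero in dense fog.
        return abs(sign_mean - bg_mean) / max(bg_mean, 1.0)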
You can follow a more sophisticated, machine-learning-oriented approach, where you implement an algorithm that mimics your current human-based procedure. In this case your problem can be framed as a supervised learning task: you have a set of training examples, and each training example is a pair composed of a) the photo of the runway (the input) and b) the visibility related to that photo as estimated by a human (the desired output). The system is then trained on the training set, and when you give it a new photo as input it will give you back the visibility measure. I think you have a log of past visibility measures (METAR?), and if you saved the related images too, you already have a relevant amount of data for building a training set and a test set.
Update in the age of Convolutional Neural Networks:
YOU, Yang, et al. Relative CNN-RNN: Learning Relative Atmospheric Visibility from Images. IEEE Transactions on Image Processing, 2018.
Both Tensor's and uvts_cvs's replies are very helpful. While OpenCV mainly aims at recognizing the sign pattern or even segmenting it from the background, when you extract the core feature of your problem, visibility, you may still need to include the background signal in your training set. I assume the manual check of visibility is based on image contrast; if so, the signal-to-noise ratio (SNR) or contrast-to-noise ratio (CNR) is a good feature for learning. A threshold is defined to classify 'visible' (1) and 'invisible' (0). The SNR/CNR can be obtained automatically, especially if your sign position and size are fixed in your camera images.
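For example, a CNR-based classification could look like the following sketch; the noise estimate (background standard deviation) and the threshold of 2.0 are placeholders you would calibrate against the existing human labels.

    import numpy as np

    def cnr_visible(sign_pixels, bg_pixels, threshold=2.0):
        # Contrast-to-noise ratio: mean difference scaled by background noise.
        cnr = abs(sign_pixels.mean() - bg_pixels.mean()) / (bg_pixels.std() + 1e-6)
        return 1 if cnr > threshold else 0   # 1 = visible, 0 = invisible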
Gather a whole bunch of photos and videos and propose it as a challenge on Kaggle. I am sure many people would like to try to solve it, even if the reward would not be very high.
You can use the template matching functionality of OpenCV:
http://docs.opencv.org/doc/tutorials/imgproc/histograms/template_matching/template_matching.html
Here the template is the sign. If you manage to find a correct match, the sign is visible. I think you can also get a sense of the scale of the sign in the image from that code.
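A minimal sketch of that idea (the match threshold of 0.7 is a placeholder; note that cv2.matchTemplate is not scale-invariant, which is acceptable here because the cameras are fixed):

    import cv2

    def sign_visible(image_path, template_path, threshold=0.7):
        img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        templ = cv2.imread(template_path, cv2.IMREAD_GRAYSCALE)
        # Normalized correlation: scores in [-1, 1], higher = better match.
        scores = cv2.matchTemplate(img, templ, cv2.TM_CCOEFF_NORMED)
        _, best, _, location = cv2.minMaxLoc(scores)
        return best >= threshold, location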
As this is a very controlled and static environment, you have perfect conditions for estimating the visibility with vision-based approaches. Nonetheless, it is not so easy to decide which approach to take. In my thesis, I review this topic in depth for the less well-controlled environment of road traffic. See: LENOR, Stephan. Model-Based Estimation of Meteorological Visibility in the Context of Automotive Camera Systems. Doctoral thesis, 2016. (https://archiv.ub.uni-heidelberg.de/volltextserver/20855/1/20160509_lenor_thesis_final_print.pdf)
I see two major directions you could follow:
Model-based approaches. Advantages: not so dependent on your very specific setup; you do not need heavy data collection.
Data-based approaches/ML. Advantages: can hide the whole complexity of different light and weather conditions; you seem to have a good source of data if people are doing the job right now. Very promising without much engineering effort (just use a lightweight CNN with a few layers or so).
You could also combine both approaches. If you are still interested in a solution, you can contact me again and I am happy to consult in more depth.

Is there an OCR API that can count objects?

Is there an OCR API that could be used for recognizing and counting objects in an image? Or can this be done with another image processing technique?
For example, if I take a close-up photo of three boxes, the API would just return the number 3 as the result.
You can look into OpenCV, which is popular among programmers learning about image processing and vision. You'll find an endless number of posts about OpenCV here on Stack Overflow.
http://opencv.org/
Some freeware GUIs and free starter versions of commercial image processing packages will allow you to test image processing techniques without having to write the code. ImageJ is old but still worth checking out:
http://rsbweb.nih.gov/ij/
I don't want to show favoritism towards any of my sisters and brothers in the image processing world, but if you google for "machine vision free" or "computer vision free" and add words such as "GUI" you should be able to quickly find some free software that will allow you to test different image processing techniques just by using your mouse.
Along with your OCR algorithm, you'll need a segmentation method to count objects.
One such technique is the connected components algorithm:
http://en.wikipedia.org/wiki/Connected-component_labeling
The typical connected components algorithm relies on some preprocessing:
Find a binarization threshold.
Apply the binarization threshold to generate an image of black (0) and white (1) values.
Run the connected components algorithm and label all components (objects).
Filter the results by size and other parameters. For example, you probably don't want to include foreground objects that are only a few pixels in size.
Check the size of the list of filtered components.
This is a simple, low-level method, but it's useful in many situations. Even if you think you need a more complicated technique, I would strongly recommend that you first become familiar with connected components before moving on. Until one grasps the subtleties of lighting, binarization, and component labeling, it's unlikely one can learn much that is useful about more complicated algorithms. There really are no shortcuts.
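Here is a minimal OpenCV sketch of the steps above; Otsu's method picks the binarization threshold automatically, and the minimum area of 50 pixels is a placeholder to tune for your images.

    import cv2

    def count_objects(image_path, min_area=50):
        gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
        # Steps 1-2: find and apply a binarization threshold (Otsu, inverted
        # so dark foreground objects become white).
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Step 3: label all connected components.
        _, _, stats, _ = cv2.connectedComponentsWithStats(binary)
        # Steps 4-5: filter by size (label 0 is the background) and count.
        areas = stats[1:, cv2.CC_STAT_AREA]
        return int((areas >= min_area).sum())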
There are other, more complicated methods, but before suggesting which might be appropriate, you would have to be more specific about what kind of objects you want to find.
With any image processing question, always include one or more sample images. It's generally not useful to talk about image processing algorithms without first understanding the image set with which you are working. What may be obvious to you will not be obvious to others, especially those who have spent years working on OCR applications and who have had to deal with a wide variety of backgrounds, scripts, and specifications.

Image classification/recognition open source library

I have a set of reference images (200) and a set of photos of those images (tens of thousands). I have to classify each photo in a semi-automated way. Which algorithm and open source library would you advise me to use for this task? The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
To give a little more context, the reference images are branded packages, and the photos are of the same packages, but with all kinds of noises: reflections from the flash, low light, imperfect perspective, etc. The photos are already (manually) segmented: only the package is visible.
Back in my days with image recognition (like 15 years ago) I would have probably tried to train a neural network with the reference images, but I wonder if now there are better ways to do this.
I recommend that you use Python, and use the NumPy/SciPy libraries for your numerical work. Some helpful libraries for handling images are the Mahotas library and the scikits.image library.
In addition, you will want to use scikits.learn, which is a Python wrapper for Libsvm, a very standard SVM implementation.
The hard part is choosing your descriptor. The descriptor is the feature vector you compute from each image, intended to let you compute a similarity distance to the set of reference images. Good things to try are Histograms of Oriented Gradients (HOG), SIFT features, and color histograms; also play around with various ways of binning different parts of the image and concatenating such descriptors together.
Next, set aside some of your data for training. You have to manually label these data according to the true reference image they belong to. You can feed these labels into the built-in functions of scikits.learn, and it can train a multiclass SVM to recognize your images, as sketched below.
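A minimal sketch of the descriptor-plus-SVM pipeline under the packages' current names, scikit-image and scikit-learn (the 128x128 resize and the HOG parameters are assumptions; images are taken to be grayscale arrays):

    import numpy as np
    from skimage.feature import hog
    from skimage.transform import resize
    from sklearn.svm import SVC

    def describe(image):
        # Fixed-size grayscale input so every descriptor has the same length.
        return hog(resize(image, (128, 128)), orientations=9,
                   pixels_per_cell=(16, 16), cells_per_block=(2, 2))

    def train_classifier(images, labels):
        # labels[i] is the index of the reference image that images[i] shows.
        X = np.array([describe(im) for im in images])
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(X, labels)
        return clf

With probability=True, clf.predict_proba gives a score per reference image, which is exactly what you need to show the operator the candidates ordered from most to least similar.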
After that, you may want to look at MPI4Py, an implementation of MPI in Python, to take advantage of multiprocessors when doing the large descriptor computation and classification of the tens of thousands of remaining images.
The task you describe is very difficult and solving it with high accuracy could easily lead to a research-level publication in the field of computer vision. I hope I've given you some starting points: searching any of the above concepts on Google will hit on useful research papers and more details about how to use the various libraries.
The best thing for me would be to have a similarity measure between the photo and the reference images, so that I would show to a human operator the images ordered from the most similar to the least one, to make her work easier.
One way people do this is with the so-called "Earth mover's distance". Briefly, one imagines each pixel in an image as a stack of rocks with height corresponding to the pixel value and defines the distance between two images as the minimal amount of work needed to transfer one arrangement of rocks into the other.
Algorithms for this are a current research topic. Here is some MATLAB code for one: http://www.cs.huji.ac.il/~ofirpele/FastEMD/code/ . It looks like they have a Java version as well. Here is a link to the original paper and C code: http://ai.stanford.edu/~rubner/emd/default.htm
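For intuition, the one-dimensional case is already built into SciPy; here is a sketch comparing two grayscale images by the EMD between their intensity histograms (a much weaker signature than the full 2-D EMD in the linked code, but cheap to try):

    import numpy as np
    from scipy.stats import wasserstein_distance

    def emd_gray(img_a, img_b):
        # Intensity histograms act as the "piles of rocks" on the 0..255 axis.
        bins = np.arange(256)
        hist_a = np.bincount(img_a.ravel(), minlength=256).astype(float)
        hist_b = np.bincount(img_b.ravel(), minlength=256).astype(float)
        return wasserstein_distance(bins, bins, hist_a, hist_b)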
Try RapidMiner (one of the most widely used data-mining platforms, http://rapid-i.com) with IMMI (Image Mining Extension, http://www.burgsys.com/mumi-image-mining-community.php), AGPL licence.
It currently implements several similarity measurement methods (not only trivial pixel-by-pixel comparison). The similarity measures can be the input to a learning algorithm (e.g. neural network, KNN, SVM, ...), which can be trained to give better performance. Some information about the methods is given in this paper:
http://splab.cz/wp-content/uploads/2012/07/artery_detection.pdf
Nowadays, deep-learning-based frameworks like Torch, TensorFlow, Theano, and Keras are the best open-source tools/libraries for object classification/recognition tasks.
